Compare commits

..

4 commits

Author SHA1 Message Date
Don
2949510d68
Merge pull request #1286 from benbusby/updates
Updates, Features, and Bugfixes; Oh My!
2025-12-29 09:50:24 -06:00
Don-Swanson
255f1a2c12
Bump version to 1.2.2 2025-12-29 09:45:26 -06:00
Don-Swanson
4852e5b64f
Implement Google Custom Search (BYOK) feature with configuration options and API client 2025-12-29 09:43:20 -06:00
Don-Swanson
9c5b3150aa
Add method to remove AI Overview sections from search results
- Introduced `remove_ai_overview` method in the `Filter` class to eliminate Google's AI Overview results from the main search results.
- Enhanced the filtering process by identifying and removing divs containing specific AI-related patterns, ensuring cleaner output for users.
2025-12-09 09:43:16 -06:00
11 changed files with 759 additions and 22 deletions

111
README.md

@ -40,8 +40,9 @@ Contents
1. [Arch/AUR](#arch-linux--arch-based-distributions)
1. [Helm/Kubernetes](#helm-chart-for-kubernetes)
4. [Environment Variables and Configuration](#environment-variables)
5. [Usage](#usage)
6. [Extra Steps](#extra-steps)
5. [Google Custom Search (BYOK)](#google-custom-search-byok)
6. [Usage](#usage)
7. [Extra Steps](#extra-steps)
1. [Set Primary Search Engine](#set-whoogle-as-your-primary-search-engine)
2. [Custom Redirecting](#custom-redirecting)
2. [Custom Bangs](#custom-bangs)
@ -50,10 +51,10 @@ Contents
5. [Using with Firefox Containers](#using-with-firefox-containers)
6. [Reverse Proxying](#reverse-proxying)
1. [Nginx](#nginx)
7. [Contributing](#contributing)
8. [FAQ](#faq)
9. [Public Instances](#public-instances)
10. [Screenshots](#screenshots)
8. [Contributing](#contributing)
9. [FAQ](#faq)
10. [Public Instances](#public-instances)
11. [Screenshots](#screenshots)
## Features
- No ads or sponsored content
@ -475,7 +476,6 @@ There are a few optional environment variables available for customizing a Whoog
| WHOOGLE_AUTOCOMPLETE | Controls visibility of autocomplete/search suggestions. Default on -- use '0' to disable. |
| WHOOGLE_MINIMAL | Remove everything except basic result cards from all search queries. |
| WHOOGLE_CSP | Sets a default set of 'Content-Security-Policy' headers |
| WHOOGLE_RESULTS_PER_PAGE | Set the number of results per page |
| WHOOGLE_TOR_SERVICE | Enable/disable the Tor service on startup. Default on -- use '0' to disable. |
| WHOOGLE_TOR_USE_PASS | Use password authentication for tor control port. |
| WHOOGLE_TOR_CONF | The absolute path to the config file containing the password for the tor control port. Default: ./misc/tor/control.conf. WHOOGLE_TOR_USE_PASS must be 1 for this to work. |
@ -512,6 +512,103 @@ These environment variables allow setting default config values, but can be over
| WHOOGLE_CONFIG_ANON_VIEW | Include the "anonymous view" option for each search result |
| WHOOGLE_CONFIG_SHOW_USER_AGENT | Display the User Agent string used for search in results footer |
### Google Custom Search (BYOK) Environment Variables
These environment variables configure the "Bring Your Own Key" feature for Google Custom Search API:
| Variable | Description |
| -------------------- | ----------------------------------------------------------------------------------------- |
| WHOOGLE_CSE_API_KEY | Your Google API key with Custom Search API enabled |
| WHOOGLE_CSE_ID | Your Custom Search Engine ID (cx parameter) |
| WHOOGLE_USE_CSE | Enable Custom Search API by default (set to '1' to enable) |
## Google Custom Search (BYOK)
If Google blocks traditional search scraping (captchas, IP bans), you can use your own Google Custom Search Engine credentials as a fallback. This uses Google's official API with your own quota.
### Why Use This?
- **Reliability**: The official API is not subject to captchas or IP bans as long as you stay within your quota
- **Speed**: Direct JSON responses are faster than HTML scraping
- **Fallback**: Works when all scraping workarounds fail
- **Privacy**: Your searches still don't go through third parties—they go directly to Google with your own API key
### Limitations vs Standard Whoogle
| Feature | Standard Scraping | CSE API |
|------------------|--------------------------|---------------------|
| Daily limit | None (until blocked) | 100 free, then paid |
| Image search | ✅ Full support | ✅ Supported |
| News/Videos tabs | ✅ | ❌ Web and image results only |
| Speed | Slower (HTML parsing) | Faster (JSON) |
| Reliability | Can be blocked | Works within quota |
### Setup Steps
#### 1. Create a Custom Search Engine
1. Go to [Programmable Search Engine](https://programmablesearchengine.google.com/controlpanel/all)
2. Click **"Add"** to create a new search engine
3. Under "What to search?", select **"Search the entire web"**
4. Give it a name (e.g., "My Whoogle CSE")
5. Click **"Create"**
6. Copy your **Search Engine ID**
#### 2. Get an API Key
1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Go to **APIs & Services** → **Library**
4. Search for **"Custom Search API"** and click **Enable**
5. Go to **APIs & Services** → **Credentials**
6. Click **"Create Credentials"** → **"API Key"**
7. Copy your API key (looks like `AIza...`)
#### 3. (Recommended) Restrict Your API Key
To prevent misuse if your key is exposed:
1. Click on your API key in Credentials
2. Under **"API restrictions"**, select **"Restrict key"**
3. Choose only **"Custom Search API"**
4. Under **"Application restrictions"**, consider adding IP restrictions if using on a server
5. Click **Save**
#### 4. Configure Whoogle
**Option A: Via Settings UI**
1. Open your Whoogle instance
2. Click the **Config** button
3. Scroll to the "Google Custom Search (BYOK)" section
4. Enter your API Key and CSE ID
5. Check "Use Custom Search API"
6. Click **Apply**
**Option B: Via Environment Variables**
```bash
WHOOGLE_CSE_API_KEY=AIza...
WHOOGLE_CSE_ID=23f...
WHOOGLE_USE_CSE=1
```
### Pricing & Avoiding Charges
| Tier | Queries | Cost |
|------|------------------|-----------------------|
| Free | 100/day | $0 |
| Paid | Up to 10,000/day | $5 per 1,000 queries |
**⚠️ To avoid unexpected charges:**
1. **Don't add a payment method** to Google Cloud (safest option—API stops at 100/day)
2. **Set a billing budget alert**: [Billing → Budgets & Alerts](https://console.cloud.google.com/billing/budgets)
3. **Cap API usage**: APIs & Services → Custom Search API → Quotas → Set "Queries per day" to 100
4. **Monitor usage**: APIs & Services → Custom Search API → Metrics
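
As a rough illustration of the pricing table above, the sketch below estimates one day's cost. It assumes the first 100 queries are free, that billing beyond them is pro-rated per query at $5 per 1,000, and that usage is capped at 10,000 queries per day. The function name and the pro-rating are assumptions made for illustration, not Google's billing logic.

```python
FREE_QUERIES_PER_DAY = 100    # free tier, from the pricing table
PRICE_PER_1000_USD = 5.0      # paid tier price, from the pricing table
PAID_CAP_PER_DAY = 10_000     # paid tier ceiling, from the pricing table


def estimate_daily_cost(queries: int) -> float:
    """Estimate the USD cost of one day of CSE queries (hypothetical helper)."""
    if queries < 0:
        raise ValueError("queries must be non-negative")
    # Queries above the daily cap are assumed to be rejected, not billed.
    served = min(queries, PAID_CAP_PER_DAY)
    billable = max(0, served - FREE_QUERIES_PER_DAY)
    return billable * PRICE_PER_1000_USD / 1000
```

For example, `estimate_daily_cost(1100)` bills 1,000 queries beyond the free tier.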
### Troubleshooting
| Error | Cause | Solution |
|---------------------|---------------------------|-----------------------------------------------------------------|
| "API key not valid" | Invalid or restricted key | Check key in Cloud Console, ensure Custom Search API is enabled |
| "Quota exceeded" | Hit 100/day limit | Wait until midnight PT, or enable billing |
| "Invalid CSE ID" | Wrong cx parameter | Copy ID from Programmable Search Engine control panel |
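
The troubleshooting table can be read as a small classification step. The sketch below mirrors the checks in the `CSEError` properties added by this change (quota first, then invalid key); the function name and the returned labels are hypothetical.

```python
def classify_cse_error(code: int, message: str) -> str:
    """Map an API error to a troubleshooting category (illustrative only)."""
    text = message.lower()
    # Quota is checked first, matching the order used in cse_results_to_html.
    if code == 429 or 'quota' in text:
        return 'quota_exceeded'
    # Any HTTP 400 is treated as a credential problem, as in CSEError.
    if code == 400 or 'invalid' in text:
        return 'invalid_key_or_cse_id'
    return 'other'
```

Note that a 400 caused by something other than credentials would also land in the second category, so the label is a hint rather than a diagnosis.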
## Usage
Same as most search engines, with the exception of filtering by time range.

View file

@ -160,6 +160,7 @@ class Filter:
self.soup = soup
self.main_divs = self.soup.find('div', {'id': 'main'})
self.remove_ads()
self.remove_ai_overview()
self.remove_block_titles()
self.remove_block_url()
self.collapse_sections()
@ -221,7 +222,7 @@ class Filter:
Returns:
None (The soup object is modified directly)
"""
if not div:
if not div or not isinstance(div, Tag):
return
for d in div.find_all('div', recursive=True):
@ -323,6 +324,48 @@ class Filter:
result.string.replace_with(result.string.replace(
search_string, ''))
def remove_ai_overview(self) -> None:
"""Removes Google's AI Overview/SGE results from search results
Returns:
None (The soup object is modified directly)
"""
if not self.main_divs:
return
# Patterns that identify AI Overview sections
ai_patterns = [
'AI Overview',
'AI responses may include mistakes',
]
# Result div classes - check both original Google classes and mapped ones
# since this runs before CSS class replacement
result_classes = [GClasses.result_class_a] # 'ZINbbc'
result_classes.extend(GClasses.result_classes.get(
GClasses.result_class_a, [])) # ['Gx5Zad']
# Collect divs to remove first to avoid modifying while iterating
divs_to_remove = []
for div in self.main_divs.find_all('div', recursive=True):
# Check if this div or its children contain AI Overview markers
div_text = div.get_text()
if any(pattern in div_text for pattern in ai_patterns):
# Walk up to find the top-level result div
parent = div
while parent:
p_cls = parent.attrs.get('class') or []
if any(rc in p_cls for rc in result_classes):
if parent not in divs_to_remove:
divs_to_remove.append(parent)
break
parent = parent.parent
# Remove collected divs
for div in divs_to_remove:
div.decompose()
def remove_ads(self) -> None:
"""Removes ads found in the list of search result divs
@ -394,6 +437,11 @@ class Filter:
if not self.main_divs:
return
# Skip collapsing for CSE (Custom Search Engine) results
# CSE results have a data-cse attribute on the main container
if self.soup.find(attrs={'data-cse': 'true'}):
return
# Loop through results and check for the number of child divs in each
for result in self.main_divs.find_all():
result_children = pull_child_divs(result)
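
The "collect first, then remove" approach of `remove_ai_overview` can be sketched without BeautifulSoup. The version below swaps in the standard library's `xml.etree.ElementTree` so it runs on its own; the parent map and the hard-coded class names are assumptions for illustration, and this is not the code from the commit.

```python
import xml.etree.ElementTree as ET

AI_PATTERNS = ('AI Overview', 'AI responses may include mistakes')
RESULT_CLASSES = {'ZINbbc', 'Gx5Zad'}  # class names taken from the hunk's comments


def remove_ai_overview(root: ET.Element) -> None:
    """Remove result containers whose text contains an AI Overview marker."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    to_remove = []
    for div in root.iter('div'):
        text = ''.join(div.itertext())
        if not any(pattern in text for pattern in AI_PATTERNS):
            continue
        # Walk up until a result container is found (or the tree ends).
        node = div
        while node is not None:
            classes = (node.get('class') or '').split()
            if RESULT_CLASSES.intersection(classes):
                if node not in to_remove:
                    to_remove.append(node)
                break
            node = parent_of.get(node)
    # Mutate only after iteration has finished.
    for node in to_remove:
        parent = parent_of.get(node)
        if parent is not None:
            parent.remove(node)
```

Deferring removal until the scan is complete avoids modifying the tree while it is being iterated, which is the same reason the hunk builds `divs_to_remove` first.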

View file

@ -48,6 +48,8 @@ class Config:
self.show_user_agent = read_config_bool('WHOOGLE_CONFIG_SHOW_USER_AGENT')
# Add user agent related keys to safe_keys
# Note: CSE credentials (cse_api_key, cse_id) are intentionally NOT included
# in safe_keys for security - they should not be shareable via URL
self.safe_keys = [
'lang_search',
'lang_interface',
@ -92,6 +94,11 @@ class Config:
self.preferences_encrypted = read_config_bool('WHOOGLE_CONFIG_PREFERENCES_ENCRYPTED')
self.preferences_key = os.getenv('WHOOGLE_CONFIG_PREFERENCES_KEY', '')
# Google Custom Search Engine (CSE) BYOK settings
self.cse_api_key = os.getenv('WHOOGLE_CSE_API_KEY', '')
self.cse_id = os.getenv('WHOOGLE_CSE_ID', '')
self.use_cse = read_config_bool('WHOOGLE_USE_CSE')
self.accept_language = False
# Skip setting custom config if there isn't one
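
A minimal sketch of how the three settings combine: CSE is only active when the flag is set and both credentials are present. `env_flag` is a hypothetical stand-in for `read_config_bool`, whose implementation is not shown in this diff, and the accepted truthy spellings are an assumption.

```python
import os

TRUTHY = {'1', 'true', 'yes', 'on'}  # assumption: spellings treated as enabled


def env_flag(name: str) -> bool:
    """Hypothetical stand-in for read_config_bool."""
    return os.getenv(name, '').strip().lower() in TRUTHY


def cse_enabled() -> bool:
    """True only when the flag is set and both credentials are non-empty."""
    return bool(
        env_flag('WHOOGLE_USE_CSE')
        and os.getenv('WHOOGLE_CSE_API_KEY', '')
        and os.getenv('WHOOGLE_CSE_ID', '')
    )
```

Wrapping the expression in `bool()` keeps the result a strict boolean instead of the last truthy string in the `and` chain.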

View file

@ -216,18 +216,11 @@ class Request:
"""
def __init__(self, normal_ua, root_path, config: Config, http_client=None):
results_per_page = str(os.getenv('WHOOGLE_RESULTS_PER_PAGE', 10))
self.search_url = (
'https://www.google.com/search?gbv=1&num='
f'{results_per_page}&q='
)
self.search_url = 'https://www.google.com/search?gbv=1&q='
# Google Images rejects the lightweight gbv=1 interface. Use the
# modern udm=2 entrypoint specifically for image searches to avoid the
# "update your browser" interstitial.
self.image_search_url = (
'https://www.google.com/search?udm=2&num='
f'{results_per_page}&q='
)
self.image_search_url = 'https://www.google.com/search?udm=2&q='
# Optionally send heartbeat to Tor to determine availability
# Only when Tor is enabled in config to avoid unnecessary socket usage
if config.tor:

View file

@ -17,6 +17,7 @@ from app import app
from app.models.config import Config
from app.models.endpoint import Endpoint
from app.request import Request, TorError
from app.services.cse_client import CSEException
from app.utils.bangs import suggest_bang, resolve_bang
from app.utils.misc import empty_gif, placeholder_img, get_proxy_host_url, \
fetch_favicon
@ -356,6 +357,30 @@ def search():
session['config']['tor'] = False if e.disable else session['config'][
'tor']
return redirect(url_for('.index'))
except CSEException as e:
localization_lang = g.user_config.get_localization_lang()
translation = app.config['TRANSLATIONS'][localization_lang]
wants_json = (
request.args.get('format') == 'json' or
'application/json' in request.headers.get('Accept', '') or
'application/*+json' in request.headers.get('Accept', '')
)
error_msg = f"Custom Search API Error: {e.message}"
if e.is_quota_error:
error_msg = ("Google Custom Search API quota exceeded. "
"Free tier allows 100 queries/day. "
"Wait until midnight PT or disable CSE in settings.")
if wants_json:
return jsonify({
'error': True,
'error_message': error_msg,
'query': urlparse.unquote(query)
}), e.code
return render_template(
'error.html',
error_message=error_msg,
translation=translation,
config=g.user_config), e.code
wants_json = (
request.args.get('format') == 'json' or
@ -424,6 +449,16 @@ def search():
search_util.search_type,
g.user_config.preferences,
translation)
# Filter out unsupported tabs when CSE is enabled
# CSE serves only web (all) and image results, so videos/news are dropped; the maps tab is kept
use_cse = (
g.user_config.use_cse and
g.user_config.cse_api_key and
g.user_config.cse_id
)
if use_cse:
tabs = {k: v for k, v in tabs.items() if k in ['all', 'images', 'maps']}
# Feature to display currency_card
# Since this is determined by more than just the
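
The two decisions made in this file, whether the client wants JSON and which tabs survive when CSE is active, can be isolated as plain functions. This is a sketch using ordinary dicts in place of Flask's `request.args` and `request.headers`; the function names are hypothetical.

```python
CSE_SUPPORTED_TABS = ('all', 'images', 'maps')  # keys kept by the hunk above


def wants_json(args: dict, headers: dict) -> bool:
    """Return True if the caller asked for JSON via query arg or Accept header."""
    accept = headers.get('Accept', '')
    return (
        args.get('format') == 'json'
        or 'application/json' in accept
        or 'application/*+json' in accept
    )


def filter_tabs(tabs: dict, cse_active: bool) -> dict:
    """Drop tabs that CSE cannot serve; leave tabs untouched otherwise."""
    if not cse_active:
        return tabs
    return {k: v for k, v in tabs.items() if k in CSE_SUPPORTED_TABS}
```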

452
app/services/cse_client.py Normal file

@ -0,0 +1,452 @@
"""Google Custom Search Engine (CSE) API Client
This module provides a client for Google's Custom Search JSON API,
allowing users to bring their own API key (BYOK) for search functionality.
"""
import httpx
from typing import Optional
from dataclasses import dataclass
from urllib.parse import urlparse
from flask import render_template
# Google Custom Search API endpoint
CSE_API_URL = 'https://www.googleapis.com/customsearch/v1'
class CSEException(Exception):
"""Exception raised for CSE API errors"""
def __init__(self, message: str, code: int = 500, is_quota_error: bool = False):
self.message = message
self.code = code
self.is_quota_error = is_quota_error
super().__init__(self.message)
@dataclass
class CSEError:
"""Represents an error from the CSE API"""
code: int
message: str
@property
def is_quota_exceeded(self) -> bool:
return self.code == 429 or 'quota' in self.message.lower()
@property
def is_invalid_key(self) -> bool:
return self.code == 400 or 'invalid' in self.message.lower()
@dataclass
class CSEResult:
"""Represents a single search result from CSE API"""
title: str
link: str
snippet: str
display_link: str
html_title: Optional[str] = None
html_snippet: Optional[str] = None
# Image-specific fields (populated for image search)
image_url: Optional[str] = None
thumbnail_url: Optional[str] = None
image_width: Optional[int] = None
image_height: Optional[int] = None
context_link: Optional[str] = None # Page where image was found
@dataclass
class CSEResponse:
"""Represents a complete CSE API response"""
results: list[CSEResult]
total_results: str
search_time: float
query: str
start_index: int
is_image_search: bool = False
error: Optional[CSEError] = None
@property
def has_error(self) -> bool:
return self.error is not None
@property
def has_results(self) -> bool:
return len(self.results) > 0
class CSEClient:
"""Client for Google Custom Search Engine API
Usage:
client = CSEClient(api_key='your-key', cse_id='your-cse-id')
response = client.search('python programming')
if response.has_error:
print(f"Error: {response.error.message}")
else:
for result in response.results:
print(f"{result.title}: {result.link}")
"""
def __init__(self, api_key: str, cse_id: str, timeout: float = 10.0):
"""Initialize CSE client
Args:
api_key: Google API key with Custom Search API enabled
cse_id: Custom Search Engine ID (cx parameter)
timeout: Request timeout in seconds
"""
self.api_key = api_key
self.cse_id = cse_id
self.timeout = timeout
self._client = httpx.Client(timeout=timeout)
def search(
self,
query: str,
start: int = 1,
num: int = 10,
safe: str = 'off',
language: str = '',
country: str = '',
search_type: str = ''
) -> CSEResponse:
"""Execute a search query against the CSE API
Args:
query: Search query string
start: Starting result index (1-based, for pagination)
num: Number of results to return (max 10)
safe: Safe search setting ('off', 'medium', 'high')
language: Language restriction (e.g., 'lang_en')
country: Country restriction (e.g., 'countryUS')
search_type: Type of search ('image' for image search, '' for web)
Returns:
CSEResponse with results or error information
"""
params = {
'key': self.api_key,
'cx': self.cse_id,
'q': query,
'start': start,
'num': min(num, 10), # API max is 10
'safe': safe,
}
# Add search type for image search
if search_type == 'image':
params['searchType'] = 'image'
# Add optional parameters
if language:
# CSE uses 'lr' for language restrict
params['lr'] = language
if country:
# CSE uses 'cr' for country restrict
params['cr'] = country
try:
response = self._client.get(CSE_API_URL, params=params)
data = response.json()
# Check for API errors
if 'error' in data:
error_info = data['error']
return CSEResponse(
results=[],
total_results='0',
search_time=0.0,
query=query,
start_index=start,
error=CSEError(
code=error_info.get('code', 500),
message=error_info.get('message', 'Unknown error')
)
)
# Parse successful response
search_info = data.get('searchInformation', {})
items = data.get('items', [])
is_image = search_type == 'image'
results = []
for item in items:
# Extract image-specific data if present
image_data = item.get('image', {})
results.append(CSEResult(
title=item.get('title', ''),
link=item.get('link', ''),
snippet=item.get('snippet', ''),
display_link=item.get('displayLink', ''),
html_title=item.get('htmlTitle'),
html_snippet=item.get('htmlSnippet'),
# Image fields
image_url=item.get('link') if is_image else None,
thumbnail_url=image_data.get('thumbnailLink'),
image_width=image_data.get('width'),
image_height=image_data.get('height'),
context_link=image_data.get('contextLink')
))
return CSEResponse(
results=results,
total_results=search_info.get('totalResults', '0'),
search_time=float(search_info.get('searchTime', 0)),
query=query,
start_index=start,
is_image_search=is_image
)
except httpx.TimeoutException:
return CSEResponse(
results=[],
total_results='0',
search_time=0.0,
query=query,
start_index=start,
error=CSEError(code=408, message='Request timed out')
)
except httpx.RequestError as e:
return CSEResponse(
results=[],
total_results='0',
search_time=0.0,
query=query,
start_index=start,
error=CSEError(code=500, message=f'Request failed: {str(e)}')
)
except Exception as e:
return CSEResponse(
results=[],
total_results='0',
search_time=0.0,
query=query,
start_index=start,
error=CSEError(code=500, message=f'Unexpected error: {str(e)}')
)
def close(self):
"""Close the HTTP client"""
self._client.close()
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
def cse_results_to_html(response: CSEResponse, query: str) -> str:
"""Convert CSE API response to HTML matching Whoogle's result format
This generates HTML that mimics the structure expected by Whoogle's
existing filter and result processing pipeline.
Args:
response: CSEResponse from the API
query: Original search query
Returns:
HTML string formatted like Google search results
"""
if response.has_error:
error = response.error
if error.is_quota_exceeded:
return _error_html(
'API Quota Exceeded',
'Your Google Custom Search API quota has been exceeded. '
'Free tier allows 100 queries/day. Wait until midnight PT '
'or enable billing in Google Cloud Console.'
)
elif error.is_invalid_key:
return _error_html(
'Invalid API Key',
'Your Google Custom Search API key is invalid. '
'Please check your API key and CSE ID in settings.'
)
else:
return _error_html('Search Error', error.message)
if not response.has_results:
return _no_results_html(query)
# Use different HTML structure for image vs web results
if response.is_image_search:
return _image_results_html(response, query)
# Build HTML results matching Whoogle's expected structure
results_html = []
for result in response.results:
# Escape HTML in content
title = _escape_html(result.title)
snippet = _escape_html(result.snippet)
link = _escape_html(result.link)  # escape before interpolating into href
display_link = _escape_html(result.display_link)
# Use HTML versions if available (they have bold tags for query terms)
if result.html_title:
title = result.html_title
if result.html_snippet:
snippet = result.html_snippet
# Match the structure used by Google/mock results
result_html = f'''
<div class="ZINbbc xpd O9g5cc uUPGi">
<div class="kCrYT">
<a href="{link}">
<h3 class="BNeawe vvjwJb AP7Wnd">{title}</h3>
<div class="BNeawe UPmit AP7Wnd luh4tb" style="color: var(--whoogle-result-url);">{display_link}</div>
</a>
</div>
<div class="kCrYT">
<div class="BNeawe s3v9rd AP7Wnd">
<span class="VwiC3b">{snippet}</span>
</div>
</div>
</div>
'''
results_html.append(result_html)
# Build pagination if needed
pagination_html = ''
if int(response.total_results) > 10:
pagination_html = _pagination_html(response.start_index, response.query)
# Wrap in expected structure
# Add data-cse attribute to prevent collapse_sections from collapsing these results
return f'''
<html>
<body>
<div id="main" data-cse="true">
<div id="cnt">
<div id="rcnt">
<div id="center_col">
<div id="res">
<div id="search">
<div id="rso">
{''.join(results_html)}
</div>
</div>
</div>
{pagination_html}
</div>
</div>
</div>
</div>
</body>
</html>
'''
def _escape_html(text: str) -> str:
"""Escape HTML special characters"""
if not text:
return ''
return (text
.replace('&', '&amp;')
.replace('<', '&lt;')
.replace('>', '&gt;')
.replace('"', '&quot;')
.replace("'", '&#39;'))
def _error_html(title: str, message: str) -> str:
"""Generate error HTML"""
return f'''
<html>
<body>
<div id="main">
<div style="padding: 20px; text-align: center;">
<h2 style="color: #d93025;">{_escape_html(title)}</h2>
<p>{_escape_html(message)}</p>
</div>
</div>
</body>
</html>
'''
def _no_results_html(query: str) -> str:
"""Generate no results HTML"""
return f'''
<html>
<body>
<div id="main">
<div style="padding: 20px;">
<p>No results found for <b>{_escape_html(query)}</b></p>
</div>
</div>
</body>
</html>
'''
def _image_results_html(response: CSEResponse, query: str) -> str:
"""Generate HTML for image search results using the imageresults template
Args:
response: CSEResponse with image results
query: Original search query
Returns:
HTML string formatted for image results display
"""
# Convert CSE results to the format expected by imageresults.html template
results = []
for result in response.results:
image_url = result.image_url or result.link
thumbnail_url = result.thumbnail_url or image_url
web_page = result.context_link or result.link
domain = urlparse(web_page).netloc if web_page else result.display_link
results.append({
'domain': domain,
'img_url': image_url,
'web_page': web_page,
'img_tbn': thumbnail_url
})
# Build pagination link if needed
next_link = None
if int(response.total_results) > response.start_index + len(response.results) - 1:
next_start = response.start_index + 10
next_link = f'search?q={query}&tbm=isch&start={next_start}'
# Use the same template as regular image results
return render_template(
'imageresults.html',
length=len(results),
results=results,
view_label="View Image",
next_link=next_link
)
def _pagination_html(current_start: int, query: str) -> str:
"""Generate pagination links"""
# CSE API uses 1-based indexing, 10 results per page
current_page = (current_start - 1) // 10 + 1
prev_link = ''
next_link = ''
if current_page > 1:
prev_start = (current_page - 2) * 10 + 1
prev_link = f'<a href="search?q={query}&start={prev_start}">Previous</a>'
next_start = current_page * 10 + 1
next_link = f'<a href="search?q={query}&start={next_start}">Next</a>'
return f'''
<div id="foot" style="text-align: center; padding: 20px;">
{prev_link}
<span style="margin: 0 20px;">Page {current_page}</span>
{next_link}
</div>
'''
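
The pagination arithmetic in `_pagination_html` can be checked in isolation. The sketch below reproduces the same page and start-index calculation and, as one possible hardening that the commit does not include, percent-encodes the query with `urllib.parse.quote` before placing it in the link. `page_links` and its return shape are hypothetical.

```python
from urllib.parse import quote

RESULTS_PER_PAGE = 10  # the CSE API returns at most 10 results per request


def page_links(current_start: int, query: str) -> dict:
    """Compute the page number and prev/next links for a 1-based start index."""
    current_page = (current_start - 1) // RESULTS_PER_PAGE + 1
    encoded = quote(query, safe='')  # so '&' or spaces cannot break the URL
    links = {'page': current_page, 'prev': None}
    if current_page > 1:
        prev_start = (current_page - 2) * RESULTS_PER_PAGE + 1
        links['prev'] = f'search?q={encoded}&start={prev_start}'
    next_start = current_page * RESULTS_PER_PAGE + 1
    links['next'] = f'search?q={encoded}&start={next_start}'
    return links
```

With `current_start=21` the page number is 3, the previous page starts at 11, and the next page starts at 31.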

View file

@ -257,6 +257,30 @@
<input type="checkbox" name="show_user_agent"
id="config-show-user-agent" {{ 'checked' if config.show_user_agent else '' }}>
</div>
<!-- Google Custom Search Engine (BYOK) Settings -->
<div class="config-div config-div-cse-header" style="margin-top: 20px; border-top: 1px solid var(--result-bg); padding-top: 15px;">
<strong>Google Custom Search (BYOK)</strong>
<div><span class="info-text"><a href="https://github.com/benbusby/whoogle-search#google-custom-search-byok">Setup Guide</a></span></div>
</div>
<div class="config-div config-div-use-cse">
<label for="config-use-cse">Use Custom Search API: </label>
<input type="checkbox" name="use_cse" id="config-use-cse" {{ 'checked' if config.use_cse else '' }}>
<div><span class="info-text"> — Enable to use your own Google API key (100 free queries/day)</span></div>
</div>
<div class="config-div config-div-cse-api-key">
<label for="config-cse-api-key">CSE API Key: </label>
<input type="password" name="cse_api_key" id="config-cse-api-key"
value="{{ config.cse_api_key }}"
placeholder="AIza..."
autocomplete="off">
</div>
<div class="config-div config-div-cse-id">
<label for="config-cse-id">CSE ID: </label>
<input type="text" name="cse_id" id="config-cse-id"
value="{{ config.cse_id }}"
placeholder="abc123..."
autocomplete="off">
</div>
<div class="config-div config-div-root-url">
<label for="config-url">{{ translation['config-url'] }}: </label>
<input type="text" name="url" id="config-url" value="{{ config.url }}">

View file

@ -5,6 +5,7 @@ from app.filter import Filter
from app.request import gen_query
from app.utils.misc import get_proxy_host_url
from app.utils.results import get_first_link
from app.services.cse_client import CSEClient, cse_results_to_html
from bs4 import BeautifulSoup as bsoup
from cryptography.fernet import Fernet, InvalidToken
from flask import g
@ -142,6 +143,89 @@ class Search:
config=self.config,
query=self.query,
page_url=self.request.url)
# Check if CSE (Custom Search Engine) should be used
use_cse = (
self.config.use_cse and
self.config.cse_api_key and
self.config.cse_id
)
if use_cse:
# Use Google Custom Search API
return self._generate_cse_response(content_filter, root_url, mobile)
# Default: Use traditional scraping method
return self._generate_scrape_response(content_filter, root_url, mobile)
def _generate_cse_response(self, content_filter: Filter, root_url: str, mobile: bool) -> str:
"""Generate response using Google Custom Search API
Args:
content_filter: Filter instance for processing results
root_url: Root URL of the instance
mobile: Whether this is a mobile request
Returns:
str: HTML response string
"""
# Get pagination start index from request params
start = int(self.request_params.get('start', 1))
# Determine safe search setting
safe = 'high' if self.config.safe else 'off'
# Determine search type (web or image)
# tbm=isch or udm=2 indicates image search
search_type = ''
if self.search_type == 'isch' or self.request_params.get('udm') == '2':
search_type = 'image'
# Create CSE client and perform search
with CSEClient(
api_key=self.config.cse_api_key,
cse_id=self.config.cse_id
) as client:
response = client.search(
query=self.query,
start=start,
safe=safe,
language=self.config.lang_search,
country=self.config.country,
search_type=search_type
)
# Convert CSE response to HTML
html_content = cse_results_to_html(response, self.query)
# Store full query for tabs
self.full_query = self.query
# Parse and filter the HTML
html_soup = bsoup(html_content, 'html.parser')
# Handle feeling lucky
if self.feeling_lucky:
if response.has_results:
return response.results[0].link
self.feeling_lucky = False
# Apply content filter (encrypts links, applies CSS, etc.)
formatted_results = content_filter.clean(html_soup)
return str(formatted_results)
def _generate_scrape_response(self, content_filter: Filter, root_url: str, mobile: bool) -> str:
"""Generate response using traditional HTML scraping
Args:
content_filter: Filter instance for processing results
root_url: Root URL of the instance
mobile: Whether this is a mobile request
Returns:
str: HTML response string
"""
full_query = gen_query(self.query,
self.request_params,
self.config)
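
`_generate_cse_response` converts the `start` request parameter with a bare `int(...)`, which raises `ValueError` on non-numeric input. A defensive variant might look like the sketch below. The upper bound assumes the API's documented cap of 100 results per query, interpreted here as a maximum start index of 91 for 10 results per page; treat that bound, and the helper name, as assumptions.

```python
def parse_start(raw, num: int = 10, max_results: int = 100) -> int:
    """Parse a 1-based start index, falling back to 1 on bad input."""
    try:
        start = int(raw)
    except (TypeError, ValueError):
        return 1
    # Clamp into [1, max_results - num + 1] so the last page stays in range.
    return max(1, min(start, max_results - num + 1))
```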

View file

@ -4,5 +4,5 @@ optional_dev_tag = ''
if os.getenv('DEV_BUILD'):
optional_dev_tag = '.dev' + os.getenv('DEV_BUILD')
__version__ = '1.2.1' + optional_dev_tag
__version__ = '1.2.2' + optional_dev_tag

View file

@ -30,5 +30,5 @@ h11>=0.16.0
validators==0.35.0
waitress==3.0.2
wcwidth==0.2.14
Werkzeug==3.1.3
Werkzeug==3.1.4
python-dotenv==1.1.1

View file

@ -72,9 +72,6 @@
# Remove everything except basic result cards from all search queries
#WHOOGLE_MINIMAL=0
# Set the number of results per page
#WHOOGLE_RESULTS_PER_PAGE=10
# Controls visibility of autocomplete/search suggestions
#WHOOGLE_AUTOCOMPLETE=1