Mirror of https://github.com/benbusby/whoogle-search.git (synced 2026-03-11 08:54:34 +00:00)
Compare commits: 4 commits, v1.2.1-tes...main
| Author | SHA1 | Date | |
|---|---|---|---|
| | 2949510d68 | | |
| | 255f1a2c12 | | |
| | 4852e5b64f | | |
| | 9c5b3150aa | | |
11 changed files with 759 additions and 22 deletions
111
README.md
|
|
@@ -40,8 +40,9 @@ Contents
|
|||
1. [Arch/AUR](#arch-linux--arch-based-distributions)
|
||||
1. [Helm/Kubernetes](#helm-chart-for-kubernetes)
|
||||
4. [Environment Variables and Configuration](#environment-variables)
|
||||
5. [Usage](#usage)
|
||||
6. [Extra Steps](#extra-steps)
|
||||
5. [Google Custom Search (BYOK)](#google-custom-search-byok)
|
||||
6. [Usage](#usage)
|
||||
7. [Extra Steps](#extra-steps)
|
||||
1. [Set Primary Search Engine](#set-whoogle-as-your-primary-search-engine)
|
||||
2. [Custom Redirecting](#custom-redirecting)
|
||||
2. [Custom Bangs](#custom-bangs)
|
||||
|
|
@@ -50,10 +51,10 @@ Contents
|
|||
5. [Using with Firefox Containers](#using-with-firefox-containers)
|
||||
6. [Reverse Proxying](#reverse-proxying)
|
||||
1. [Nginx](#nginx)
|
||||
7. [Contributing](#contributing)
|
||||
8. [FAQ](#faq)
|
||||
9. [Public Instances](#public-instances)
|
||||
10. [Screenshots](#screenshots)
|
||||
8. [Contributing](#contributing)
|
||||
9. [FAQ](#faq)
|
||||
10. [Public Instances](#public-instances)
|
||||
11. [Screenshots](#screenshots)
|
||||
|
||||
## Features
|
||||
- No ads or sponsored content
|
||||
|
|
@@ -475,7 +476,6 @@ There are a few optional environment variables available for customizing a Whoog
|
|||
| WHOOGLE_AUTOCOMPLETE | Controls visibility of autocomplete/search suggestions. Default on -- use '0' to disable. |
|
||||
| WHOOGLE_MINIMAL | Remove everything except basic result cards from all search queries. |
|
||||
| WHOOGLE_CSP | Sets a default set of 'Content-Security-Policy' headers |
|
||||
| WHOOGLE_RESULTS_PER_PAGE | Set the number of results per page |
|
||||
| WHOOGLE_TOR_SERVICE | Enable/disable the Tor service on startup. Default on -- use '0' to disable. |
|
||||
| WHOOGLE_TOR_USE_PASS | Use password authentication for tor control port. |
|
||||
| WHOOGLE_TOR_CONF | The absolute path to the config file containing the password for the tor control port. Default: ./misc/tor/control.conf. WHOOGLE_TOR_USE_PASS must be 1 for this to work. |
|
||||
|
|
@@ -512,6 +512,103 @@ These environment variables allow setting default config values, but can be over
|
|||
| WHOOGLE_CONFIG_ANON_VIEW | Include the "anonymous view" option for each search result |
|
||||
| WHOOGLE_CONFIG_SHOW_USER_AGENT | Display the User Agent string used for search in results footer |
|
||||
|
||||
### Google Custom Search (BYOK) Environment Variables
|
||||
|
||||
These environment variables configure the "Bring Your Own Key" feature for the Google Custom Search API:
|
||||
|
||||
| Variable | Description |
|
||||
| -------------------- | ----------------------------------------------------------------------------------------- |
|
||||
| WHOOGLE_CSE_API_KEY | Your Google API key with Custom Search API enabled |
|
||||
| WHOOGLE_CSE_ID | Your Custom Search Engine ID (cx parameter) |
|
||||
| WHOOGLE_USE_CSE | Enable Custom Search API by default (set to '1' to enable) |
|
||||
|
||||
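A minimal sketch of how these three variables combine. The helper name `resolve_cse_settings` and the truthy spellings in `TRUTHY` are assumptions made for illustration, not Whoogle's own `read_config_bool` logic. The point it illustrates is that the API is only used when the flag and both credentials are present:

```python
import os
from typing import Mapping

# Assumed truthy spellings; Whoogle's own parser may accept a different set.
TRUTHY = {'1', 'true', 't', 'yes', 'y'}


def resolve_cse_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the three BYOK variables and report whether CSE would be active."""
    api_key = env.get('WHOOGLE_CSE_API_KEY', '')
    cse_id = env.get('WHOOGLE_CSE_ID', '')
    requested = env.get('WHOOGLE_USE_CSE', '0').strip().lower() in TRUTHY
    return {
        'cse_api_key': api_key,
        'cse_id': cse_id,
        'use_cse': requested,
        # The flag alone is not enough: both credentials must be non-empty.
        'cse_active': bool(requested and api_key and cse_id),
    }
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise without touching the real environment.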
## Google Custom Search (BYOK)
|
||||
|
||||
If Google blocks traditional search scraping (captchas, IP bans), you can use your own Google Custom Search Engine credentials as a fallback. This uses Google's official API with your own quota.
|
||||
|
||||
### Why Use This?
|
||||
|
||||
- **Reliability**: The official API is not subject to captchas or IP bans, and requests succeed as long as you stay within your quota
|
||||
- **Speed**: Direct JSON responses are faster than HTML scraping
|
||||
- **Fallback**: Works when all scraping workarounds fail
|
||||
- **Privacy**: Your searches still don't go through third parties—they go directly to Google with your own API key
|
||||
|
||||
### Limitations vs Standard Whoogle
|
||||
|
||||
| Feature | Standard Scraping | CSE API |
|
||||
|------------------|--------------------------|---------------------|
|
||||
| Daily limit | None (until blocked) | 100 free, then paid |
|
||||
| Image search | ✅ Full support | ✅ Supported |
|
||||
| News/Videos tabs | ✅                       | ❌ Web and image results only |
|
||||
| Speed | Slower (HTML parsing) | Faster (JSON) |
|
||||
| Reliability      | Can be blocked           | Works within quota  |
|
||||
|
||||
### Setup Steps
|
||||
|
||||
#### 1. Create a Custom Search Engine
|
||||
1. Go to [Programmable Search Engine](https://programmablesearchengine.google.com/controlpanel/all)
|
||||
2. Click **"Add"** to create a new search engine
|
||||
3. Under "What to search?", select **"Search the entire web"**
|
||||
4. Give it a name (e.g., "My Whoogle CSE")
|
||||
5. Click **"Create"**
|
||||
6. Copy your **Search Engine ID**
|
||||
|
||||
#### 2. Get an API Key
|
||||
1. Go to [Google Cloud Console](https://console.cloud.google.com/)
|
||||
2. Create a new project or select an existing one
|
||||
3. Go to **APIs & Services** → **Library**
|
||||
4. Search for **"Custom Search API"** and click **Enable**
|
||||
5. Go to **APIs & Services** → **Credentials**
|
||||
6. Click **"Create Credentials"** → **"API Key"**
|
||||
7. Copy your API key (looks like `AIza...`)
|
||||
|
||||
#### 3. (Recommended) Restrict Your API Key
|
||||
To prevent misuse if your key is exposed:
|
||||
1. Click on your API key in Credentials
|
||||
2. Under **"API restrictions"**, select **"Restrict key"**
|
||||
3. Choose only **"Custom Search API"**
|
||||
4. Under **"Application restrictions"**, consider adding IP restrictions if you run Whoogle on a server
|
||||
5. Click **Save**
|
||||
|
||||
#### 4. Configure Whoogle
|
||||
|
||||
**Option A: Via Settings UI**
|
||||
1. Open your Whoogle instance
|
||||
2. Click the **Config** button
|
||||
3. Scroll to the "Google Custom Search (BYOK)" section
|
||||
4. Enter your API Key and CSE ID
|
||||
5. Check "Use Custom Search API"
|
||||
6. Click **Apply**
|
||||
|
||||
**Option B: Via Environment Variables**
|
||||
```bash
|
||||
WHOOGLE_CSE_API_KEY=AIza...
|
||||
WHOOGLE_CSE_ID=23f...
|
||||
WHOOGLE_USE_CSE=1
|
||||
```
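
To sanity-check the values before starting Whoogle, the request they produce can be approximated as below. This is a sketch, not Whoogle's own code: `build_cse_url` is a hypothetical helper and the credentials are placeholders. The endpoint and the `key`, `cx`, `q`, `start`, and `num` parameters are those of the Custom Search JSON API.

```python
from urllib.parse import urlencode

CSE_API_URL = 'https://www.googleapis.com/customsearch/v1'


def build_cse_url(api_key: str, cse_id: str, query: str,
                  start: int = 1, num: int = 10) -> str:
    """Build the GET URL for one page of Custom Search results."""
    params = {
        'key': api_key,       # value of WHOOGLE_CSE_API_KEY
        'cx': cse_id,         # value of WHOOGLE_CSE_ID
        'q': query,
        'start': start,       # 1-based index of the first result
        'num': min(num, 10),  # the API returns at most 10 results per request
    }
    return f'{CSE_API_URL}?{urlencode(params)}'


# Placeholder credentials, for illustration only.
url = build_cse_url('AIza-example', 'cse-example', 'python programming')
```

Opening the resulting URL in a browser or with `curl` should return JSON if the key and ID are valid; avoid pasting it into logs, since it contains the API key.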
|
||||
|
||||
### Pricing & Avoiding Charges
|
||||
|
||||
| Tier | Queries | Cost |
|
||||
|------|------------------|-----------------------|
|
||||
| Free | 100/day | $0 |
|
||||
| Paid | Up to 10,000/day | $5 per 1,000 queries |
|
||||
|
||||
**⚠️ To avoid unexpected charges:**
|
||||
|
||||
1. **Don't add a payment method** to Google Cloud (safest option—API stops at 100/day)
|
||||
2. **Set a billing budget alert**: [Billing → Budgets & Alerts](https://console.cloud.google.com/billing/budgets)
|
||||
3. **Cap API usage**: APIs & Services → Custom Search API → Quotas → Set "Queries per day" to 100
|
||||
4. **Monitor usage**: APIs & Services → Custom Search API → Metrics
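
As a rough worked example of the pricing table, the sketch below estimates one day's cost. It assumes the 10,000/day cap includes the 100 free queries and that paid queries are billed pro rata at $5 per 1,000. Both are assumptions; check your Google Cloud billing page for the actual terms. `estimate_daily_cost` is a hypothetical helper.

```python
FREE_QUERIES_PER_DAY = 100
PAID_CAP_PER_DAY = 10_000   # assumed to include the free queries
USD_PER_1000_PAID = 5.0


def estimate_daily_cost(queries: int, billing_enabled: bool = True) -> tuple[int, float]:
    """Return (queries served, estimated USD cost) for a single day."""
    if queries < 0:
        raise ValueError('queries must be non-negative')
    if not billing_enabled:
        # Without a payment method the API stops at the free quota.
        return min(queries, FREE_QUERIES_PER_DAY), 0.0
    served = min(queries, PAID_CAP_PER_DAY)
    billable = max(0, served - FREE_QUERIES_PER_DAY)
    return served, billable * USD_PER_1000_PAID / 1000
```

For instance, 250 queries in a day with billing enabled leaves 150 billable queries, while the same load without billing is simply cut off after the first 100.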
|
||||
|
||||
### Troubleshooting
|
||||
|
||||
| Error | Cause | Solution |
|
||||
|---------------------|---------------------------|-----------------------------------------------------------------|
|
||||
| "API key not valid" | Invalid or restricted key | Check key in Cloud Console, ensure Custom Search API is enabled |
|
||||
| "Quota exceeded" | Hit 100/day limit | Wait until midnight PT, or enable billing |
|
||||
| "Invalid CSE ID" | Wrong cx parameter | Copy ID from Programmable Search Engine control panel |
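
The table above can be read as a small decision procedure. The sketch below maps an API error code and message to the matching suggestion. `suggest_fix` is a hypothetical helper, and the matching rules (HTTP 429 or the word "quota", the phrase "API key not valid", and a generic 400 or "invalid") are heuristics assumed for illustration, not an exhaustive list of Google's error responses.

```python
def suggest_fix(code: int, message: str) -> str:
    """Return the troubleshooting hint that best matches an API error."""
    text = message.lower()
    # Check quota first: quota messages may also contain other keywords.
    if code == 429 or 'quota' in text:
        return 'Wait until midnight PT, or enable billing'
    if 'api key not valid' in text:
        return 'Check key in Cloud Console, ensure Custom Search API is enabled'
    if code == 400 or 'invalid' in text:
        return 'Copy ID from Programmable Search Engine control panel'
    return 'Unrecognized error: inspect the full API response'
```

Ordering matters here: the quota check runs before the broader "invalid" check so that a quota message is never misreported as a credential problem.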
|
||||
|
||||
## Usage
|
||||
Same as most search engines, with the exception of filtering by time range.
|
||||
|
||||
|
|
|
|||
|
|
@@ -160,6 +160,7 @@ class Filter:
|
|||
self.soup = soup
|
||||
self.main_divs = self.soup.find('div', {'id': 'main'})
|
||||
self.remove_ads()
|
||||
self.remove_ai_overview()
|
||||
self.remove_block_titles()
|
||||
self.remove_block_url()
|
||||
self.collapse_sections()
|
||||
|
|
@@ -221,7 +222,7 @@ class Filter:
|
|||
Returns:
|
||||
None (The soup object is modified directly)
|
||||
"""
|
||||
if not div:
|
||||
if not div or not isinstance(div, Tag):
|
||||
return
|
||||
|
||||
for d in div.find_all('div', recursive=True):
|
||||
|
|
@@ -323,6 +324,48 @@ class Filter:
|
|||
result.string.replace_with(result.string.replace(
|
||||
search_string, ''))
|
||||
|
||||
def remove_ai_overview(self) -> None:
|
||||
"""Removes Google's AI Overview/SGE results from search results
|
||||
|
||||
Returns:
|
||||
None (The soup object is modified directly)
|
||||
"""
|
||||
if not self.main_divs:
|
||||
return
|
||||
|
||||
# Patterns that identify AI Overview sections
|
||||
ai_patterns = [
|
||||
'AI Overview',
|
||||
'AI responses may include mistakes',
|
||||
]
|
||||
|
||||
# Result div classes - check both original Google classes and mapped ones
|
||||
# since this runs before CSS class replacement
|
||||
result_classes = [GClasses.result_class_a] # 'ZINbbc'
|
||||
result_classes.extend(GClasses.result_classes.get(
|
||||
GClasses.result_class_a, [])) # ['Gx5Zad']
|
||||
|
||||
# Collect divs to remove first to avoid modifying while iterating
|
||||
divs_to_remove = []
|
||||
|
||||
for div in self.main_divs.find_all('div', recursive=True):
|
||||
# Check if this div or its children contain AI Overview markers
|
||||
div_text = div.get_text()
|
||||
if any(pattern in div_text for pattern in ai_patterns):
|
||||
# Walk up to find the top-level result div
|
||||
parent = div
|
||||
while parent:
|
||||
p_cls = parent.attrs.get('class') or []
|
||||
if any(rc in p_cls for rc in result_classes):
|
||||
if parent not in divs_to_remove:
|
||||
divs_to_remove.append(parent)
|
||||
break
|
||||
parent = parent.parent
|
||||
|
||||
# Remove collected divs
|
||||
for div in divs_to_remove:
|
||||
div.decompose()
|
||||
|
||||
def remove_ads(self) -> None:
|
||||
"""Removes ads found in the list of search result divs
|
||||
|
||||
|
|
@@ -394,6 +437,11 @@ class Filter:
|
|||
if not self.main_divs:
|
||||
return
|
||||
|
||||
# Skip collapsing for CSE (Custom Search Engine) results
|
||||
# CSE results have a data-cse attribute on the main container
|
||||
if self.soup.find(attrs={'data-cse': 'true'}):
|
||||
return
|
||||
|
||||
# Loop through results and check for the number of child divs in each
|
||||
for result in self.main_divs.find_all():
|
||||
result_children = pull_child_divs(result)
|
||||
|
|
|
|||
|
|
@@ -48,6 +48,8 @@ class Config:
|
|||
self.show_user_agent = read_config_bool('WHOOGLE_CONFIG_SHOW_USER_AGENT')
|
||||
|
||||
# Add user agent related keys to safe_keys
|
||||
# Note: CSE credentials (cse_api_key, cse_id) are intentionally NOT included
|
||||
# in safe_keys for security - they should not be shareable via URL
|
||||
self.safe_keys = [
|
||||
'lang_search',
|
||||
'lang_interface',
|
||||
|
|
@@ -92,6 +94,11 @@ class Config:
|
|||
self.preferences_encrypted = read_config_bool('WHOOGLE_CONFIG_PREFERENCES_ENCRYPTED')
|
||||
self.preferences_key = os.getenv('WHOOGLE_CONFIG_PREFERENCES_KEY', '')
|
||||
|
||||
# Google Custom Search Engine (CSE) BYOK settings
|
||||
self.cse_api_key = os.getenv('WHOOGLE_CSE_API_KEY', '')
|
||||
self.cse_id = os.getenv('WHOOGLE_CSE_ID', '')
|
||||
self.use_cse = read_config_bool('WHOOGLE_USE_CSE')
|
||||
|
||||
self.accept_language = False
|
||||
|
||||
# Skip setting custom config if there isn't one
|
||||
|
|
|
|||
|
|
@@ -216,18 +216,11 @@ class Request:
|
|||
"""
|
||||
|
||||
def __init__(self, normal_ua, root_path, config: Config, http_client=None):
|
||||
results_per_page = str(os.getenv('WHOOGLE_RESULTS_PER_PAGE', 10))
|
||||
self.search_url = (
|
||||
'https://www.google.com/search?gbv=1&num='
|
||||
f'{results_per_page}&q='
|
||||
)
|
||||
self.search_url = 'https://www.google.com/search?gbv=1&q='
|
||||
# Google Images rejects the lightweight gbv=1 interface. Use the
|
||||
# modern udm=2 entrypoint specifically for image searches to avoid the
|
||||
# "update your browser" interstitial.
|
||||
self.image_search_url = (
|
||||
'https://www.google.com/search?udm=2&num='
|
||||
f'{results_per_page}&q='
|
||||
)
|
||||
self.image_search_url = 'https://www.google.com/search?udm=2&q='
|
||||
# Optionally send heartbeat to Tor to determine availability
|
||||
# Only when Tor is enabled in config to avoid unnecessary socket usage
|
||||
if config.tor:
|
||||
|
|
|
|||
|
|
@@ -17,6 +17,7 @@ from app import app
|
|||
from app.models.config import Config
|
||||
from app.models.endpoint import Endpoint
|
||||
from app.request import Request, TorError
|
||||
from app.services.cse_client import CSEException
|
||||
from app.utils.bangs import suggest_bang, resolve_bang
|
||||
from app.utils.misc import empty_gif, placeholder_img, get_proxy_host_url, \
|
||||
fetch_favicon
|
||||
|
|
@@ -356,6 +357,30 @@ def search():
|
|||
session['config']['tor'] = False if e.disable else session['config'][
|
||||
'tor']
|
||||
return redirect(url_for('.index'))
|
||||
except CSEException as e:
|
||||
localization_lang = g.user_config.get_localization_lang()
|
||||
translation = app.config['TRANSLATIONS'][localization_lang]
|
||||
wants_json = (
|
||||
request.args.get('format') == 'json' or
|
||||
'application/json' in request.headers.get('Accept', '') or
|
||||
'application/*+json' in request.headers.get('Accept', '')
|
||||
)
|
||||
error_msg = f"Custom Search API Error: {e.message}"
|
||||
if e.is_quota_error:
|
||||
error_msg = ("Google Custom Search API quota exceeded. "
|
||||
"Free tier allows 100 queries/day. "
|
||||
"Wait until midnight PT or disable CSE in settings.")
|
||||
if wants_json:
|
||||
return jsonify({
|
||||
'error': True,
|
||||
'error_message': error_msg,
|
||||
'query': urlparse.unquote(query)
|
||||
}), e.code
|
||||
return render_template(
|
||||
'error.html',
|
||||
error_message=error_msg,
|
||||
translation=translation,
|
||||
config=g.user_config), e.code
|
||||
|
||||
wants_json = (
|
||||
request.args.get('format') == 'json' or
|
||||
|
|
@@ -424,6 +449,16 @@ def search():
|
|||
search_util.search_type,
|
||||
g.user_config.preferences,
|
||||
translation)
|
||||
|
||||
# Filter out unsupported tabs when CSE is enabled
|
||||
# CSE only supports web (all) and image search, not videos/news
|
||||
use_cse = (
|
||||
g.user_config.use_cse and
|
||||
g.user_config.cse_api_key and
|
||||
g.user_config.cse_id
|
||||
)
|
||||
if use_cse:
|
||||
tabs = {k: v for k, v in tabs.items() if k in ['all', 'images', 'maps']}
|
||||
|
||||
# Feature to display currency_card
|
||||
# Since this is determined by more than just the
|
||||
|
|
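The tab filtering added in the hunk above can be sketched in isolation as follows. This is an illustration under assumptions: `filter_tabs` is a hypothetical helper, and the `videos` and `news` keys in the usage example are assumed tab names; only `all`, `images`, and `maps` appear in the change itself.

```python
# Tabs kept when results come from the Custom Search API (from the hunk above).
CSE_SUPPORTED_TABS = ('all', 'images', 'maps')


def filter_tabs(tabs: dict, use_cse: bool, api_key: str, cse_id: str) -> dict:
    """Drop tabs the CSE backend cannot serve; leave them alone otherwise."""
    # CSE counts as active only when the flag and both credentials are set.
    cse_active = bool(use_cse and api_key and cse_id)
    if not cse_active:
        return tabs
    return {k: v for k, v in tabs.items() if k in CSE_SUPPORTED_TABS}


# Hypothetical tab mapping, for illustration only.
example_tabs = {'all': 'All', 'images': 'Images', 'maps': 'Maps',
                'videos': 'Videos', 'news': 'News'}
visible = filter_tabs(example_tabs, True, 'AIza-example', 'cse-example')
```

Because dict comprehensions preserve insertion order, the remaining tabs keep their original left-to-right order.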
|
|||
452
app/services/cse_client.py
Normal file
|
|
@@ -0,0 +1,452 @@
|
|||
"""Google Custom Search Engine (CSE) API Client
|
||||
|
||||
This module provides a client for Google's Custom Search JSON API,
|
||||
allowing users to bring their own API key (BYOK) for search functionality.
|
||||
"""
|
||||
|
||||
import httpx
|
||||
from typing import Optional
|
||||
from dataclasses import dataclass
|
||||
from urllib.parse import urlparse
|
||||
|
||||
from flask import render_template
|
||||
|
||||
|
||||
# Google Custom Search API endpoint
|
||||
CSE_API_URL = 'https://www.googleapis.com/customsearch/v1'
|
||||
|
||||
|
||||
class CSEException(Exception):
|
||||
"""Exception raised for CSE API errors"""
|
||||
def __init__(self, message: str, code: int = 500, is_quota_error: bool = False):
|
||||
self.message = message
|
||||
self.code = code
|
||||
self.is_quota_error = is_quota_error
|
||||
super().__init__(self.message)
|
||||
|
||||
|
||||
@dataclass
|
||||
class CSEError:
|
||||
"""Represents an error from the CSE API"""
|
||||
code: int
|
||||
message: str
|
||||
|
||||
@property
|
||||
def is_quota_exceeded(self) -> bool:
|
||||
return self.code == 429 or 'quota' in self.message.lower()
|
||||
|
||||
@property
|
||||
def is_invalid_key(self) -> bool:
|
||||
return self.code == 400 or 'invalid' in self.message.lower()
|
||||
|
||||
|
||||
@dataclass
|
||||
class CSEResult:
|
||||
"""Represents a single search result from CSE API"""
|
||||
title: str
|
||||
link: str
|
||||
snippet: str
|
||||
display_link: str
|
||||
html_title: Optional[str] = None
|
||||
html_snippet: Optional[str] = None
|
||||
# Image-specific fields (populated for image search)
|
||||
image_url: Optional[str] = None
|
||||
thumbnail_url: Optional[str] = None
|
||||
image_width: Optional[int] = None
|
||||
image_height: Optional[int] = None
|
||||
context_link: Optional[str] = None # Page where image was found
|
||||
|
||||
|
||||
@dataclass
|
||||
class CSEResponse:
|
||||
"""Represents a complete CSE API response"""
|
||||
results: list[CSEResult]
|
||||
total_results: str
|
||||
search_time: float
|
||||
query: str
|
||||
start_index: int
|
||||
is_image_search: bool = False
|
||||
error: Optional[CSEError] = None
|
||||
|
||||
@property
|
||||
def has_error(self) -> bool:
|
||||
return self.error is not None
|
||||
|
||||
@property
|
||||
def has_results(self) -> bool:
|
||||
return len(self.results) > 0
|
||||
|
||||
|
||||
class CSEClient:
|
||||
"""Client for Google Custom Search Engine API
|
||||
|
||||
Usage:
|
||||
client = CSEClient(api_key='your-key', cse_id='your-cse-id')
|
||||
response = client.search('python programming')
|
||||
|
||||
if response.has_error:
|
||||
print(f"Error: {response.error.message}")
|
||||
else:
|
||||
for result in response.results:
|
||||
print(f"{result.title}: {result.link}")
|
||||
"""
|
||||
|
||||
def __init__(self, api_key: str, cse_id: str, timeout: float = 10.0):
|
||||
"""Initialize CSE client
|
||||
|
||||
Args:
|
||||
api_key: Google API key with Custom Search API enabled
|
||||
cse_id: Custom Search Engine ID (cx parameter)
|
||||
timeout: Request timeout in seconds
|
||||
"""
|
||||
self.api_key = api_key
|
||||
self.cse_id = cse_id
|
||||
self.timeout = timeout
|
||||
self._client = httpx.Client(timeout=timeout)
|
||||
|
||||
def search(
|
||||
self,
|
||||
query: str,
|
||||
start: int = 1,
|
||||
num: int = 10,
|
||||
safe: str = 'off',
|
||||
language: str = '',
|
||||
country: str = '',
|
||||
search_type: str = ''
|
||||
) -> CSEResponse:
|
||||
"""Execute a search query against the CSE API
|
||||
|
||||
Args:
|
||||
query: Search query string
|
||||
start: Starting result index (1-based, for pagination)
|
||||
num: Number of results to return (max 10)
|
||||
safe: Safe search setting ('off', 'medium', 'high')
|
||||
language: Language restriction (e.g., 'lang_en')
|
||||
country: Country restriction (e.g., 'countryUS')
|
||||
search_type: Type of search ('image' for image search, '' for web)
|
||||
|
||||
Returns:
|
||||
CSEResponse with results or error information
|
||||
"""
|
||||
params = {
|
||||
'key': self.api_key,
|
||||
'cx': self.cse_id,
|
||||
'q': query,
|
||||
'start': start,
|
||||
'num': min(num, 10), # API max is 10
|
||||
'safe': safe,
|
||||
}
|
||||
|
||||
# Add search type for image search
|
||||
if search_type == 'image':
|
||||
params['searchType'] = 'image'
|
||||
|
||||
# Add optional parameters
|
||||
if language:
|
||||
# CSE uses 'lr' for language restrict
|
||||
params['lr'] = language
|
||||
if country:
|
||||
# CSE uses 'cr' for country restrict
|
||||
params['cr'] = country
|
||||
|
||||
try:
|
||||
response = self._client.get(CSE_API_URL, params=params)
|
||||
data = response.json()
|
||||
|
||||
# Check for API errors
|
||||
if 'error' in data:
|
||||
error_info = data['error']
|
||||
return CSEResponse(
|
||||
results=[],
|
||||
total_results='0',
|
||||
search_time=0.0,
|
||||
query=query,
|
||||
start_index=start,
|
||||
error=CSEError(
|
||||
code=error_info.get('code', 500),
|
||||
message=error_info.get('message', 'Unknown error')
|
||||
)
|
||||
)
|
||||
|
||||
# Parse successful response
|
||||
search_info = data.get('searchInformation', {})
|
||||
items = data.get('items', [])
|
||||
is_image = search_type == 'image'
|
||||
|
||||
results = []
|
||||
for item in items:
|
||||
# Extract image-specific data if present
|
||||
image_data = item.get('image', {})
|
||||
|
||||
results.append(CSEResult(
|
||||
title=item.get('title', ''),
|
||||
link=item.get('link', ''),
|
||||
snippet=item.get('snippet', ''),
|
||||
display_link=item.get('displayLink', ''),
|
||||
html_title=item.get('htmlTitle'),
|
||||
html_snippet=item.get('htmlSnippet'),
|
||||
# Image fields
|
||||
image_url=item.get('link') if is_image else None,
|
||||
thumbnail_url=image_data.get('thumbnailLink'),
|
||||
image_width=image_data.get('width'),
|
||||
image_height=image_data.get('height'),
|
||||
context_link=image_data.get('contextLink')
|
||||
))
|
||||
|
||||
return CSEResponse(
|
||||
results=results,
|
||||
total_results=search_info.get('totalResults', '0'),
|
||||
search_time=float(search_info.get('searchTime', 0)),
|
||||
query=query,
|
||||
start_index=start,
|
||||
is_image_search=is_image
|
||||
)
|
||||
|
||||
except httpx.TimeoutException:
|
||||
return CSEResponse(
|
||||
results=[],
|
||||
total_results='0',
|
||||
search_time=0.0,
|
||||
query=query,
|
||||
start_index=start,
|
||||
error=CSEError(code=408, message='Request timed out')
|
||||
)
|
||||
except httpx.RequestError as e:
|
||||
return CSEResponse(
|
||||
results=[],
|
||||
total_results='0',
|
||||
search_time=0.0,
|
||||
query=query,
|
||||
start_index=start,
|
||||
error=CSEError(code=500, message=f'Request failed: {str(e)}')
|
||||
)
|
||||
except Exception as e:
|
||||
return CSEResponse(
|
||||
results=[],
|
||||
total_results='0',
|
||||
search_time=0.0,
|
||||
query=query,
|
||||
start_index=start,
|
||||
error=CSEError(code=500, message=f'Unexpected error: {str(e)}')
|
||||
)
|
||||
|
||||
def close(self):
|
||||
"""Close the HTTP client"""
|
||||
self._client.close()
|
||||
|
||||
def __enter__(self):
|
||||
return self
|
||||
|
||||
def __exit__(self, *args):
|
||||
self.close()
|
||||
|
||||
|
||||
def cse_results_to_html(response: CSEResponse, query: str) -> str:
|
||||
"""Convert CSE API response to HTML matching Whoogle's result format
|
||||
|
||||
This generates HTML that mimics the structure expected by Whoogle's
|
||||
existing filter and result processing pipeline.
|
||||
|
||||
Args:
|
||||
response: CSEResponse from the API
|
||||
query: Original search query
|
||||
|
||||
Returns:
|
||||
HTML string formatted like Google search results
|
||||
"""
|
||||
if response.has_error:
|
||||
error = response.error
|
||||
if error.is_quota_exceeded:
|
||||
return _error_html(
|
||||
'API Quota Exceeded',
|
||||
'Your Google Custom Search API quota has been exceeded. '
|
||||
'Free tier allows 100 queries/day. Wait until midnight PT '
|
||||
'or enable billing in Google Cloud Console.'
|
||||
)
|
||||
elif error.is_invalid_key:
|
||||
return _error_html(
|
||||
'Invalid API Key',
|
||||
'Your Google Custom Search API key is invalid. '
|
||||
'Please check your API key and CSE ID in settings.'
|
||||
)
|
||||
else:
|
||||
return _error_html('Search Error', error.message)
|
||||
|
||||
if not response.has_results:
|
||||
return _no_results_html(query)
|
||||
|
||||
# Use different HTML structure for image vs web results
|
||||
if response.is_image_search:
|
||||
return _image_results_html(response, query)
|
||||
|
||||
# Build HTML results matching Whoogle's expected structure
|
||||
results_html = []
|
||||
|
||||
for result in response.results:
|
||||
# Escape HTML in content
|
||||
title = _escape_html(result.title)
|
||||
snippet = _escape_html(result.snippet)
|
||||
link = _escape_html(result.link)  # escaped because it is interpolated into an href attribute
|
||||
display_link = _escape_html(result.display_link)
|
||||
|
||||
# Use HTML versions if available (they have bold tags for query terms)
|
||||
if result.html_title:
|
||||
title = result.html_title
|
||||
if result.html_snippet:
|
||||
snippet = result.html_snippet
|
||||
|
||||
# Match the structure used by Google/mock results
|
||||
result_html = f'''
|
||||
<div class="ZINbbc xpd O9g5cc uUPGi">
|
||||
<div class="kCrYT">
|
||||
<a href="{link}">
|
||||
<h3 class="BNeawe vvjwJb AP7Wnd">{title}</h3>
|
||||
<div class="BNeawe UPmit AP7Wnd luh4tb" style="color: var(--whoogle-result-url);">{display_link}</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="kCrYT">
|
||||
<div class="BNeawe s3v9rd AP7Wnd">
|
||||
<span class="VwiC3b">{snippet}</span>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
'''
|
||||
results_html.append(result_html)
|
||||
|
||||
# Build pagination if needed
|
||||
pagination_html = ''
|
||||
if int(response.total_results) > 10:
|
||||
pagination_html = _pagination_html(response.start_index, response.query)
|
||||
|
||||
# Wrap in expected structure
|
||||
# Add data-cse attribute to prevent collapse_sections from collapsing these results
|
||||
return f'''
|
||||
<html>
|
||||
<body>
|
||||
<div id="main" data-cse="true">
|
||||
<div id="cnt">
|
||||
<div id="rcnt">
|
||||
<div id="center_col">
|
||||
<div id="res">
|
||||
<div id="search">
|
||||
<div id="rso">
|
||||
{''.join(results_html)}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
{pagination_html}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
'''
|
||||
|
||||
|
||||
def _escape_html(text: str) -> str:
|
||||
"""Escape HTML special characters"""
|
||||
if not text:
|
||||
return ''
|
||||
return (text
.replace('&', '&amp;')
.replace('<', '&lt;')
.replace('>', '&gt;')
.replace('"', '&quot;')
.replace("'", '&#39;'))
|
||||
|
||||
|
||||
def _error_html(title: str, message: str) -> str:
|
||||
"""Generate error HTML"""
|
||||
return f'''
|
||||
<html>
|
||||
<body>
|
||||
<div id="main">
|
||||
<div style="padding: 20px; text-align: center;">
|
||||
<h2 style="color: #d93025;">{_escape_html(title)}</h2>
|
||||
<p>{_escape_html(message)}</p>
|
||||
</div>
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
'''
|
||||
|
||||
|
||||
def _no_results_html(query: str) -> str:
|
||||
"""Generate no results HTML"""
|
||||
return f'''
|
||||
<html>
|
||||
<body>
|
||||
<div id="main">
|
||||
<div style="padding: 20px;">
|
||||
<p>No results found for <b>{_escape_html(query)}</b></p>
|
||||
</div>
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
'''
|
||||
|
||||
|
||||
def _image_results_html(response: CSEResponse, query: str) -> str:
|
||||
"""Generate HTML for image search results using the imageresults template
|
||||
|
||||
Args:
|
||||
response: CSEResponse with image results
|
||||
query: Original search query
|
||||
|
||||
Returns:
|
||||
HTML string formatted for image results display
|
||||
"""
|
||||
# Convert CSE results to the format expected by imageresults.html template
|
||||
results = []
|
||||
for result in response.results:
|
||||
image_url = result.image_url or result.link
|
||||
thumbnail_url = result.thumbnail_url or image_url
|
||||
web_page = result.context_link or result.link
|
||||
domain = urlparse(web_page).netloc if web_page else result.display_link
|
||||
|
||||
results.append({
|
||||
'domain': domain,
|
||||
'img_url': image_url,
|
||||
'web_page': web_page,
|
||||
'img_tbn': thumbnail_url
|
||||
})
|
||||
|
||||
# Build pagination link if needed
|
||||
next_link = None
|
||||
if int(response.total_results) > response.start_index + len(response.results) - 1:
|
||||
next_start = response.start_index + 10
|
||||
next_link = f'search?q={query}&tbm=isch&start={next_start}'
|
||||
|
||||
# Use the same template as regular image results
|
||||
return render_template(
|
||||
'imageresults.html',
|
||||
length=len(results),
|
||||
results=results,
|
||||
view_label="View Image",
|
||||
next_link=next_link
|
||||
)
|
||||
|
||||
|
||||
def _pagination_html(current_start: int, query: str) -> str:
|
||||
"""Generate pagination links"""
|
||||
# CSE API uses 1-based indexing, 10 results per page
|
||||
current_page = (current_start - 1) // 10 + 1
|
||||
|
||||
prev_link = ''
|
||||
next_link = ''
|
||||
|
||||
if current_page > 1:
|
||||
prev_start = (current_page - 2) * 10 + 1
|
||||
prev_link = f'<a href="search?q={query}&start={prev_start}">Previous</a>'
|
||||
|
||||
next_start = current_page * 10 + 1
|
||||
next_link = f'<a href="search?q={query}&start={next_start}">Next</a>'
|
||||
|
||||
return f'''
|
||||
<div id="foot" style="text-align: center; padding: 20px;">
|
||||
{prev_link}
|
||||
<span style="margin: 0 20px;">Page {current_page}</span>
|
||||
{next_link}
|
||||
</div>
|
||||
'''
|
||||
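The pagination arithmetic in `_pagination_html` can be checked with a standalone sketch. It differs from the code above in two deliberate ways, both assumptions of this example rather than part of the change: the query is URL-encoded with `urlencode`, and the "next" link is omitted once `next_start` exceeds the reported total. `page_links` is a hypothetical helper.

```python
from urllib.parse import urlencode

PAGE_SIZE = 10  # the CSE API returns at most 10 results per request


def page_links(current_start: int, query: str, total_results: int) -> dict:
    """Compute the page number and prev/next links for a 1-based start index."""
    current_page = (current_start - 1) // PAGE_SIZE + 1
    links = {'page': current_page, 'prev': None, 'next': None}
    if current_page > 1:
        prev_start = (current_page - 2) * PAGE_SIZE + 1
        links['prev'] = 'search?' + urlencode({'q': query, 'start': prev_start})
    next_start = current_page * PAGE_SIZE + 1
    if next_start <= total_results:
        links['next'] = 'search?' + urlencode({'q': query, 'start': next_start})
    return links
```

Encoding the query matters for inputs such as `a&b`, which would otherwise be split into two URL parameters.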
|
|
@@ -257,6 +257,30 @@
|
|||
<input type="checkbox" name="show_user_agent"
|
||||
id="config-show-user-agent" {{ 'checked' if config.show_user_agent else '' }}>
|
||||
</div>
|
||||
<!-- Google Custom Search Engine (BYOK) Settings -->
|
||||
<div class="config-div config-div-cse-header" style="margin-top: 20px; border-top: 1px solid var(--result-bg); padding-top: 15px;">
|
||||
<strong>Google Custom Search (BYOK)</strong>
|
||||
<div><span class="info-text"> — <a href="https://github.com/benbusby/whoogle-search#google-custom-search-byok">Setup Guide</a></span></div>
|
||||
</div>
|
||||
<div class="config-div config-div-use-cse">
|
||||
<label for="config-use-cse">Use Custom Search API: </label>
|
||||
<input type="checkbox" name="use_cse" id="config-use-cse" {{ 'checked' if config.use_cse else '' }}>
|
||||
<div><span class="info-text"> — Enable to use your own Google API key (100 free queries/day)</span></div>
|
||||
</div>
|
||||
<div class="config-div config-div-cse-api-key">
|
||||
<label for="config-cse-api-key">CSE API Key: </label>
|
||||
<input type="password" name="cse_api_key" id="config-cse-api-key"
|
||||
value="{{ config.cse_api_key }}"
|
||||
placeholder="AIza..."
|
||||
autocomplete="off">
|
||||
</div>
|
||||
<div class="config-div config-div-cse-id">
|
||||
<label for="config-cse-id">CSE ID: </label>
|
||||
<input type="text" name="cse_id" id="config-cse-id"
|
||||
value="{{ config.cse_id }}"
|
||||
placeholder="abc123..."
|
||||
autocomplete="off">
|
||||
</div>
|
||||
<div class="config-div config-div-root-url">
|
||||
<label for="config-url">{{ translation['config-url'] }}: </label>
|
||||
<input type="text" name="url" id="config-url" value="{{ config.url }}">
|
||||
|
|
|
|||
|
|
@@ -5,6 +5,7 @@ from app.filter import Filter
|
|||
from app.request import gen_query
|
||||
from app.utils.misc import get_proxy_host_url
|
||||
from app.utils.results import get_first_link
|
||||
from app.services.cse_client import CSEClient, cse_results_to_html
|
||||
from bs4 import BeautifulSoup as bsoup
|
||||
from cryptography.fernet import Fernet, InvalidToken
|
||||
from flask import g
|
||||
|
|
@@ -142,6 +143,89 @@ class Search:
|
|||
config=self.config,
|
||||
query=self.query,
|
||||
page_url=self.request.url)
|
||||
|
||||
# Check if CSE (Custom Search Engine) should be used
|
||||
use_cse = (
|
||||
self.config.use_cse and
|
||||
self.config.cse_api_key and
|
||||
self.config.cse_id
|
||||
)
|
||||
|
||||
if use_cse:
|
||||
# Use Google Custom Search API
|
||||
return self._generate_cse_response(content_filter, root_url, mobile)
|
||||
|
||||
# Default: Use traditional scraping method
|
||||
return self._generate_scrape_response(content_filter, root_url, mobile)
|
||||
|
||||
def _generate_cse_response(self, content_filter: Filter, root_url: str, mobile: bool) -> str:
|
||||
"""Generate response using Google Custom Search API
|
||||
|
||||
Args:
|
||||
content_filter: Filter instance for processing results
|
||||
root_url: Root URL of the instance
|
||||
mobile: Whether this is a mobile request
|
||||
|
||||
Returns:
|
||||
str: HTML response string
|
||||
"""
|
||||
# Get pagination start index from request params
|
||||
start = int(self.request_params.get('start', 1))
|
||||
|
||||
# Determine safe search setting
|
||||
safe = 'high' if self.config.safe else 'off'
|
||||
|
||||
# Determine search type (web or image)
|
||||
# tbm=isch or udm=2 indicates image search
|
||||
search_type = ''
|
||||
if self.search_type == 'isch' or self.request_params.get('udm') == '2':
|
||||
search_type = 'image'
|
||||
|
||||
# Create CSE client and perform search
|
||||
with CSEClient(
|
||||
api_key=self.config.cse_api_key,
|
||||
cse_id=self.config.cse_id
|
||||
) as client:
|
||||
response = client.search(
|
||||
query=self.query,
|
||||
start=start,
|
||||
safe=safe,
|
||||
language=self.config.lang_search,
|
||||
country=self.config.country,
|
||||
search_type=search_type
|
||||
)
|
||||
|
||||
# Convert CSE response to HTML
|
||||
html_content = cse_results_to_html(response, self.query)
|
||||
|
||||
# Store full query for tabs
|
||||
self.full_query = self.query
|
||||
|
||||
# Parse and filter the HTML
|
||||
html_soup = bsoup(html_content, 'html.parser')
|
||||
|
||||
# Handle feeling lucky
|
||||
if self.feeling_lucky:
|
||||
if response.has_results:
|
||||
return response.results[0].link
|
||||
self.feeling_lucky = False
|
||||
|
||||
# Apply content filter (encrypts links, applies CSS, etc.)
|
||||
formatted_results = content_filter.clean(html_soup)
|
||||
|
||||
return str(formatted_results)
|
||||
|
||||
def _generate_scrape_response(self, content_filter: Filter, root_url: str, mobile: bool) -> str:
|
||||
"""Generate response using traditional HTML scraping
|
||||
|
||||
Args:
|
||||
content_filter: Filter instance for processing results
|
||||
root_url: Root URL of the instance
|
||||
mobile: Whether this is a mobile request
|
||||
|
||||
Returns:
|
||||
str: HTML response string
|
||||
"""
|
||||
full_query = gen_query(self.query,
|
||||
self.request_params,
|
||||
self.config)
|
||||
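The parameter handling at the top of `_generate_cse_response` can be sketched on its own. The version below follows the same rules for `safe` and for detecting image searches (`tbm=isch` or `udm=2`), and additionally falls back to `start=1` when the value is missing, non-numeric, or below 1. That fallback is an assumption of this example; the code above calls `int()` directly. `cse_search_args` is a hypothetical helper.

```python
from typing import Mapping


def cse_search_args(params: Mapping[str, str], tbm: str = '',
                    safe_enabled: bool = False) -> dict:
    """Normalize request parameters into arguments for a CSE search call."""
    try:
        start = int(params.get('start', 1))
    except (TypeError, ValueError):
        start = 1  # tolerate malformed input instead of raising
    start = max(start, 1)  # the CSE API uses 1-based indexing
    is_image = tbm == 'isch' or params.get('udm') == '2'
    return {
        'start': start,
        'safe': 'high' if safe_enabled else 'off',
        'search_type': 'image' if is_image else '',
    }
```

Normalizing in one place keeps a malformed `start` value from turning into a server error before the API is ever called.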
|
|
|
|||
|
|
@@ -4,5 +4,5 @@ optional_dev_tag = ''
|
|||
if os.getenv('DEV_BUILD'):
|
||||
optional_dev_tag = '.dev' + os.getenv('DEV_BUILD')
|
||||
|
||||
__version__ = '1.2.1' + optional_dev_tag
|
||||
__version__ = '1.2.2' + optional_dev_tag
|
||||
|
||||
|
|
|
|||
|
|
@@ -30,5 +30,5 @@ h11>=0.16.0
|
|||
validators==0.35.0
|
||||
waitress==3.0.2
|
||||
wcwidth==0.2.14
|
||||
Werkzeug==3.1.3
|
||||
Werkzeug==3.1.4
|
||||
python-dotenv==1.1.1
|
||||
|
|
|
|||
|
|
@@ -72,9 +72,6 @@
|
|||
# Remove everything except basic result cards from all search queries
|
||||
#WHOOGLE_MINIMAL=0
|
||||
|
||||
# Set the number of results per page
|
||||
#WHOOGLE_RESULTS_PER_PAGE=10
|
||||
|
||||
# Controls visibility of autocomplete/search suggestions
|
||||
#WHOOGLE_AUTOCOMPLETE=1
|
||||
|
||||
|
|
|
|||