Mirror of https://github.com/benbusby/whoogle-search.git, synced 2026-03-11 08:54:34 +00:00

Merge pull request #1286 from benbusby/updates

Updates, Features, and Bugfixes; Oh My!

Commit 2949510d68: 15 changed files with 1149 additions and 107 deletions

README.md (111 lines changed)
@@ -40,8 +40,9 @@ Contents
 1. [Arch/AUR](#arch-linux--arch-based-distributions)
 1. [Helm/Kubernetes](#helm-chart-for-kubernetes)
 4. [Environment Variables and Configuration](#environment-variables)
-5. [Usage](#usage)
-6. [Extra Steps](#extra-steps)
+5. [Google Custom Search (BYOK)](#google-custom-search-byok)
+6. [Usage](#usage)
+7. [Extra Steps](#extra-steps)
 1. [Set Primary Search Engine](#set-whoogle-as-your-primary-search-engine)
 2. [Custom Redirecting](#custom-redirecting)
 2. [Custom Bangs](#custom-bangs)
@@ -50,10 +51,10 @@ Contents
 5. [Using with Firefox Containers](#using-with-firefox-containers)
 6. [Reverse Proxying](#reverse-proxying)
 1. [Nginx](#nginx)
-7. [Contributing](#contributing)
-8. [FAQ](#faq)
-9. [Public Instances](#public-instances)
-10. [Screenshots](#screenshots)
+8. [Contributing](#contributing)
+9. [FAQ](#faq)
+10. [Public Instances](#public-instances)
+11. [Screenshots](#screenshots)
 
 ## Features
 - No ads or sponsored content
@@ -475,7 +476,6 @@ There are a few optional environment variables available for customizing a Whoog
 | WHOOGLE_AUTOCOMPLETE | Controls visibility of autocomplete/search suggestions. Default on -- use '0' to disable. |
 | WHOOGLE_MINIMAL | Remove everything except basic result cards from all search queries. |
 | WHOOGLE_CSP | Sets a default set of 'Content-Security-Policy' headers |
-| WHOOGLE_RESULTS_PER_PAGE | Set the number of results per page |
 | WHOOGLE_TOR_SERVICE | Enable/disable the Tor service on startup. Default on -- use '0' to disable. |
 | WHOOGLE_TOR_USE_PASS | Use password authentication for tor control port. |
 | WHOOGLE_TOR_CONF | The absolute path to the config file containing the password for the tor control port. Default: ./misc/tor/control.conf WHOOGLE_TOR_PASS must be 1 for this to work.|
@@ -512,6 +512,103 @@ These environment variables allow setting default config values, but can be over
 | WHOOGLE_CONFIG_ANON_VIEW | Include the "anonymous view" option for each search result |
 | WHOOGLE_CONFIG_SHOW_USER_AGENT | Display the User Agent string used for search in results footer |
 
+### Google Custom Search (BYOK) Environment Variables
+
+These environment variables configure the "Bring Your Own Key" feature for Google Custom Search API:
+
+| Variable | Description |
+| -------------------- | ----------------------------------------------------------------------------------------- |
+| WHOOGLE_CSE_API_KEY | Your Google API key with Custom Search API enabled |
+| WHOOGLE_CSE_ID | Your Custom Search Engine ID (cx parameter) |
+| WHOOGLE_USE_CSE | Enable Custom Search API by default (set to '1' to enable) |
+
+## Google Custom Search (BYOK)
+
+If Google blocks traditional search scraping (captchas, IP bans), you can use your own Google Custom Search Engine credentials as a fallback. This uses Google's official API with your own quota.
+
+### Why Use This?
+
+- **Reliability**: Official API never gets blocked or rate-limited (within quota)
+- **Speed**: Direct JSON responses are faster than HTML scraping
+- **Fallback**: Works when all scraping workarounds fail
+- **Privacy**: Your searches still don't go through third parties—they go directly to Google with your own API key
+
+### Limitations vs Standard Whoogle
+
+| Feature | Standard Scraping | CSE API |
+|------------------|--------------------------|---------------------|
+| Daily limit | None (until blocked) | 100 free, then paid |
+| Image search | ✅ Full support | ✅ Supported |
+| News/Videos tabs | ✅ | ❌ Web results only |
+| Speed | Slower (HTML parsing) | Faster (JSON) |
+| Reliability | Can be blocked | Always works |
+
+### Setup Steps
+
+#### 1. Create a Custom Search Engine
+1. Go to [Programmable Search Engine](https://programmablesearchengine.google.com/controlpanel/all)
+2. Click **"Add"** to create a new search engine
+3. Under "What to search?", select **"Search the entire web"**
+4. Give it a name (e.g., "My Whoogle CSE")
+5. Click **"Create"**
+6. Copy your **Search Engine ID**
+
+#### 2. Get an API Key
+1. Go to [Google Cloud Console](https://console.cloud.google.com/)
+2. Create a new project or select an existing one
+3. Go to **APIs & Services** → **Library**
+4. Search for **"Custom Search API"** and click **Enable**
+5. Go to **APIs & Services** → **Credentials**
+6. Click **"Create Credentials"** → **"API Key"**
+7. Copy your API key (looks like `AIza...`)
+
+#### 3. (Recommended) Restrict Your API Key
+To prevent misuse if your key is exposed:
+1. Click on your API key in Credentials
+2. Under **"API restrictions"**, select **"Restrict key"**
+3. Choose only **"Custom Search API"**
+4. Under **"Application restrictions"**, consider adding IP restrictions if using on a server
+5. Click **Save**
+
+#### 4. Configure Whoogle
+
+**Option A: Via Settings UI**
+1. Open your Whoogle instance
+2. Click the **Config** button
+3. Scroll to "Google Custom Search (BYOK)" section
+4. Enter your API Key and CSE ID
+5. Check "Use Custom Search API"
+6. Click **Apply**
+
+**Option B: Via Environment Variables**
+```bash
+WHOOGLE_CSE_API_KEY=AIza...
+WHOOGLE_CSE_ID=23f...
+WHOOGLE_USE_CSE=1
+```
+
+### Pricing & Avoiding Charges
+
+| Tier | Queries | Cost |
+|------|------------------|-----------------------|
+| Free | 100/day | $0 |
+| Paid | Up to 10,000/day | $5 per 1,000 queries |
+
+**⚠️ To avoid unexpected charges:**
+
+1. **Don't add a payment method** to Google Cloud (safest option—API stops at 100/day)
+2. **Set a billing budget alert**: [Billing → Budgets & Alerts](https://console.cloud.google.com/billing/budgets)
+3. **Cap API usage**: APIs & Services → Custom Search API → Quotas → Set "Queries per day" to 100
+4. **Monitor usage**: APIs & Services → Custom Search API → Metrics
+
+### Troubleshooting
+
+| Error | Cause | Solution |
+|---------------------|---------------------------|-----------------------------------------------------------------|
+| "API key not valid" | Invalid or restricted key | Check key in Cloud Console, ensure Custom Search API is enabled |
+| "Quota exceeded" | Hit 100/day limit | Wait until midnight PT, or enable billing |
+| "Invalid CSE ID" | Wrong cx parameter | Copy ID from Programmable Search Engine control panel |
+
 ## Usage
 Same as most search engines, with the exception of filtering by time range.
 
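
A minimal sketch of what the two BYOK credentials are for, assuming Whoogle talks to Google's public Custom Search JSON API endpoint `https://www.googleapis.com/customsearch/v1` (the README hunk does not name the endpoint): `WHOOGLE_CSE_API_KEY` becomes the `key` query parameter and `WHOOGLE_CSE_ID` becomes `cx`. The helpers `build_cse_url` and `parse_cse_items` are hypothetical and are not Whoogle's client; they only build a request URL and reduce an already-decoded JSON response, with no network I/O.

```python
from urllib.parse import urlencode

# Public Custom Search JSON API endpoint (assumed to be what the BYOK mode calls)
CSE_ENDPOINT = 'https://www.googleapis.com/customsearch/v1'


def build_cse_url(api_key: str, cse_id: str, query: str, start: int = 1) -> str:
    """Build a request URL: key <- WHOOGLE_CSE_API_KEY, cx <- WHOOGLE_CSE_ID."""
    params = {'key': api_key, 'cx': cse_id, 'q': query, 'start': start}
    return f'{CSE_ENDPOINT}?{urlencode(params)}'


def parse_cse_items(payload: dict) -> list:
    """Reduce a decoded JSON response to title/link/snippet dicts.

    A response with no hits carries no 'items' key, so default to [].
    """
    return [
        {
            'title': item.get('title', ''),
            'link': item.get('link', ''),
            'snippet': item.get('snippet', ''),
        }
        for item in payload.get('items', [])
    ]
```

Because the API key travels in the query string, a URL built this way should not be logged or shared; that matches the config change in this commit that keeps `cse_api_key` and `cse_id` out of `safe_keys`.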

@@ -12,19 +12,19 @@ from flask import Flask
 import json
 import logging.config
 import os
+import sys
 from stem import Signal
 import threading
 import warnings
 
 from werkzeug.middleware.proxy_fix import ProxyFix
 
-from app.utils.misc import read_config_bool
 from app.services.http_client import HttpxClient
 from app.services.provider import close_all_clients
 from app.version import __version__
 
-app = Flask(__name__, static_folder=os.path.dirname(
-    os.path.abspath(__file__)) + '/static')
+app = Flask(__name__, static_folder=os.path.join(
+    os.path.dirname(os.path.abspath(__file__)), 'static'))
 
 app.wsgi_app = ProxyFix(app.wsgi_app)
 
@@ -76,7 +76,10 @@ app.config['CONFIG_DISABLE'] = read_config_bool('WHOOGLE_CONFIG_DISABLE')
 app.config['SESSION_FILE_DIR'] = os.path.join(
     app.config['CONFIG_PATH'],
     'session')
-app.config['MAX_SESSION_SIZE'] = 4000 # Sessions won't exceed 4KB
+# Maximum session file size in bytes (4KB limit to prevent abuse and disk exhaustion)
+# Session files larger than this are ignored during cleanup to avoid processing
+# potentially malicious or corrupted files
+app.config['MAX_SESSION_SIZE'] = 4000
 app.config['BANG_PATH'] = os.getenv(
     'CONFIG_VOLUME',
     os.path.join(app.config['STATIC_FOLDER'], 'bangs'))
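
The new comment says oversized session files are skipped during cleanup. The cleanup routine itself is not part of this hunk, so the following is only a hypothetical sketch of that rule: `stale_session_files` and its `max_age_s` parameter are invented for illustration, and the 4000-byte constant mirrors `app.config['MAX_SESSION_SIZE']`.

```python
import os
import time
from typing import List, Optional

MAX_SESSION_SIZE = 4000  # bytes; mirrors app.config['MAX_SESSION_SIZE']


def stale_session_files(session_dir: str, max_age_s: float,
                        now: Optional[float] = None) -> List[str]:
    """Return expired session files, never touching oversized ones (hypothetical)."""
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(session_dir):
        if not entry.is_file(follow_symlinks=False):
            continue
        info = entry.stat(follow_symlinks=False)
        # Files above the limit are skipped outright: not opened, not deleted
        if info.st_size > MAX_SESSION_SIZE:
            continue
        if now - info.st_mtime > max_age_s:
            stale.append(entry.path)
    return sorted(stale)
```

Only `stat()` metadata is consulted, so an oversized or corrupted file is never read during the scan.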
@@ -118,18 +121,53 @@ except Exception as e:
     print(f"Warning: Could not initialize UA pool: {e}")
     app.config['UA_POOL'] = []
 
-# Session values
-app_key_path = os.path.join(app.config['CONFIG_PATH'], 'whoogle.key')
-if os.path.exists(app_key_path):
+# Session values - Secret key management
+# Priority: environment variable → file → generate new
+def get_secret_key():
+    """Load or generate secret key with validation.
+
+    Priority order:
+    1. WHOOGLE_SECRET_KEY environment variable
+    2. Existing key file
+    3. Generate new key and save to file
+
+    Returns:
+        str: Valid secret key for Flask sessions
+    """
+    # Check environment variable first
+    env_key = os.getenv('WHOOGLE_SECRET_KEY', '').strip()
+    if env_key:
+        # Validate env key has minimum length
+        if len(env_key) >= 32:
+            return env_key
+        else:
+            print(f"Warning: WHOOGLE_SECRET_KEY too short ({len(env_key)} chars, need 32+). Using file/generated key instead.", file=sys.stderr)
+
+    # Check file-based key
+    app_key_path = os.path.join(app.config['CONFIG_PATH'], 'whoogle.key')
+    if os.path.exists(app_key_path):
+        try:
+            with open(app_key_path, 'r', encoding='utf-8') as f:
+                key = f.read().strip()
+                # Validate file key
+                if len(key) >= 32:
+                    return key
+                else:
+                    print(f"Warning: Key file too short, regenerating", file=sys.stderr)
+        except (PermissionError, IOError) as e:
+            print(f"Warning: Could not read key file: {e}", file=sys.stderr)
+
+    # Generate new key
+    new_key = str(b64encode(os.urandom(32)))
     try:
-        with open(app_key_path, 'r', encoding='utf-8') as f:
-            app.config['SECRET_KEY'] = f.read()
-    except PermissionError:
-        app.config['SECRET_KEY'] = str(b64encode(os.urandom(32)))
-else:
-    app.config['SECRET_KEY'] = str(b64encode(os.urandom(32)))
-    with open(app_key_path, 'w', encoding='utf-8') as key_file:
-        key_file.write(app.config['SECRET_KEY'])
+        with open(app_key_path, 'w', encoding='utf-8') as key_file:
+            key_file.write(new_key)
+    except (PermissionError, IOError) as e:
+        print(f"Warning: Could not save key file: {e}. Key will not persist across restarts.", file=sys.stderr)
+
+    return new_key
+
+app.config['SECRET_KEY'] = get_secret_key()
 app.config['PERMANENT_SESSION_LIFETIME'] = timedelta(days=365)
 
 # NOTE: SESSION_COOKIE_SAMESITE must be set to 'lax' to allow the user's
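
The priority order in `get_secret_key()` (environment variable, then key file, then a freshly generated key) can be exercised in isolation. The sketch below is a standalone re-implementation under a hypothetical name, `load_secret_key`, with the key path passed in rather than read from `app.config`. One deliberate difference: it calls `.decode('ascii')` on the base64 bytes, because in Python `str()` applied to a `bytes` object returns its repr, wrapped in `b'...'`.

```python
import base64
import os
import sys

MIN_KEY_LEN = 32  # same minimum length the hunk enforces


def load_secret_key(key_path: str, env_var: str = 'WHOOGLE_SECRET_KEY') -> str:
    """Env var, then key file, then a new random key (hypothetical helper)."""
    env_key = os.getenv(env_var, '').strip()
    if len(env_key) >= MIN_KEY_LEN:
        return env_key
    if env_key:
        print(f'{env_var} too short ({len(env_key)} chars)', file=sys.stderr)

    try:
        with open(key_path, 'r', encoding='utf-8') as f:
            file_key = f.read().strip()
        if len(file_key) >= MIN_KEY_LEN:
            return file_key
    except FileNotFoundError:
        pass  # first run: fall through and generate a key
    except OSError as e:  # IOError is an alias of OSError in Python 3
        print(f'Could not read key file: {e}', file=sys.stderr)

    # .decode() gives plain base64 text; str(bytes) would give "b'...'"
    new_key = base64.b64encode(os.urandom(32)).decode('ascii')
    try:
        with open(key_path, 'w', encoding='utf-8') as f:
            f.write(new_key)
    except OSError as e:
        print(f'Could not save key file: {e}', file=sys.stderr)
    return new_key
```

Opening the file directly and treating `FileNotFoundError` as the normal first-run case also avoids a separate `os.path.exists()` check that another worker could race against.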

app/filter.py (250 lines changed)
@@ -5,7 +5,8 @@ from cryptography.fernet import Fernet
 from flask import render_template
 import html
 import urllib.parse as urlparse
-from urllib.parse import parse_qs
+import os
+from urllib.parse import parse_qs, urlencode, urlunparse
 import re
 
 from app.models.g_classes import GClasses
@@ -111,8 +112,10 @@ def clean_css(css: str, page_url: str) -> str:
 
 
 class Filter:
-    # Limit used for determining if a result is a "regular" result or a list
-    # type result (such as "people also asked", "related searches", etc)
+    # Minimum number of child div elements that indicates a collapsible section
+    # Regular search results typically have fewer child divs (< 7)
+    # Special sections like "People also ask", "Related searches" have more (>= 7)
+    # This threshold helps identify and collapse these extended result sections
     RESULT_CHILD_LIMIT = 7
 
     def __init__(
@@ -157,6 +160,7 @@ class Filter:
         self.soup = soup
         self.main_divs = self.soup.find('div', {'id': 'main'})
         self.remove_ads()
+        self.remove_ai_overview()
         self.remove_block_titles()
         self.remove_block_url()
         self.collapse_sections()
@@ -206,6 +210,9 @@ class Filter:
         header = self.soup.find('header')
         if header:
             header.decompose()
+        # Remove broken "Dark theme" toggle snippets that occasionally slip
+        # into the footer.
+        self.remove_dark_theme_toggle(self.soup)
         self.remove_site_blocks(self.soup)
         return self.soup
 
@@ -215,7 +222,7 @@ class Filter:
         Returns:
             None (The soup object is modified directly)
         """
-        if not div:
+        if not div or not isinstance(div, Tag):
             return
 
         for d in div.find_all('div', recursive=True):
@@ -290,6 +297,22 @@ class Filter:
                 if GClasses.result_class_a in p_cls:
                     break
 
+    def remove_dark_theme_toggle(self, soup: BeautifulSoup) -> None:
+        """Removes stray Dark theme toggle/link fragments that can appear
+        in the footer."""
+        for node in soup.find_all(string=re.compile(r'Dark theme', re.I)):
+            try:
+                parent = node.find_parent(
+                    lambda tag: tag.name in ['div', 'span', 'p', 'a', 'li',
+                                             'section'])
+                target = parent or node.parent
+                if target:
+                    target.decompose()
+                else:
+                    node.extract()
+            except Exception:
+                continue
+
     def remove_site_blocks(self, soup) -> None:
         if not self.config.block or not soup.body:
             return
@@ -301,6 +324,48 @@ class Filter:
             result.string.replace_with(result.string.replace(
                 search_string, ''))
 
+    def remove_ai_overview(self) -> None:
+        """Removes Google's AI Overview/SGE results from search results
+
+        Returns:
+            None (The soup object is modified directly)
+        """
+        if not self.main_divs:
+            return
+
+        # Patterns that identify AI Overview sections
+        ai_patterns = [
+            'AI Overview',
+            'AI responses may include mistakes',
+        ]
+
+        # Result div classes - check both original Google classes and mapped ones
+        # since this runs before CSS class replacement
+        result_classes = [GClasses.result_class_a] # 'ZINbbc'
+        result_classes.extend(GClasses.result_classes.get(
+            GClasses.result_class_a, [])) # ['Gx5Zad']
+
+        # Collect divs to remove first to avoid modifying while iterating
+        divs_to_remove = []
+
+        for div in self.main_divs.find_all('div', recursive=True):
+            # Check if this div or its children contain AI Overview markers
+            div_text = div.get_text()
+            if any(pattern in div_text for pattern in ai_patterns):
+                # Walk up to find the top-level result div
+                parent = div
+                while parent:
+                    p_cls = parent.attrs.get('class') or []
+                    if any(rc in p_cls for rc in result_classes):
+                        if parent not in divs_to_remove:
+                            divs_to_remove.append(parent)
+                        break
+                    parent = parent.parent
+
+        # Remove collected divs
+        for div in divs_to_remove:
+            div.decompose()
+
     def remove_ads(self) -> None:
         """Removes ads found in the list of search result divs
 
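
The core of `remove_ai_overview()` is an ancestor walk: find any element whose text contains a marker, climb `.parent` links until a result-card class appears, and record that card once. The sketch below reproduces only that walk over a tiny hypothetical `Node` tree instead of BeautifulSoup, so it runs on the standard library alone; the class names used in the test (`ZINbbc`, `Gx5Zad`) are the ones named in the hunk's comments.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set

AI_PATTERNS = ('AI Overview', 'AI responses may include mistakes')


@dataclass(eq=False)  # identity comparison, so `in` matches the same node
class Node:
    """Hypothetical stand-in for a parsed HTML element (not BeautifulSoup)."""
    classes: List[str]
    text: str = ''
    children: List['Node'] = field(default_factory=list)
    parent: Optional['Node'] = field(default=None, repr=False)

    def add(self, child: 'Node') -> 'Node':
        child.parent = self
        self.children.append(child)
        return child

    def descendants(self) -> Iterator['Node']:
        for child in self.children:
            yield child
            yield from child.descendants()

    def get_text(self) -> str:
        return self.text + ''.join(c.get_text() for c in self.children)


def find_ai_containers(root: Node, result_classes: Set[str]) -> List[Node]:
    """Collect each result card whose subtree mentions an AI marker, once."""
    to_remove: List[Node] = []
    for node in root.descendants():
        if not any(p in node.get_text() for p in AI_PATTERNS):
            continue
        # Climb towards the root until an enclosing result card is found
        parent = node
        while parent is not None:
            if result_classes.intersection(parent.classes):
                if parent not in to_remove:
                    to_remove.append(parent)
                break
            parent = parent.parent
    return to_remove
```

Collecting matches first and deleting afterwards, as the hunk does with `divs_to_remove`, avoids mutating the tree while the `find_all()` results are still being iterated.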
@@ -372,6 +437,11 @@ class Filter:
         if not self.main_divs:
             return
 
+        # Skip collapsing for CSE (Custom Search Engine) results
+        # CSE results have a data-cse attribute on the main container
+        if self.soup.find(attrs={'data-cse': 'true'}):
+            return
+
         # Loop through results and check for the number of child divs in each
         for result in self.main_divs.find_all():
             result_children = pull_child_divs(result)
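
`collapse_sections()` now returns early when any element carries `data-cse="true"`, so whatever renders CSE results must emit that attribute; that renderer is not part of this diff. As a standard-library sketch of the same check (the hunk itself uses `soup.find(attrs={'data-cse': 'true'})`), `html.parser` can detect the marker; `is_cse_page` and `_CseMarkerFinder` are hypothetical names.

```python
from html.parser import HTMLParser


class _CseMarkerFinder(HTMLParser):
    """Flags documents containing any element with data-cse="true"."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        if ('data-cse', 'true') in attrs:
            self.found = True


def is_cse_page(page_html: str) -> bool:
    finder = _CseMarkerFinder()
    finder.feed(page_html)
    finder.close()
    return finder.found
```

Like the BeautifulSoup call, this matches the attribute value exactly, so `data-cse="TRUE"` would not count.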
@@ -529,10 +599,32 @@ class Filter:
         )
         css = f"{css_html_tag}{css}"
         css = re.sub('body{(.*?)}',
-                     'body{padding:0 8px;margin:0 auto;max-width:736px;}',
+                     'body{padding:0 12px;margin:0 auto;max-width:1200px;}',
                      css)
         style.string = css
 
+        # Normalize the max width between result types so the page doesn't
+        # jump in size when switching tabs.
+        if not self.mobile:
+            max_width_css = (
+                'body, #cnt, #center_col, .main, .e9EfHf, #searchform, '
+                '.GyAeWb, .s6JM6d {'
+                'max-width:1200px;'
+                'margin:0 auto;'
+                'padding-left:12px;'
+                'padding-right:12px;'
+                '}'
+            )
+            # Build the style tag using a fresh soup to avoid cases where the
+            # current soup lacks the helper methods (e.g., non-root elements).
+            factory_soup = BeautifulSoup('', 'html.parser')
+            extra_style = factory_soup.new_tag('style')
+            extra_style.string = max_width_css
+            if self.soup.head:
+                self.soup.head.append(extra_style)
+            else:
+                self.soup.insert(0, extra_style)
+
     def update_link(self, link: Tag) -> None:
         """Update internal link paths with encrypted path, otherwise remove
         unnecessary redirects and/or marketing params from the url
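
The `re.sub` call in this hunk swaps each `body{...}` rule for the wider layout. A small sketch with the new replacement string shows the effect; `widen_body_rule` is a hypothetical name and the pattern is an escaped but equivalent form of `'body{(.*?)}'`. Two properties of the pattern as written carry over: it is unanchored, so it also matches the tail of a selector such as `tbody{...}`, and without `re.DOTALL` the `.` does not cross newlines, so a multi-line `body` block is left untouched.

```python
import re

# Replacement rule added in this hunk (previously padding 0 8px, max-width 736px)
BODY_RULE = 'body{padding:0 12px;margin:0 auto;max-width:1200px;}'


def widen_body_rule(css: str) -> str:
    """Swap every body{...} block for the wider layout rule."""
    # The lazy (.*?) stops at the first '}', so later rules survive intact
    return re.sub(r'body\{(.*?)\}', BODY_RULE, css)
```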
@@ -552,9 +644,6 @@ class Filter:
 
         # Remove any elements that direct to unsupported Google pages
         if any(url in link_netloc for url in unsupported_g_pages):
-            # FIXME: The "Shopping" tab requires further filtering (see #136)
-            # Temporarily removing all links to that tab for now.
-
             # Replaces the /url google unsupported link to the direct url
             link['href'] = link_netloc
             parent = link.parent
@@ -739,16 +828,113 @@ class Filter:
             desc_node.replace_with(new_desc)
 
     def view_image(self, soup) -> BeautifulSoup:
-        """Replaces the soup with a new one that handles mobile results and
-        adds the link of the image full res to the results.
+        """Parses image results from Google Images and rewrites them into the
+        lightweight Whoogle image results template.
 
-        Args:
-            soup: A BeautifulSoup object containing the image mobile results.
-
-        Returns:
-            BeautifulSoup: The new BeautifulSoup object
+        Google now serves image results via the modern udm=2 endpoint, where
+        the raw HTML contains only placeholder thumbnails. The actual image
+        URLs live inside serialized data blobs in script tags. We extract that
+        data and pair it with the visible result cards.
         """
 
+        def _decode_url(url: str) -> str:
+            if not url:
+                return ''
+            # Decode common escaped characters found in the script blobs
+            return html.unescape(
+                url.replace('\\u003d', '=').replace('\\u0026', '&')
+            )
+
+        def _extract_image_data(modern_soup: BeautifulSoup) -> dict:
+            """Extracts docid -> {img_url, img_tbn} from serialized scripts."""
+            scripts_text = ' '.join(
+                script.string for script in modern_soup.find_all('script')
+                if script.string
+            )
+            pattern = re.compile(
+                r'\[0,"(?P<docid>[^"]+)",\["(?P<thumb>https://encrypted-tbn[^"]+)"'
+                r'(?:,\d+,\d+)?\],\["(?P<full>https?://[^"]+?)"'
+                r'(?:,\d+,\d+)?\]',
+                re.DOTALL
+            )
+            results_map = {}
+            for match in pattern.finditer(scripts_text):
+                docid = match.group('docid')
+                thumb = _decode_url(match.group('thumb'))
+                full = _decode_url(match.group('full'))
+                results_map[docid] = {
+                    'img_tbn': thumb,
+                    'img_url': full
+                }
+            return results_map
+
+        def _parse_modern_results(modern_soup: BeautifulSoup) -> list:
+            cards = modern_soup.find_all(
+                'div',
+                attrs={
+                    'data-attrid': 'images universal',
+                    'data-docid': True
+                }
+            )
+            if not cards:
+                return []
+
+            meta_map = _extract_image_data(modern_soup)
+            parsed = []
+            seen = set()
+
+            for card in cards:
+                docid = card.get('data-docid')
+                meta = meta_map.get(docid, {})
+                img_url = meta.get('img_url')
+                img_tbn = meta.get('img_tbn')
+
+                # Fall back to the inline src if we failed to map the docid
+                if not img_tbn:
+                    img_tag = card.find('img')
+                    if img_tag:
+                        candidate_src = img_tag.get('src')
+                        if candidate_src and candidate_src.startswith('http'):
+                            img_tbn = candidate_src
+
+                web_page = card.get('data-lpage') or ''
+                if not web_page:
+                    link = card.find('a', href=True)
+                    if link:
+                        web_page = link['href']
+
+                key = (img_url, img_tbn, web_page)
+                if not any(key) or key in seen:
+                    continue
+                seen.add(key)
+
+                parsed.append({
+                    'domain': urlparse.urlparse(web_page).netloc
+                        if web_page else '',
+                    'img_url': img_url or img_tbn or '',
+                    'web_page': web_page,
+                    'img_tbn': img_tbn or img_url or ''
+                })
+            return parsed
+
+        # Try parsing the modern (udm=2) layout first
+        modern_results = _parse_modern_results(soup)
+        if modern_results:
+            # TODO: Implement proper image pagination. Google images uses
+            # infinite scroll with `ijn` offsets; we need a clean,
+            # de-duplicated pagination strategy before exposing a Next link.
+            next_link = None
+            return BeautifulSoup(
+                render_template(
+                    'imageresults.html',
+                    length=len(modern_results),
+                    results=modern_results,
+                    view_label="View Image",
+                    next_link=next_link
+                ),
+                features='html.parser'
+            )
+
         # get some tags that are unchanged between mobile and pc versions
         cor_suggested = soup.find_all('table', attrs={'class': "By0U9"})
         next_pages = soup.find('table', attrs={'class': "uZgmoc"})
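
To see what `_extract_image_data` expects, the sketch below runs the same regular expression and unescaping over a synthetic script blob. The blob is invented: its shape (`[0,"<docid>",["<thumbnail>",w,h],["<full image>",w,h]]`) is inferred from the pattern itself, not captured from a live Google response. The function names drop the leading underscore and are otherwise hypothetical copies of the nested helpers.

```python
import html
import re

# Copy of the pattern added in this hunk
IMAGE_ENTRY = re.compile(
    r'\[0,"(?P<docid>[^"]+)",\["(?P<thumb>https://encrypted-tbn[^"]+)"'
    r'(?:,\d+,\d+)?\],\["(?P<full>https?://[^"]+?)"'
    r'(?:,\d+,\d+)?\]',
    re.DOTALL
)


def decode_url(url: str) -> str:
    # Undo literal \u003d ('=') and \u0026 ('&') escapes, then HTML entities
    return html.unescape(url.replace('\\u003d', '=').replace('\\u0026', '&'))


def extract_image_data(scripts_text: str) -> dict:
    """Map docid -> {'img_tbn', 'img_url'} for every entry the pattern finds."""
    results = {}
    for match in IMAGE_ENTRY.finditer(scripts_text):
        results[match.group('docid')] = {
            'img_tbn': decode_url(match.group('thumb')),
            'img_url': decode_url(match.group('full')),
        }
    return results


# Synthetic blob (invented for illustration), written as raw strings so the
# \u003d escape stays literal, as it would inside a serialized script tag
SAMPLE = (r'x=[0,"abc123",["https://encrypted-tbn0.gstatic.com/images?q\u003dtbn:X",'
          r'180,280],["https://example.com/cat.jpg",1200,800]];')
```

Entries whose thumbnail is not on an `encrypted-tbn` host never match the pattern; for those cards, `_parse_modern_results` in the hunk falls back to the card's inline `<img src>`.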
@@ -762,7 +948,11 @@ class Filter:
         results_all = results_div.find_all('div', attrs={'class': "lIMUZd"})
 
         for item in results_all:
-            urls = item.find('a')['href'].split('&imgrefurl=')
+            link = item.find('a', href=True)
+            if not link:
+                continue
+
+            urls = link['href'].split('&imgrefurl=')
 
             # Skip urls that are not two-element lists
             if len(urls) != 2:
@@ -777,7 +967,16 @@ class Filter:
             except IndexError:
                 web_page = urlparse.unquote(urls[1])
 
-            img_tbn = urlparse.unquote(item.find('a').find('img')['src'])
+            img_tag = link.find('img')
+            if not img_tag:
+                continue
+
+            img_tbn = urlparse.unquote(
+                img_tag.get('src') or img_tag.get('data-src', '')
+            )
+
+            if not img_tbn:
+                continue
 
             results.append({
                 'domain': urlparse.urlparse(web_page).netloc,
@@ -794,11 +993,18 @@ class Filter:
 
         # replace correction suggested by google object if exists
         if len(cor_suggested):
-            soup.find_all(
+            suggested_tables = soup.find_all(
                 'table',
                 attrs={'class': "By0U9"}
-            )[0].replaceWith(cor_suggested[0])
-        # replace next page object at the bottom of the page
-        soup.find_all('table',
-                      attrs={'class': "uZgmoc"})[0].replaceWith(next_pages)
+            )
+            if suggested_tables:
+                suggested_tables[0].replaceWith(cor_suggested[0])
+
+        # replace next page object at the bottom of the page, when present
+        next_page_tables = soup.find_all('table', attrs={'class': "uZgmoc"})
+        if next_pages and next_page_tables:
+            next_page_tables[0].replaceWith(next_pages)
+
+        # TODO: Reintroduce pagination for legacy image layout if needed.
+
         return soup

@@ -48,6 +48,8 @@ class Config:
         self.show_user_agent = read_config_bool('WHOOGLE_CONFIG_SHOW_USER_AGENT')
 
         # Add user agent related keys to safe_keys
+        # Note: CSE credentials (cse_api_key, cse_id) are intentionally NOT included
+        # in safe_keys for security - they should not be shareable via URL
         self.safe_keys = [
             'lang_search',
             'lang_interface',
@@ -92,6 +94,11 @@ class Config:
         self.preferences_encrypted = read_config_bool('WHOOGLE_CONFIG_PREFERENCES_ENCRYPTED')
         self.preferences_key = os.getenv('WHOOGLE_CONFIG_PREFERENCES_KEY', '')
 
+        # Google Custom Search Engine (CSE) BYOK settings
+        self.cse_api_key = os.getenv('WHOOGLE_CSE_API_KEY', '')
+        self.cse_id = os.getenv('WHOOGLE_CSE_ID', '')
+        self.use_cse = read_config_bool('WHOOGLE_USE_CSE')
+
         self.accept_language = False
 
         # Skip setting custom config if there isn't one
@ -247,9 +254,34 @@ class Config:
|
||||||
return param_str
|
return param_str
|
||||||
|
|
||||||
def _get_fernet_key(self, password: str) -> bytes:
|
def _get_fernet_key(self, password: str) -> bytes:
|
||||||
hash_object = hashlib.md5(password.encode())
|
"""Derive a Fernet-compatible key from a password using PBKDF2.
|
||||||
key = urlsafe_b64encode(hash_object.hexdigest().encode())
|
|
||||||
return key
|
Note: This uses a static salt for simplicity. This is a breaking change
|
||||||
|
from the previous MD5-based implementation. Existing encrypted preferences
|
||||||
|
will need to be re-encrypted.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
password: The password to derive the key from
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
bytes: A URL-safe base64 encoded 32-byte key suitable for Fernet
|
||||||
|
"""
|
||||||
|
# Use a static salt derived from app context
|
||||||
|
# In a production system, you'd want to store per-user salts
|
||||||
|
salt = b'whoogle-preferences-salt-v2'
|
||||||
|
|
||||||
|
# Derive a 32-byte key using PBKDF2 with SHA256
|
||||||
|
# 100,000 iterations is a reasonable balance of security and performance
|
||||||
|
kdf_key = hashlib.pbkdf2_hmac(
|
||||||
|
'sha256',
|
||||||
|
password.encode('utf-8'),
|
||||||
|
salt,
|
||||||
|
100000,
|
||||||
|
dklen=32
|
||||||
|
)
|
||||||
|
|
||||||
|
# Fernet requires a URL-safe base64 encoded key
|
||||||
|
return urlsafe_b64encode(kdf_key)
|
||||||
|
|
||||||
def _encode_preferences(self) -> str:
|
def _encode_preferences(self) -> str:
|
||||||
preferences_json = json.dumps(self.get_attrs()).encode()
|
preferences_json = json.dumps(self.get_attrs()).encode()
|
||||||
|
|
|
||||||
|
|
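You can check the key derivation above in isolation. This sketch reuses the salt and iteration count from the diff and places the removed MD5 derivation next to it. Both produce 44-character URL-safe base64 keys, but the keys differ, which is why previously encrypted preferences have to be re-encrypted. The function names are illustrative only.

```python
import base64
import hashlib

SALT = b'whoogle-preferences-salt-v2'  # static salt taken from the diff


def derive_fernet_key(password: str, iterations: int = 100_000) -> bytes:
    """PBKDF2-HMAC-SHA256 -> 32 raw bytes -> URL-safe base64 (Fernet format)."""
    raw = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), SALT,
                              iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)


def legacy_md5_key(password: str) -> bytes:
    """The removed derivation: base64 of the 32-character MD5 hex digest."""
    digest = hashlib.md5(password.encode()).hexdigest()
    return base64.urlsafe_b64encode(digest.encode())


new_key = derive_fernet_key('hunter2')
old_key = legacy_md5_key('hunter2')
print(len(new_key), len(old_key))  # 44 44
print(new_key == old_key)  # False
```

If the `cryptography` package is installed, `Fernet(new_key)` should accept the result, because Fernet expects 32 bytes encoded as URL-safe base64.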
@ -147,6 +147,10 @@ def gen_query(query, args, config) -> str:
|
||||||
# Pass along type of results (news, images, books, etc)
|
# Pass along type of results (news, images, books, etc)
|
||||||
if 'tbm' in args:
|
if 'tbm' in args:
|
||||||
param_dict['tbm'] = '&tbm=' + args.get('tbm')
|
param_dict['tbm'] = '&tbm=' + args.get('tbm')
|
||||||
|
# Google Images now expects the modern udm=2 layout; force it when
|
||||||
|
# requesting images to avoid redirects to the new AI/text layout.
|
||||||
|
if args.get('tbm') == 'isch' and 'udm' not in args:
|
||||||
|
param_dict['udm'] = '&udm=2'
|
||||||
|
|
||||||
# Get results page start value (10 per page, ie page 2 start val = 20)
|
# Get results page start value (10 per page, ie page 2 start val = 20)
|
||||||
if 'start' in args:
|
if 'start' in args:
|
||||||
|
|
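The `udm=2` rule in `gen_query` reduces to a small pure function. This is a hypothetical extract, not the actual `gen_query`: a plain `dict` stands in for the request args, and the function only shows when the extra parameter gets appended. A caller-supplied `udm` is left to whatever other code handles it.

```python
def image_params(args: dict) -> str:
    """Hypothetical extract of gen_query's result-type handling."""
    param_dict = {}
    if 'tbm' in args:
        # Pass along the result type (news, images, books, ...)
        param_dict['tbm'] = '&tbm=' + args.get('tbm')
    # Images need the modern udm=2 layout unless the caller already set udm
    if args.get('tbm') == 'isch' and 'udm' not in args:
        param_dict['udm'] = '&udm=2'
    return ''.join(param_dict.values())


print(image_params({'tbm': 'isch'}))              # &tbm=isch&udm=2
print(image_params({'tbm': 'nws'}))               # &tbm=nws
print(image_params({'tbm': 'isch', 'udm': '2'}))  # &tbm=isch
```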
@ -212,8 +216,11 @@ class Request:
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, normal_ua, root_path, config: Config, http_client=None):
|
def __init__(self, normal_ua, root_path, config: Config, http_client=None):
|
||||||
self.search_url = 'https://www.google.com/search?gbv=1&num=' + str(
|
self.search_url = 'https://www.google.com/search?gbv=1&q='
|
||||||
os.getenv('WHOOGLE_RESULTS_PER_PAGE', 10)) + '&q='
|
# Google Images rejects the lightweight gbv=1 interface. Use the
|
||||||
|
# modern udm=2 entrypoint specifically for image searches to avoid the
|
||||||
|
# "update your browser" interstitial.
|
||||||
|
self.image_search_url = 'https://www.google.com/search?udm=2&q='
|
||||||
# Optionally send heartbeat to Tor to determine availability
|
# Optionally send heartbeat to Tor to determine availability
|
||||||
# Only when Tor is enabled in config to avoid unnecessary socket usage
|
# Only when Tor is enabled in config to avoid unnecessary socket usage
|
||||||
if config.tor:
|
if config.tor:
|
||||||
|
|
@ -235,6 +242,13 @@ class Request:
|
||||||
if not self.mobile:
|
if not self.mobile:
|
||||||
self.modified_user_agent_mobile = gen_user_agent(config, True)
|
self.modified_user_agent_mobile = gen_user_agent(config, True)
|
||||||
|
|
||||||
|
# Dedicated modern UA to use when Google rejects legacy ones (e.g. Images)
|
||||||
|
self.image_user_agent = (
|
||||||
|
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
|
||||||
|
'AppleWebKit/537.36 (KHTML, like Gecko) '
|
||||||
|
'Chrome/127.0.0.0 Safari/537.36'
|
||||||
|
)
|
||||||
|
|
||||||
# Set up proxy configuration
|
# Set up proxy configuration
|
||||||
proxy_path = os.environ.get('WHOOGLE_PROXY_LOC', '')
|
proxy_path = os.environ.get('WHOOGLE_PROXY_LOC', '')
|
||||||
if proxy_path:
|
if proxy_path:
|
||||||
|
|
@ -332,6 +346,13 @@ class Request:
|
||||||
else:
|
else:
|
||||||
modified_user_agent = self.modified_user_agent
|
modified_user_agent = self.modified_user_agent
|
||||||
|
|
||||||
|
# Some Google endpoints (notably Images) now refuse legacy user agents.
|
||||||
|
# If an image search is detected and the generated UA isn't Chromium-
|
||||||
|
# like, send a modern Chrome string instead to avoid the "update your
|
||||||
|
# browser" interstitial.
|
||||||
|
if (('tbm=isch' in query) or ('udm=2' in query)) and 'Chrome' not in modified_user_agent:
|
||||||
|
modified_user_agent = self.image_user_agent
|
||||||
|
|
||||||
headers = {
|
headers = {
|
||||||
'User-Agent': modified_user_agent,
|
'User-Agent': modified_user_agent,
|
||||||
'Accept': ('text/html,application/xhtml+xml,application/xml;'
|
'Accept': ('text/html,application/xhtml+xml,application/xml;'
|
||||||
|
|
@ -345,16 +366,23 @@ class Request:
|
||||||
'Sec-Fetch-Site': 'none',
|
'Sec-Fetch-Site': 'none',
|
||||||
'Sec-Fetch-Mode': 'navigate',
|
'Sec-Fetch-Mode': 'navigate',
|
||||||
'Sec-Fetch-User': '?1',
|
'Sec-Fetch-User': '?1',
|
||||||
'Sec-Fetch-Dest': 'document',
|
'Sec-Fetch-Dest': 'document'
|
||||||
'Sec-CH-UA': (
|
|
||||||
'"Not/A)Brand";v="8", '
|
|
||||||
'"Chromium";v="127", '
|
|
||||||
'"Google Chrome";v="127"'
|
|
||||||
),
|
|
||||||
'Sec-CH-UA-Mobile': '?0',
|
|
||||||
'Sec-CH-UA-Platform': '"macOS"'
|
|
||||||
}
|
}
|
||||||
|
# Only attach client hints when using a Chromium-like user agent to
|
||||||
|
# avoid sending conflicting information that can trigger unsupported
|
||||||
|
# browser pages.
|
||||||
|
if 'Chrome' in headers['User-Agent']:
|
||||||
|
headers.update({
|
||||||
|
'Sec-CH-UA': (
|
||||||
|
'"Not/A)Brand";v="8", '
|
||||||
|
'"Chromium";v="127", '
|
||||||
|
'"Google Chrome";v="127"'
|
||||||
|
),
|
||||||
|
'Sec-CH-UA-Mobile': '?0',
|
||||||
|
'Sec-CH-UA-Platform': '"Windows"'
|
||||||
|
})
|
||||||
|
|
||||||
|
|
||||||
# Add Accept-Language header tied to the current config if requested
|
# Add Accept-Language header tied to the current config if requested
|
||||||
if self.lang_interface:
|
if self.lang_interface:
|
||||||
headers['Accept-Language'] = (
|
headers['Accept-Language'] = (
|
||||||
|
|
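The header change in `Request.send` puts the `Sec-CH-UA*` client hints behind a check on the User-Agent. The sketch below illustrates that rule, hedged in two ways: the header set is trimmed to a few of the fields shown in the diff, and `build_headers` is a name chosen for illustration.

```python
def build_headers(user_agent: str) -> dict:
    headers = {
        'User-Agent': user_agent,
        'Sec-Fetch-Site': 'none',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-User': '?1',
        'Sec-Fetch-Dest': 'document',
    }
    # Client hints describe a Chromium browser, so only send them when the
    # UA also claims to be Chrome; otherwise the two would contradict.
    if 'Chrome' in user_agent:
        headers.update({
            'Sec-CH-UA': ('"Not/A)Brand";v="8", '
                          '"Chromium";v="127", '
                          '"Google Chrome";v="127"'),
            'Sec-CH-UA-Mobile': '?0',
            'Sec-CH-UA-Platform': '"Windows"',
        })
    return headers


firefox = build_headers('Mozilla/5.0 (X11; Linux x86_64; rv:128.0) '
                        'Gecko/20100101 Firefox/128.0')
chrome = build_headers('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36')
print('Sec-CH-UA' in firefox, 'Sec-CH-UA' in chrome)  # False True
```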
@ -393,9 +421,13 @@ class Request:
|
||||||
"Error raised during Tor connection validation",
|
"Error raised during Tor connection validation",
|
||||||
disable=True)
|
disable=True)
|
||||||
|
|
||||||
|
search_base = base_url or self.search_url
|
||||||
|
if not base_url and ('tbm=isch' in query or 'udm=2' in query):
|
||||||
|
search_base = self.image_search_url
|
||||||
|
|
||||||
try:
|
try:
|
||||||
response = self.http_client.get(
|
response = self.http_client.get(
|
||||||
(base_url or self.search_url) + query,
|
search_base + query,
|
||||||
headers=headers,
|
headers=headers,
|
||||||
cookies=consent_cookies)
|
cookies=consent_cookies)
|
||||||
except httpx.HTTPError as e:
|
except httpx.HTTPError as e:
|
||||||
|
|
@ -406,6 +438,6 @@ class Request:
|
||||||
attempt += 1
|
attempt += 1
|
||||||
if attempt > 10:
|
if attempt > 10:
|
||||||
raise TorError("Tor query failed -- max attempts exceeded 10")
|
raise TorError("Tor query failed -- max attempts exceeded 10")
|
||||||
return self.send((base_url or self.search_url), query, attempt)
|
return self.send(search_base, query, attempt)
|
||||||
|
|
||||||
return response
|
return response
|
||||||
|
|
|
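Taken together, the image-search changes in `Request.send` choose both the endpoint and the User-Agent from the query string, and the Tor retry path now reuses the chosen `search_base`. The sketch below condenses that logic into a pure function. The constants are copied from the diff; `pick_target` is a hypothetical name.

```python
SEARCH_URL = 'https://www.google.com/search?gbv=1&q='
IMAGE_SEARCH_URL = 'https://www.google.com/search?udm=2&q='
IMAGE_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            'AppleWebKit/537.36 (KHTML, like Gecko) '
            'Chrome/127.0.0.0 Safari/537.36')


def is_image_query(query: str) -> bool:
    return 'tbm=isch' in query or 'udm=2' in query


def pick_target(query: str, user_agent: str, base_url=None) -> tuple:
    """Return (full URL, User-Agent) the way the updated send() chooses them."""
    search_base = base_url or SEARCH_URL
    if not base_url and is_image_query(query):
        # gbv=1 leads to the "update your browser" interstitial on Images
        search_base = IMAGE_SEARCH_URL
    if is_image_query(query) and 'Chrome' not in user_agent:
        # Images refuses legacy UAs, so swap in a modern Chrome string
        user_agent = IMAGE_UA
    return search_base + query, user_agent


print(pick_target('cats&tbm=isch', 'Lynx/2.8.9'))
```

Because the retry calls `self.send(search_base, query, attempt)`, a retried image search passes an explicit base URL and so keeps the `udm=2` endpoint.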
||||||
108
app/routes.py
|
|
@ -3,7 +3,6 @@ import base64
|
||||||
import io
|
import io
|
||||||
import json
|
import json
|
||||||
import os
|
import os
|
||||||
import pickle
|
|
||||||
import re
|
import re
|
||||||
import urllib.parse as urlparse
|
import urllib.parse as urlparse
|
||||||
import uuid
|
import uuid
|
||||||
|
|
@ -18,6 +17,7 @@ from app import app
|
||||||
from app.models.config import Config
|
from app.models.config import Config
|
||||||
from app.models.endpoint import Endpoint
|
from app.models.endpoint import Endpoint
|
||||||
from app.request import Request, TorError
|
from app.request import Request, TorError
|
||||||
|
from app.services.cse_client import CSEException
|
||||||
from app.utils.bangs import suggest_bang, resolve_bang
|
from app.utils.bangs import suggest_bang, resolve_bang
|
||||||
from app.utils.misc import empty_gif, placeholder_img, get_proxy_host_url, \
|
from app.utils.misc import empty_gif, placeholder_img, get_proxy_host_url, \
|
||||||
fetch_favicon
|
fetch_favicon
|
||||||
|
|
@ -102,9 +102,8 @@ def session_required(f):
|
||||||
if os.path.getsize(file_path) > app.config['MAX_SESSION_SIZE']:
|
if os.path.getsize(file_path) > app.config['MAX_SESSION_SIZE']:
|
||||||
continue
|
continue
|
||||||
|
|
||||||
with open(file_path, 'rb') as session_file:
|
with open(file_path, 'r', encoding='utf-8') as session_file:
|
||||||
_ = pickle.load(session_file)
|
data = json.load(session_file)
|
||||||
data = pickle.load(session_file)
|
|
||||||
if isinstance(data, dict) and 'valid' in data:
|
if isinstance(data, dict) and 'valid' in data:
|
||||||
continue
|
continue
|
||||||
invalid_sessions.append(file_path)
|
invalid_sessions.append(file_path)
|
||||||
|
|
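Session validation now reads each file as JSON instead of unpickling it. Calling `pickle.load` on a file an attacker can write may execute arbitrary code, whereas `json.load` only ever produces plain data. Below is a self-contained sketch of the loop, assuming a flat session directory. The diff does not show how unreadable or non-JSON files are handled, so treating them as invalid is an assumption made here.

```python
import json
import os
import tempfile


def find_invalid_sessions(session_dir: str, max_size: int) -> list:
    """Return paths of session files that lack a top-level 'valid' key."""
    invalid = []
    for name in sorted(os.listdir(session_dir)):
        path = os.path.join(session_dir, name)
        # Oversized files are skipped, as in the diff
        if not os.path.isfile(path) or os.path.getsize(path) > max_size:
            continue
        try:
            with open(path, 'r', encoding='utf-8') as session_file:
                data = json.load(session_file)
        except (OSError, ValueError):  # ValueError covers JSON and UTF-8 decode errors
            invalid.append(path)  # assumption: unreadable files count as invalid
            continue
        if isinstance(data, dict) and 'valid' in data:
            continue
        invalid.append(path)
    return invalid


with tempfile.TemporaryDirectory() as tmp:
    for name, body in [('a', '{"valid": true}'), ('b', '[1, 2]'), ('c', 'not json')]:
        with open(os.path.join(tmp, name), 'w', encoding='utf-8') as f:
            f.write(body)
    print([os.path.basename(p) for p in find_invalid_sessions(tmp, 1024)])  # ['b', 'c']
```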
@ -176,19 +175,28 @@ def after_request_func(resp):
|
||||||
resp.headers['X-Content-Type-Options'] = 'nosniff'
|
resp.headers['X-Content-Type-Options'] = 'nosniff'
|
||||||
resp.headers['X-Frame-Options'] = 'DENY'
|
resp.headers['X-Frame-Options'] = 'DENY'
|
||||||
resp.headers['Cache-Control'] = 'max-age=86400'
|
resp.headers['Cache-Control'] = 'max-age=86400'
|
||||||
|
|
||||||
|
# Security headers
|
||||||
|
resp.headers['Referrer-Policy'] = 'no-referrer'
|
||||||
|
resp.headers['Permissions-Policy'] = 'geolocation=(), microphone=(), camera=()'
|
||||||
|
|
||||||
|
# Add HSTS header if HTTPS is enabled
|
||||||
|
if os.environ.get('HTTPS_ONLY', False):
|
||||||
|
resp.headers['Strict-Transport-Security'] = 'max-age=31536000; includeSubDomains'
|
||||||
|
|
||||||
if os.getenv('WHOOGLE_CSP', False):
|
# Enable CSP by default (disable with WHOOGLE_CSP=0)
|
||||||
|
if os.getenv('WHOOGLE_CSP', '1') != '0':
|
||||||
resp.headers['Content-Security-Policy'] = app.config['CSP']
|
resp.headers['Content-Security-Policy'] = app.config['CSP']
|
||||||
if os.environ.get('HTTPS_ONLY', False):
|
if os.environ.get('HTTPS_ONLY', False):
|
||||||
resp.headers['Content-Security-Policy'] += \
|
resp.headers['Content-Security-Policy'] += \
|
||||||
'upgrade-insecure-requests'
|
' upgrade-insecure-requests'
|
||||||
|
|
||||||
return resp
|
return resp
|
||||||
|
|
||||||
|
|
||||||
@app.errorhandler(404)
|
@app.errorhandler(404)
|
||||||
def unknown_page(e):
|
def unknown_page(e):
|
||||||
app.logger.warn(e)
|
app.logger.warning(e)
|
||||||
return redirect(g.app_location)
|
return redirect(g.app_location)
|
||||||
|
|
||||||
|
|
||||||
|
|
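The `after_request_func` changes can be expressed over a plain header dict, which makes the defaults easy to test. This is a sketch only. `apply_security_headers` is hypothetical, and the CSP string passed in stands in for `app.config['CSP']`, which is assumed to end with `;`. As in the diff, any non-empty `HTTPS_ONLY` value (even `'0'`) enables HSTS, while CSP is turned off only by `WHOOGLE_CSP=0`.

```python
def apply_security_headers(headers: dict, env: dict, csp: str) -> dict:
    headers['X-Content-Type-Options'] = 'nosniff'
    headers['X-Frame-Options'] = 'DENY'
    headers['Referrer-Policy'] = 'no-referrer'
    headers['Permissions-Policy'] = 'geolocation=(), microphone=(), camera=()'

    https_only = env.get('HTTPS_ONLY', False)  # truthy for any non-empty string
    if https_only:
        headers['Strict-Transport-Security'] = 'max-age=31536000; includeSubDomains'

    # CSP is on unless WHOOGLE_CSP is exactly '0'
    if env.get('WHOOGLE_CSP', '1') != '0':
        headers['Content-Security-Policy'] = csp
        if https_only:
            # The leading space keeps the directive separate from the previous one
            headers['Content-Security-Policy'] += ' upgrade-insecure-requests'
    return headers


print(apply_security_headers({}, {'HTTPS_ONLY': '1'}, "default-src 'self';")
      ['Content-Security-Policy'])  # default-src 'self'; upgrade-insecure-requests
```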
@ -349,6 +357,30 @@ def search():
|
||||||
session['config']['tor'] = False if e.disable else session['config'][
|
session['config']['tor'] = False if e.disable else session['config'][
|
||||||
'tor']
|
'tor']
|
||||||
return redirect(url_for('.index'))
|
return redirect(url_for('.index'))
|
||||||
|
except CSEException as e:
|
||||||
|
localization_lang = g.user_config.get_localization_lang()
|
||||||
|
translation = app.config['TRANSLATIONS'][localization_lang]
|
||||||
|
wants_json = (
|
||||||
|
request.args.get('format') == 'json' or
|
||||||
|
'application/json' in request.headers.get('Accept', '') or
|
||||||
|
'application/*+json' in request.headers.get('Accept', '')
|
||||||
|
)
|
||||||
|
error_msg = f"Custom Search API Error: {e.message}"
|
||||||
|
if e.is_quota_error:
|
||||||
|
error_msg = ("Google Custom Search API quota exceeded. "
|
||||||
|
"Free tier allows 100 queries/day. "
|
||||||
|
"Wait until midnight PT or disable CSE in settings.")
|
||||||
|
if wants_json:
|
||||||
|
return jsonify({
|
||||||
|
'error': True,
|
||||||
|
'error_message': error_msg,
|
||||||
|
'query': urlparse.unquote(query)
|
||||||
|
}), e.code
|
||||||
|
return render_template(
|
||||||
|
'error.html',
|
||||||
|
error_message=error_msg,
|
||||||
|
translation=translation,
|
||||||
|
config=g.user_config), e.code
|
||||||
|
|
||||||
wants_json = (
|
wants_json = (
|
||||||
request.args.get('format') == 'json' or
|
request.args.get('format') == 'json' or
|
||||||
|
|
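The new `CSEException` branch in `search()` decides between returning a JSON body and rendering the HTML error page. Below is a hedged, framework-free sketch of that decision and of the JSON payload. A plain dict stands in for `request.args`, and the function names are hypothetical.

```python
from urllib.parse import unquote

QUOTA_MESSAGE = ("Google Custom Search API quota exceeded. "
                 "Free tier allows 100 queries/day. "
                 "Wait until midnight PT or disable CSE in settings.")


def wants_json(args: dict, accept: str) -> bool:
    """JSON if ?format=json or the Accept header asks for a JSON type."""
    return (args.get('format') == 'json'
            or 'application/json' in accept
            or 'application/*+json' in accept)


def cse_error_payload(message: str, is_quota_error: bool, query: str) -> dict:
    """Body returned (with the exception's status code) to JSON clients."""
    if is_quota_error:
        error_msg = QUOTA_MESSAGE
    else:
        error_msg = f"Custom Search API Error: {message}"
    return {'error': True, 'error_message': error_msg, 'query': unquote(query)}


print(wants_json({}, 'text/html,application/xhtml+xml'))  # False
print(cse_error_payload('Bad Request', False, 'red%20panda'))
```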
@ -417,6 +449,16 @@ def search():
|
||||||
search_util.search_type,
|
search_util.search_type,
|
||||||
g.user_config.preferences,
|
g.user_config.preferences,
|
||||||
translation)
|
translation)
|
||||||
|
|
||||||
|
# Filter out unsupported tabs when CSE is enabled
|
||||||
|
# CSE only supports web (all) and image search, not videos/news
|
||||||
|
use_cse = (
|
||||||
|
g.user_config.use_cse and
|
||||||
|
g.user_config.cse_api_key and
|
||||||
|
g.user_config.cse_id
|
||||||
|
)
|
||||||
|
if use_cse:
|
||||||
|
tabs = {k: v for k, v in tabs.items() if k in ['all', 'images', 'maps']}
|
||||||
|
|
||||||
# Feature to display currency_card
|
# Feature to display currency_card
|
||||||
# Since this is determined by more than just the
|
# Since this is determined by more than just the
|
||||||
|
|
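When CSE is active, `search()` keeps only the web, images, and maps tabs. The short sketch below shows that filter. The tab values are placeholder strings, not Whoogle's real tab dictionaries.

```python
CSE_TABS = ('all', 'images', 'maps')


def visible_tabs(tabs: dict, use_cse: bool, api_key: str, cse_id: str) -> dict:
    """Drop tabs the Custom Search JSON API cannot serve (videos, news, ...)."""
    if use_cse and api_key and cse_id:
        return {k: v for k, v in tabs.items() if k in CSE_TABS}
    return tabs


tabs = {'all': '/search', 'images': '/search?tbm=isch', 'news': '/search?tbm=nws',
        'videos': '/search?tbm=vid', 'maps': 'https://maps.example'}
print(list(visible_tabs(tabs, True, 'key', 'cx')))  # ['all', 'images', 'maps']
print(list(visible_tabs(tabs, True, '', 'cx')))  # ['all', 'images', 'news', 'videos', 'maps']
```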
@ -604,10 +646,11 @@ def config():
|
||||||
return json.dumps(g.user_config.__dict__)
|
return json.dumps(g.user_config.__dict__)
|
||||||
elif request.method == 'PUT' and not config_disabled:
|
elif request.method == 'PUT' and not config_disabled:
|
||||||
if name:
|
if name:
|
||||||
config_pkl = os.path.join(app.config['CONFIG_PATH'], name)
|
config_file = os.path.join(app.config['CONFIG_PATH'], name)
|
||||||
session['config'] = (pickle.load(open(config_pkl, 'rb'))
|
if os.path.exists(config_file):
|
||||||
if os.path.exists(config_pkl)
|
with open(config_file, 'r', encoding='utf-8') as f:
|
||||||
else session['config'])
|
session['config'] = json.load(f)
|
||||||
|
# else keep existing session['config']
|
||||||
return json.dumps(session['config'])
|
return json.dumps(session['config'])
|
||||||
else:
|
else:
|
||||||
return json.dumps({})
|
return json.dumps({})
|
||||||
|
|
@ -623,7 +666,7 @@ def config():
|
||||||
# Keep both the selection and the custom string
|
# Keep both the selection and the custom string
|
||||||
if 'custom_user_agent' in config_data:
|
if 'custom_user_agent' in config_data:
|
||||||
config_data['custom_user_agent'] = config_data['custom_user_agent']
|
config_data['custom_user_agent'] = config_data['custom_user_agent']
|
||||||
print(f"Setting custom user agent to: {config_data['custom_user_agent']}") # Debug log
|
app.logger.debug(f"Setting custom user agent to: {config_data['custom_user_agent']}")
|
||||||
else:
|
else:
|
||||||
config_data['use_custom_user_agent'] = False
|
config_data['use_custom_user_agent'] = False
|
||||||
# Only clear custom_user_agent if not using custom option
|
# Only clear custom_user_agent if not using custom option
|
||||||
|
|
@ -632,11 +675,9 @@ def config():
|
||||||
|
|
||||||
# Save config by name to allow a user to easily load later
|
# Save config by name to allow a user to easily load later
|
||||||
if name:
|
if name:
|
||||||
pickle.dump(
|
config_file = os.path.join(app.config['CONFIG_PATH'], name)
|
||||||
config_data,
|
with open(config_file, 'w', encoding='utf-8') as f:
|
||||||
open(os.path.join(
|
json.dump(config_data, f, indent=2)
|
||||||
app.config['CONFIG_PATH'],
|
|
||||||
name), 'wb'))
|
|
||||||
|
|
||||||
session['config'] = config_data
|
session['config'] = config_data
|
||||||
return redirect(config_data['url'])
|
return redirect(config_data['url'])
|
||||||
|
|
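Named configs are now saved as JSON (`json.dump` with `indent=2`) and read back with `json.load`. When the file is missing, the session config is kept. Below is a minimal round-trip sketch under those rules. The name check in `_config_file` is not part of the diff. It is included because `name` is joined directly onto `CONFIG_PATH` with `os.path.join`.

```python
import json
import os
import tempfile


def _config_file(config_dir: str, name: str) -> str:
    # Extra guard, not in the diff: reject names that could leave config_dir
    if not name or name in ('.', '..') or os.path.basename(name) != name:
        raise ValueError(f'invalid config name: {name!r}')
    return os.path.join(config_dir, name)


def save_named_config(config_dir: str, name: str, config_data: dict) -> None:
    with open(_config_file(config_dir, name), 'w', encoding='utf-8') as f:
        json.dump(config_data, f, indent=2)


def load_named_config(config_dir: str, name: str, current: dict) -> dict:
    path = _config_file(config_dir, name)
    if os.path.exists(path):
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    return current  # keep the existing session config


with tempfile.TemporaryDirectory() as tmp:
    save_named_config(tmp, 'work', {'theme': 'dark', 'safe': True})
    print(load_named_config(tmp, 'work', {}))  # {'theme': 'dark', 'safe': True}
```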
@ -798,8 +839,9 @@ def internal_error(e):
|
||||||
|
|
||||||
# Attempt to parse the query
|
# Attempt to parse the query
|
||||||
try:
|
try:
|
||||||
search_util = Search(request, g.user_config, g.session_key)
|
if hasattr(g, 'user_config') and hasattr(g, 'session_key'):
|
||||||
query = search_util.new_search_query()
|
search_util = Search(request, g.user_config, g.session_key)
|
||||||
|
query = search_util.new_search_query()
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
|
|
@ -809,16 +851,26 @@ def internal_error(e):
|
||||||
if (fallback_engine):
|
if (fallback_engine):
|
||||||
return redirect(fallback_engine + (query or ''))
|
return redirect(fallback_engine + (query or ''))
|
||||||
|
|
||||||
localization_lang = g.user_config.get_localization_lang()
|
# Safely get localization language with fallback
|
||||||
|
if hasattr(g, 'user_config'):
|
||||||
|
localization_lang = g.user_config.get_localization_lang()
|
||||||
|
else:
|
||||||
|
localization_lang = 'lang_en'
|
||||||
translation = app.config['TRANSLATIONS'][localization_lang]
|
translation = app.config['TRANSLATIONS'][localization_lang]
|
||||||
return render_template(
|
# Build template context with safe defaults
|
||||||
'error.html',
|
template_context = {
|
||||||
error_message='Internal server error (500)',
|
'error_message': 'Internal server error (500)',
|
||||||
translation=translation,
|
'translation': translation,
|
||||||
farside='https://farside.link',
|
'farside': 'https://farside.link',
|
||||||
config=g.user_config,
|
'query': urlparse.unquote(query or '')
|
||||||
query=urlparse.unquote(query or ''),
|
}
|
||||||
params=g.user_config.to_params(keys=['preferences'])), 500
|
|
||||||
|
# Add user config if available
|
||||||
|
if hasattr(g, 'user_config'):
|
||||||
|
template_context['config'] = g.user_config
|
||||||
|
template_context['params'] = g.user_config.to_params(keys=['preferences'])
|
||||||
|
|
||||||
|
return render_template('error.html', **template_context), 500
|
||||||
|
|
||||||
|
|
||||||
def run_app() -> None:
|
def run_app() -> None:
|
||||||
|
|
|
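The 500 handler now builds its template context defensively, because `g.user_config` may not exist when the error happens before request setup finishes. Below is a Flask-free sketch of that context builder. `SimpleNamespace` stands in for `flask.g`, and the translations mapping is a placeholder.

```python
from types import SimpleNamespace
from urllib.parse import unquote


def build_error_context(g, translations: dict, query=None) -> dict:
    """Template context for error.html that survives a missing g.user_config."""
    has_config = hasattr(g, 'user_config')
    lang = g.user_config.get_localization_lang() if has_config else 'lang_en'
    context = {
        'error_message': 'Internal server error (500)',
        'translation': translations[lang],
        'farside': 'https://farside.link',
        'query': unquote(query or ''),
    }
    if has_config:
        context['config'] = g.user_config
        context['params'] = g.user_config.to_params(keys=['preferences'])
    return context


ctx = build_error_context(SimpleNamespace(), {'lang_en': {'search': 'Search'}}, 'red%20panda')
print(ctx['query'], 'config' in ctx)  # red panda False
```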
||||||
452
app/services/cse_client.py
Normal file
|
|
@ -0,0 +1,452 @@
|
||||||
|
"""Google Custom Search Engine (CSE) API Client
|
||||||
|
|
||||||
|
This module provides a client for Google's Custom Search JSON API,
|
||||||
|
allowing users to bring their own API key (BYOK) for search functionality.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
from typing import Optional
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
from flask import render_template
|
||||||
|
|
||||||
|
|
||||||
|
# Google Custom Search API endpoint
|
||||||
|
CSE_API_URL = 'https://www.googleapis.com/customsearch/v1'
|
||||||
|
|
||||||
|
|
||||||
|
class CSEException(Exception):
|
||||||
|
"""Exception raised for CSE API errors"""
|
||||||
|
def __init__(self, message: str, code: int = 500, is_quota_error: bool = False):
|
||||||
|
self.message = message
|
||||||
|
self.code = code
|
||||||
|
self.is_quota_error = is_quota_error
|
||||||
|
super().__init__(self.message)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CSEError:
|
||||||
|
"""Represents an error from the CSE API"""
|
||||||
|
code: int
|
||||||
|
message: str
|
||||||
|
|
||||||
|
@property
|
||||||
|
def is_quota_exceeded(self) -> bool:
|
||||||
|
return self.code == 429 or 'quota' in self.message.lower()
|
||||||
|
|
||||||
|
@property
|
||||||
|
def is_invalid_key(self) -> bool:
|
||||||
|
return self.code == 400 or 'invalid' in self.message.lower()
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CSEResult:
|
||||||
|
"""Represents a single search result from CSE API"""
|
||||||
|
title: str
|
||||||
|
link: str
|
||||||
|
snippet: str
|
||||||
|
display_link: str
|
||||||
|
html_title: Optional[str] = None
|
||||||
|
html_snippet: Optional[str] = None
|
||||||
|
# Image-specific fields (populated for image search)
|
||||||
|
image_url: Optional[str] = None
|
||||||
|
thumbnail_url: Optional[str] = None
|
||||||
|
image_width: Optional[int] = None
|
||||||
|
image_height: Optional[int] = None
|
||||||
|
context_link: Optional[str] = None # Page where image was found
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CSEResponse:
|
||||||
|
"""Represents a complete CSE API response"""
|
||||||
|
results: list[CSEResult]
|
||||||
|
total_results: str
|
||||||
|
search_time: float
|
||||||
|
query: str
|
||||||
|
start_index: int
|
||||||
|
is_image_search: bool = False
|
||||||
|
error: Optional[CSEError] = None
|
||||||
|
|
||||||
|
@property
|
||||||
|
def has_error(self) -> bool:
|
||||||
|
return self.error is not None
|
||||||
|
|
||||||
|
@property
|
||||||
|
def has_results(self) -> bool:
|
||||||
|
return len(self.results) > 0
|
||||||
|
|
||||||
|
|
||||||
|
class CSEClient:
|
||||||
|
"""Client for Google Custom Search Engine API
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
client = CSEClient(api_key='your-key', cse_id='your-cse-id')
|
||||||
|
response = client.search('python programming')
|
||||||
|
|
||||||
|
if response.has_error:
|
||||||
|
print(f"Error: {response.error.message}")
|
||||||
|
else:
|
||||||
|
for result in response.results:
|
||||||
|
print(f"{result.title}: {result.link}")
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, api_key: str, cse_id: str, timeout: float = 10.0):
|
||||||
|
"""Initialize CSE client
|
||||||
|
|
||||||
|
Args:
|
||||||
|
api_key: Google API key with Custom Search API enabled
|
||||||
|
cse_id: Custom Search Engine ID (cx parameter)
|
||||||
|
timeout: Request timeout in seconds
|
||||||
|
"""
|
||||||
|
self.api_key = api_key
|
||||||
|
self.cse_id = cse_id
|
||||||
|
self.timeout = timeout
|
||||||
|
self._client = httpx.Client(timeout=timeout)
|
||||||
|
|
||||||
|
def search(
|
||||||
|
self,
|
||||||
|
query: str,
|
||||||
|
start: int = 1,
|
||||||
|
num: int = 10,
|
||||||
|
safe: str = 'off',
|
||||||
|
language: str = '',
|
||||||
|
country: str = '',
|
||||||
|
search_type: str = ''
|
||||||
|
) -> CSEResponse:
|
||||||
|
"""Execute a search query against the CSE API
|
||||||
|
|
||||||
|
Args:
|
||||||
|
query: Search query string
|
||||||
|
start: Starting result index (1-based, for pagination)
|
||||||
|
num: Number of results to return (max 10)
|
||||||
|
safe: Safe search setting ('off', 'medium', 'high')
|
||||||
|
language: Language restriction (e.g., 'lang_en')
|
||||||
|
country: Country restriction (e.g., 'countryUS')
|
||||||
|
search_type: Type of search ('image' for image search, '' for web)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
CSEResponse with results or error information
|
||||||
|
"""
|
||||||
|
params = {
|
||||||
|
'key': self.api_key,
|
||||||
|
'cx': self.cse_id,
|
||||||
|
'q': query,
|
||||||
|
'start': start,
|
||||||
|
'num': min(num, 10), # API max is 10
|
||||||
|
'safe': safe,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Add search type for image search
|
||||||
|
if search_type == 'image':
|
||||||
|
params['searchType'] = 'image'
|
||||||
|
|
||||||
|
# Add optional parameters
|
||||||
|
if language:
|
||||||
|
# CSE uses 'lr' for language restrict
|
||||||
|
params['lr'] = language
|
||||||
|
if country:
|
||||||
|
# CSE uses 'cr' for country restrict
|
||||||
|
params['cr'] = country
|
||||||
|
|
||||||
|
try:
|
||||||
|
response = self._client.get(CSE_API_URL, params=params)
|
||||||
|
data = response.json()
|
||||||
|
|
||||||
|
# Check for API errors
|
||||||
|
if 'error' in data:
|
||||||
|
error_info = data['error']
|
||||||
|
return CSEResponse(
|
||||||
|
results=[],
|
||||||
|
total_results='0',
|
||||||
|
search_time=0.0,
|
||||||
|
query=query,
|
||||||
|
start_index=start,
|
||||||
|
error=CSEError(
|
||||||
|
code=error_info.get('code', 500),
|
||||||
|
message=error_info.get('message', 'Unknown error')
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Parse successful response
|
||||||
|
search_info = data.get('searchInformation', {})
|
||||||
|
items = data.get('items', [])
|
||||||
|
is_image = search_type == 'image'
|
||||||
|
|
||||||
|
results = []
|
||||||
|
for item in items:
|
||||||
|
# Extract image-specific data if present
|
||||||
|
image_data = item.get('image', {})
|
||||||
|
|
||||||
|
results.append(CSEResult(
|
||||||
|
title=item.get('title', ''),
|
||||||
|
link=item.get('link', ''),
|
||||||
|
snippet=item.get('snippet', ''),
|
||||||
|
display_link=item.get('displayLink', ''),
|
||||||
|
html_title=item.get('htmlTitle'),
|
||||||
|
html_snippet=item.get('htmlSnippet'),
|
||||||
|
# Image fields
|
||||||
|
image_url=item.get('link') if is_image else None,
|
||||||
|
thumbnail_url=image_data.get('thumbnailLink'),
|
||||||
|
image_width=image_data.get('width'),
|
||||||
|
image_height=image_data.get('height'),
|
||||||
|
context_link=image_data.get('contextLink')
|
||||||
|
))
|
||||||
|
|
||||||
|
return CSEResponse(
|
||||||
|
results=results,
|
||||||
|
total_results=search_info.get('totalResults', '0'),
|
||||||
|
search_time=float(search_info.get('searchTime', 0)),
|
||||||
|
query=query,
|
||||||
|
start_index=start,
|
||||||
|
is_image_search=is_image
|
||||||
|
)
|
||||||
|
|
||||||
|
except httpx.TimeoutException:
|
||||||
|
return CSEResponse(
|
||||||
|
results=[],
|
||||||
|
total_results='0',
|
||||||
|
search_time=0.0,
|
||||||
|
query=query,
|
||||||
|
start_index=start,
|
||||||
|
error=CSEError(code=408, message='Request timed out')
|
||||||
|
)
|
||||||
|
except httpx.RequestError as e:
|
||||||
|
return CSEResponse(
|
||||||
|
results=[],
|
||||||
|
total_results='0',
|
||||||
|
search_time=0.0,
|
||||||
|
query=query,
|
||||||
|
start_index=start,
|
||||||
|
error=CSEError(code=500, message=f'Request failed: {str(e)}')
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
return CSEResponse(
|
||||||
|
results=[],
|
||||||
|
total_results='0',
|
||||||
|
search_time=0.0,
|
||||||
|
query=query,
|
||||||
|
start_index=start,
|
||||||
|
error=CSEError(code=500, message=f'Unexpected error: {str(e)}')
|
||||||
|
)
|
||||||
|
|
||||||
|
def close(self):
|
||||||
|
"""Close the HTTP client"""
|
||||||
|
self._client.close()
|
||||||
|
|
||||||
|
def __enter__(self):
|
||||||
|
return self
|
||||||
|
|
||||||
|
def __exit__(self, *args):
|
||||||
|
self.close()
|
||||||
|
|
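`CSEClient.search` sends a single GET request to `CSE_API_URL` with `key`, `cx`, `q`, `start`, `num`, and `safe`, adding `searchType=image` for image searches. It then turns `items` or `error` into dataclasses. The sketch below builds the same URL and parses a response body without any network access. It differs from the module in what it returns: plain dicts and a `(code, message)` tuple rather than the module's dataclasses. The sample data only uses fields the client reads.

```python
from urllib.parse import urlencode

CSE_API_URL = 'https://www.googleapis.com/customsearch/v1'


def build_cse_url(api_key, cse_id, query, start=1, num=10, safe='off', search_type=''):
    """Build the GET URL CSEClient.search sends (same parameter names)."""
    params = {'key': api_key, 'cx': cse_id, 'q': query, 'start': start,
              'num': min(num, 10), 'safe': safe}  # the API caps num at 10
    if search_type == 'image':
        params['searchType'] = 'image'
    return CSE_API_URL + '?' + urlencode(params)


def parse_cse_json(data: dict):
    """Return (results, error) from a decoded response body."""
    if 'error' in data:
        err = data['error']
        return [], (err.get('code', 500), err.get('message', 'Unknown error'))
    results = []
    for item in data.get('items', []):
        image = item.get('image', {})  # only present for searchType=image
        results.append({
            'title': item.get('title', ''),
            'link': item.get('link', ''),
            'thumbnail': image.get('thumbnailLink'),
            'context_link': image.get('contextLink'),
        })
    return results, None


print(build_cse_url('KEY', 'CX', 'red panda', num=25, search_type='image'))
# https://www.googleapis.com/customsearch/v1?key=KEY&cx=CX&q=red+panda&start=1&num=10&safe=off&searchType=image
```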
||||||
|
|
||||||
|
def cse_results_to_html(response: CSEResponse, query: str) -> str:
|
||||||
|
"""Convert CSE API response to HTML matching Whoogle's result format
|
||||||
|
|
||||||
|
This generates HTML that mimics the structure expected by Whoogle's
|
||||||
|
existing filter and result processing pipeline.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
response: CSEResponse from the API
|
||||||
|
query: Original search query
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
HTML string formatted like Google search results
|
||||||
|
"""
|
||||||
|
if response.has_error:
|
||||||
|
error = response.error
|
||||||
|
if error.is_quota_exceeded:
|
||||||
|
return _error_html(
|
||||||
|
'API Quota Exceeded',
|
||||||
|
'Your Google Custom Search API quota has been exceeded. '
|
||||||
|
'Free tier allows 100 queries/day. Wait until midnight PT '
|
||||||
|
'or enable billing in Google Cloud Console.'
|
||||||
|
)
|
||||||
|
elif error.is_invalid_key:
|
||||||
|
return _error_html(
|
||||||
|
'Invalid API Key',
|
||||||
|
'Your Google Custom Search API key is invalid. '
|
||||||
|
'Please check your API key and CSE ID in settings.'
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
return _error_html('Search Error', error.message)
|
||||||
|
|
||||||
|
if not response.has_results:
|
||||||
|
return _no_results_html(query)
|
||||||
|
|
||||||
|
# Use different HTML structure for image vs web results
|
||||||
|
if response.is_image_search:
|
||||||
|
return _image_results_html(response, query)
|
||||||
|
|
||||||
|
# Build HTML results matching Whoogle's expected structure
|
||||||
|
results_html = []
|
||||||
|
|
||||||
|
for result in response.results:
|
||||||
|
# Escape HTML in content
|
||||||
|
title = _escape_html(result.title)
|
||||||
|
snippet = _escape_html(result.snippet)
|
||||||
|
link = result.link
|
||||||
|
display_link = _escape_html(result.display_link)
|
||||||
|
|
||||||
|
# Use HTML versions if available (they have bold tags for query terms)
|
||||||
|
if result.html_title:
|
||||||
|
title = result.html_title
|
||||||
|
if result.html_snippet:
|
||||||
|
snippet = result.html_snippet
|
||||||
|
|
||||||
|
# Match the structure used by Google/mock results
|
||||||
|
result_html = f'''
|
||||||
|
<div class="ZINbbc xpd O9g5cc uUPGi">
|
||||||
|
<div class="kCrYT">
|
||||||
|
<a href="{link}">
|
||||||
|
<h3 class="BNeawe vvjwJb AP7Wnd">{title}</h3>
|
||||||
|
<div class="BNeawe UPmit AP7Wnd luh4tb" style="color: var(--whoogle-result-url);">{display_link}</div>
|
||||||
|
</a>
|
||||||
|
</div>
|
||||||
|
<div class="kCrYT">
|
||||||
|
<div class="BNeawe s3v9rd AP7Wnd">
|
||||||
|
<span class="VwiC3b">{snippet}</span>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
'''
|
||||||
|
results_html.append(result_html)
|
||||||
|
|
||||||
|
# Build pagination if needed
|
||||||
|
pagination_html = ''
|
||||||
|
if int(response.total_results) > 10:
|
||||||
|
pagination_html = _pagination_html(response.start_index, response.query)
|
||||||
|
|
||||||
|
# Wrap in expected structure
|
||||||
|
# Add data-cse attribute to prevent collapse_sections from collapsing these results
|
||||||
|
return f'''
|
||||||
|
<html>
|
||||||
|
<body>
|
||||||
|
<div id="main" data-cse="true">
|
||||||
|
<div id="cnt">
|
||||||
|
<div id="rcnt">
|
||||||
|
<div id="center_col">
|
||||||
|
<div id="res">
|
||||||
|
<div id="search">
|
||||||
|
<div id="rso">
|
||||||
|
{''.join(results_html)}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
{pagination_html}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
'''
|
||||||
|
|
||||||
|
|
||||||
|
def _escape_html(text: str) -> str:
|
||||||
|
"""Escape HTML special characters"""
|
||||||
|
if not text:
|
||||||
|
return ''
|
||||||
|
return (text
|
||||||
|
.replace('&', '&amp;')
|
||||||
|
.replace('<', '&lt;')
|
||||||
|
.replace('>', '&gt;')
|
||||||
|
.replace('"', '&quot;')
|
||||||
|
.replace("'", '&#x27;'))
|
||||||
|
|
||||||
|
|
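Python's standard library already does the escaping that `_escape_html` implements by hand. `html.escape(s, quote=True)` replaces `&` first, then `<`, `>`, `"` and `'`. The drop-in alternative below is offered as a suggestion; it is not what the commit ships.

```python
import html


def escape_html(text) -> str:
    """Escape text for HTML element content and quoted attribute values."""
    if not text:
        return ''
    return html.escape(text, quote=True)


print(escape_html('<b>"Tom" & O\'Brien</b>'))
# &lt;b&gt;&quot;Tom&quot; &amp; O&#x27;Brien&lt;/b&gt;
```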
||||||
|
def _error_html(title: str, message: str) -> str:
|
||||||
|
"""Generate error HTML"""
|
||||||
|
return f'''
|
||||||
|
<html>
|
||||||
|
<body>
|
||||||
|
<div id="main">
|
||||||
|
<div style="padding: 20px; text-align: center;">
|
||||||
|
<h2 style="color: #d93025;">{_escape_html(title)}</h2>
|
||||||
|
<p>{_escape_html(message)}</p>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
'''
|
||||||
|
|
||||||
|
|
||||||
|
def _no_results_html(query: str) -> str:
|
||||||
|
"""Generate no results HTML"""
|
||||||
|
return f'''
|
||||||
|
<html>
|
||||||
|
<body>
|
||||||
|
<div id="main">
|
||||||
|
<div style="padding: 20px;">
|
||||||
|
<p>No results found for <b>{_escape_html(query)}</b></p>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
'''
|
||||||
|
|
||||||
|
|
||||||
|
```diff
+def _image_results_html(response: CSEResponse, query: str) -> str:
+    """Generate HTML for image search results using the imageresults template
+
+    Args:
+        response: CSEResponse with image results
+        query: Original search query
+
+    Returns:
+        HTML string formatted for image results display
+    """
+    # Convert CSE results to the format expected by imageresults.html template
+    results = []
+    for result in response.results:
+        image_url = result.image_url or result.link
+        thumbnail_url = result.thumbnail_url or image_url
+        web_page = result.context_link or result.link
+        domain = urlparse(web_page).netloc if web_page else result.display_link
+
+        results.append({
+            'domain': domain,
+            'img_url': image_url,
+            'web_page': web_page,
+            'img_tbn': thumbnail_url
+        })
+
+    # Build pagination link if needed
+    next_link = None
+    if int(response.total_results) > response.start_index + len(response.results) - 1:
+        next_start = response.start_index + 10
+        next_link = f'search?q={query}&tbm=isch&start={next_start}'
+
+    # Use the same template as regular image results
+    return render_template(
+        'imageresults.html',
+        length=len(results),
+        results=results,
+        view_label="View Image",
+        next_link=next_link
+    )
```
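`_image_results_html` fills each template field through a fallback chain (image URL, then page link). It builds a next-page link only when the 1-based index of the last shown result is below the reported total.

The sketch below reproduces that mapping outside Flask. `ImageItem` is a hypothetical stand-in for the commit's result objects; only the attribute names used in the diff are assumed. As a design choice, the query is URL-encoded with `urlencode` so spaces and `&` survive in the link. The f-string in the diff inserts it as-is.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode, urlparse


@dataclass
class ImageItem:
    # Hypothetical stand-in for a CSE image result; field names mirror the diff.
    link: str
    display_link: str = ''
    image_url: Optional[str] = None
    thumbnail_url: Optional[str] = None
    context_link: Optional[str] = None


def to_template_row(item: ImageItem) -> dict:
    """Map one result to the keys the imageresults.html template reads."""
    image_url = item.image_url or item.link
    thumbnail_url = item.thumbnail_url or image_url
    web_page = item.context_link or item.link
    domain = urlparse(web_page).netloc if web_page else item.display_link
    return {'domain': domain, 'img_url': image_url,
            'web_page': web_page, 'img_tbn': thumbnail_url}


def next_image_link(total_results: int, start_index: int, count: int,
                    query: str, page_size: int = 10) -> Optional[str]:
    """Return the next-page link, or None when this page is the last one.

    start_index is 1-based, so the last result shown is start_index + count - 1.
    """
    if total_results > start_index + count - 1:
        params = {'q': query, 'tbm': 'isch', 'start': start_index + page_size}
        return 'search?' + urlencode(params)
    return None
```

With 25 total results, the page starting at 21 with 5 items is the last one (21 + 5 - 1 = 25), so `next_image_link(25, 21, 5, 'x')` returns `None`.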
```diff
+def _pagination_html(current_start: int, query: str) -> str:
+    """Generate pagination links"""
+    # CSE API uses 1-based indexing, 10 results per page
+    current_page = (current_start - 1) // 10 + 1
+
+    prev_link = ''
+    next_link = ''
+
+    if current_page > 1:
+        prev_start = (current_page - 2) * 10 + 1
+        prev_link = f'<a href="search?q={query}&start={prev_start}">Previous</a>'
+
+    next_start = current_page * 10 + 1
+    next_link = f'<a href="search?q={query}&start={next_start}">Next</a>'
+
+    return f'''
+    <div id="foot" style="text-align: center; padding: 20px;">
+        {prev_link}
+        <span style="margin: 0 20px;">Page {current_page}</span>
+        {next_link}
+    </div>
+    '''
```
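`_pagination_html` converts the 1-based `start` offset into a page number and back. With 10 results per page, page p begins at offset 10(p - 1) + 1, so offset 25 lies on page 3. The previous and next pages then start at 11 and 31.

The sketch below isolates that arithmetic. `page_bounds` and `page_link` are hypothetical helper names. Unlike the diff's f-strings, which insert `query` verbatim, `page_link` URL-encodes the query and HTML-escapes the resulting href so it is safe inside the attribute.

```python
import html
from typing import Optional, Tuple
from urllib.parse import urlencode

PAGE_SIZE = 10  # results per page, as assumed in the diff


def page_bounds(current_start: int) -> Tuple[int, Optional[int], int]:
    """Return (page number, previous page start or None, next page start).

    Offsets are 1-based: page 1 starts at 1, page 2 at 11, page 3 at 21, ...
    """
    current_page = (current_start - 1) // PAGE_SIZE + 1
    prev_start = (current_page - 2) * PAGE_SIZE + 1 if current_page > 1 else None
    next_start = current_page * PAGE_SIZE + 1
    return current_page, prev_start, next_start


def page_link(query: str, start: int, label: str) -> str:
    # urlencode handles spaces and '&' in the query; html.escape makes the
    # href safe to place inside a double-quoted attribute.
    href = 'search?' + urlencode({'q': query, 'start': start})
    return f'<a href="{html.escape(href)}">{html.escape(label)}</a>'
```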
```diff
@@ -193,10 +193,13 @@ const calc = () => {
             (statement.match(/\(/g) || []).length >
             (statement.match(/\)/g) || []).length
         ) statement += ")"; else break;
-    // evaluate the expression.
+    // evaluate the expression using a safe evaluator (no eval())
     console.log("calculating [" + statement + "]");
     try {
-        var result = eval(statement);
+        // Safe evaluation: create a sandboxed function with only Math object available
+        // This prevents arbitrary code execution while allowing mathematical operations
+        const safeEval = new Function('Math', `'use strict'; return (${statement})`);
+        var result = safeEval(Math);
         document.getElementById("prev-equation").innerHTML = mathtext.innerHTML + " = ";
         mathtext.innerHTML = result;
         mathtext.classList.remove("error-border");
```
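A function created with the `Function` constructor still runs in the page's global scope. Globals such as `fetch` or `document` therefore stay reachable from the expression. The change removes access to the calculator's local variables rather than sandboxing the code.

A stricter approach parses the expression and walks the syntax tree, accepting only numbers, whitelisted operators, and a few named functions. The sketch below shows that technique in Python with the standard `ast` module, for illustration only; the calculator itself is JavaScript. `safe_eval` and the whitelists are hypothetical.

```python
import ast
import math
import operator

# Whitelisted operators; any other node type in the expression is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
# A few Math-style functions and constants, looked up by bare name.
_FUNCS = {'sqrt': math.sqrt, 'sin': math.sin, 'cos': math.cos, 'log': math.log}
_CONSTS = {'pi': math.pi, 'e': math.e}


def safe_eval(expression: str) -> float:
    """Evaluate an arithmetic expression without executing arbitrary code."""
    tree = ast.parse(expression, mode='eval')

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*(walk(arg) for arg in node.args))
        raise ValueError(f'unsupported expression: {ast.dump(node)}')

    return walk(tree)
```

Because nothing is executed until each node has been checked, an input like `__import__("os")` fails at the `Call` check with a `ValueError` instead of running.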
```diff
@@ -10,9 +10,9 @@
     background-color: #fff;
   }
   body {
-    padding: 0 8px;
+    padding: 0 12px;
     margin: 0 auto;
-    max-width: 736px;
+    max-width: 1200px;
   }
   a {
     text-decoration: none;
@@ -167,6 +167,7 @@
     border-collapse: collapse;
     border-spacing: 0;
     width: 100%;
+    table-layout: fixed;
   }
   .X6ZCif {
     color: #202124;
@@ -209,15 +210,20 @@
     text-align: center;
   }
   .RAyV4b {
-    line-height: 140px;
-    overflow: "hidden";
+    height: 220px;
+    line-height: 220px;
+    overflow: hidden;
     text-align: center;
   }
   .t0fcAb {
     text-align: center;
     margin: auto;
     vertical-align: middle;
-    object-fit: contain;
+    object-fit: cover;
+    max-width: 100%;
+    height: auto;
+    max-height: 220px;
+    display: block;
   }
   .Tor4Ec {
     padding-top: 2px;
@@ -313,6 +319,24 @@
   a .CVA68e:hover {
     text-decoration: underline;
   }
+  .e3goi {
+    width: 25%;
+    padding: 10px;
+    box-sizing: border-box;
+  }
+  .svla5d {
+    max-width: 100%;
+  }
+  @media (max-width: 900px) {
+    .e3goi {
+      width: 50%;
+    }
+  }
+  @media (max-width: 600px) {
+    .e3goi {
+      width: 100%;
+    }
+  }
 </style>
 <div>
   <div>
```
```diff
@@ -257,6 +257,30 @@
         <input type="checkbox" name="show_user_agent"
                id="config-show-user-agent" {{ 'checked' if config.show_user_agent else '' }}>
       </div>
+      <!-- Google Custom Search Engine (BYOK) Settings -->
+      <div class="config-div config-div-cse-header" style="margin-top: 20px; border-top: 1px solid var(--result-bg); padding-top: 15px;">
+        <strong>Google Custom Search (BYOK)</strong>
+        <div><span class="info-text"> — <a href="https://github.com/benbusby/whoogle-search#google-custom-search-byok">Setup Guide</a></span></div>
+      </div>
+      <div class="config-div config-div-use-cse">
+        <label for="config-use-cse">Use Custom Search API: </label>
+        <input type="checkbox" name="use_cse" id="config-use-cse" {{ 'checked' if config.use_cse else '' }}>
+        <div><span class="info-text"> — Enable to use your own Google API key (100 free queries/day)</span></div>
+      </div>
+      <div class="config-div config-div-cse-api-key">
+        <label for="config-cse-api-key">CSE API Key: </label>
+        <input type="password" name="cse_api_key" id="config-cse-api-key"
+               value="{{ config.cse_api_key }}"
+               placeholder="AIza..."
+               autocomplete="off">
+      </div>
+      <div class="config-div config-div-cse-id">
+        <label for="config-cse-id">CSE ID: </label>
+        <input type="text" name="cse_id" id="config-cse-id"
+               value="{{ config.cse_id }}"
+               placeholder="abc123..."
+               autocomplete="off">
+      </div>
       <div class="config-div config-div-root-url">
         <label for="config-url">{{ translation['config-url'] }}: </label>
         <input type="text" name="url" id="config-url" value="{{ config.url }}">
```
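The new settings rows submit `use_cse` as a checkbox and `cse_api_key`/`cse_id` as text fields. A checked checkbox without a `value` attribute, as here, is submitted with the value `on`; an unchecked one is omitted from the form entirely. Server-side parsing should therefore treat absence as false.

Here is a minimal sketch of that parsing over a plain mapping. `CSESettings` and `parse_cse_form` are hypothetical names, not Whoogle's config code. The `enabled` property mirrors the three-way check that `search.py` performs before taking the CSE path.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class CSESettings:
    use_cse: bool = False
    cse_api_key: str = ''
    cse_id: str = ''

    @property
    def enabled(self) -> bool:
        # Mirrors the gate in search.py: the toggle AND both credentials.
        return bool(self.use_cse and self.cse_api_key and self.cse_id)


def parse_cse_form(form: Mapping[str, str]) -> CSESettings:
    """Build settings from submitted form fields.

    An unchecked checkbox is missing from the submission, so presence (not
    the value) decides the toggle. Surrounding whitespace is stripped from
    pasted credentials.
    """
    return CSESettings(
        use_cse='use_cse' in form,
        cse_api_key=form.get('cse_api_key', '').strip(),
        cse_id=form.get('cse_id', '').strip(),
    )
```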
```diff
@@ -5,6 +5,7 @@ from app.filter import Filter
 from app.request import gen_query
 from app.utils.misc import get_proxy_host_url
 from app.utils.results import get_first_link
+from app.services.cse_client import CSEClient, cse_results_to_html
 from bs4 import BeautifulSoup as bsoup
 from cryptography.fernet import Fernet, InvalidToken
 from flask import g
```
```diff
@@ -140,7 +141,91 @@ class Search:
                                 root_url=root_url,
                                 mobile=mobile,
                                 config=self.config,
-                                query=self.query)
+                                query=self.query,
+                                page_url=self.request.url)
+
+        # Check if CSE (Custom Search Engine) should be used
+        use_cse = (
+            self.config.use_cse and
+            self.config.cse_api_key and
+            self.config.cse_id
+        )
+
+        if use_cse:
+            # Use Google Custom Search API
+            return self._generate_cse_response(content_filter, root_url, mobile)
+
+        # Default: Use traditional scraping method
+        return self._generate_scrape_response(content_filter, root_url, mobile)
+
+    def _generate_cse_response(self, content_filter: Filter, root_url: str, mobile: bool) -> str:
+        """Generate response using Google Custom Search API
+
+        Args:
+            content_filter: Filter instance for processing results
+            root_url: Root URL of the instance
+            mobile: Whether this is a mobile request
+
+        Returns:
+            str: HTML response string
+        """
+        # Get pagination start index from request params
+        start = int(self.request_params.get('start', 1))
+
+        # Determine safe search setting
+        safe = 'high' if self.config.safe else 'off'
+
+        # Determine search type (web or image)
+        # tbm=isch or udm=2 indicates image search
+        search_type = ''
+        if self.search_type == 'isch' or self.request_params.get('udm') == '2':
+            search_type = 'image'
+
+        # Create CSE client and perform search
+        with CSEClient(
+            api_key=self.config.cse_api_key,
+            cse_id=self.config.cse_id
+        ) as client:
+            response = client.search(
+                query=self.query,
+                start=start,
+                safe=safe,
+                language=self.config.lang_search,
+                country=self.config.country,
+                search_type=search_type
+            )
+
+        # Convert CSE response to HTML
+        html_content = cse_results_to_html(response, self.query)
+
+        # Store full query for tabs
+        self.full_query = self.query
+
+        # Parse and filter the HTML
+        html_soup = bsoup(html_content, 'html.parser')
+
+        # Handle feeling lucky
+        if self.feeling_lucky:
+            if response.has_results and response.results:
+                return response.results[0].link
+            self.feeling_lucky = False
+
+        # Apply content filter (encrypts links, applies CSS, etc.)
+        formatted_results = content_filter.clean(html_soup)
+
+        return str(formatted_results)
+
+    def _generate_scrape_response(self, content_filter: Filter, root_url: str, mobile: bool) -> str:
+        """Generate response using traditional HTML scraping
+
+        Args:
+            content_filter: Filter instance for processing results
+            root_url: Root URL of the instance
+            mobile: Whether this is a mobile request
+
+        Returns:
+            str: HTML response string
+        """
         full_query = gen_query(self.query,
                                self.request_params,
                                self.config)
```
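`_generate_cse_response` hands the HTTP call to `CSEClient`, whose implementation is not part of this excerpt. For orientation, the public Custom Search JSON API is a GET request to `https://www.googleapis.com/customsearch/v1`. It takes these parameters:

- `key`: the API key
- `cx`: the engine ID
- `q`: the query
- `start`: a 1-based start index
- `searchType=image`: requests image results
- `lr` and `gl`: language and country

The sketch below only builds that URL and sends nothing. How `CSEClient` actually maps its arguments is an assumption here. The sketch uses the documented `safe` values `active` and `off` rather than the `'high'` string passed in the diff.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = 'https://www.googleapis.com/customsearch/v1'


def build_cse_url(api_key: str, cse_id: str, query: str, start: int = 1,
                  safe: bool = False, language: str = '', country: str = '',
                  image: bool = False) -> str:
    """Build a Custom Search JSON API request URL (sketch; no request is sent)."""
    params = {
        'key': api_key,
        'cx': cse_id,
        'q': query,
        'start': start,                      # 1-based index of the first result
        'safe': 'active' if safe else 'off',
    }
    if language:
        params['lr'] = language              # e.g. 'lang_en'
    if country:
        params['gl'] = country               # two-letter country code
    if image:
        params['searchType'] = 'image'
    return CSE_ENDPOINT + '?' + urlencode(params)
```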
```diff
@@ -148,8 +233,10 @@ class Search:
 
         # force mobile search when view image is true and
         # the request is not already made by a mobile
-        view_image = ('tbm=isch' in full_query
-                      and self.config.view_image)
+        is_image_query = ('tbm=isch' in full_query) or ('udm=2' in full_query)
+        # Always parse image results when hitting the images endpoint (udm=2)
+        # to avoid Google returning only text/AI blocks.
+        view_image = is_image_query
 
         client = self.user_request or g.user_request
         get_body = client.send(query=full_query,
```
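The new `is_image_query` check looks for the substrings `tbm=isch` and `udm=2` in the generated query string. A substring test also matches values that merely contain those characters, such as `udm=25` or a parameter named `xtbm`.

A stricter variant parses the string first. The sketch assumes `full_query` is an `&`-separated query string, possibly with a leading `?`; this excerpt does not show the exact output of `gen_query`. The function here is a standalone illustration, not the commit's code.

```python
from urllib.parse import parse_qs


def is_image_query(query_string: str) -> bool:
    """True when the query selects the image vertical (tbm=isch or udm=2).

    Parsing instead of substring matching avoids false positives such as
    'udm=25' or a parameter named 'xtbm'.
    """
    params = parse_qs(query_string.lstrip('?'))
    return 'isch' in params.get('tbm', []) or '2' in params.get('udm', [])
```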
```diff
@@ -194,4 +281,3 @@ class Search:
                 link['href'] += param_str
 
         return str(formatted_results)
-
```
```diff
@@ -1,8 +1,8 @@
 import os
 
-optional_dev_tag = '-update-testing'
+optional_dev_tag = ''
 if os.getenv('DEV_BUILD'):
     optional_dev_tag = '.dev' + os.getenv('DEV_BUILD')
 
-__version__ = '1.2.0' + optional_dev_tag
+__version__ = '1.2.2' + optional_dev_tag
 
```
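After this change, `version.py` appends a suffix only for dev builds. With `DEV_BUILD` unset, the version is the plain release `1.2.2`. With `DEV_BUILD=5` it becomes `1.2.2.dev5`, which is PEP 440's development-release form.

The same logic can be written as a pure function that takes the environment as an argument, which makes it testable. `compute_version` is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def compute_version(base: str = '1.2.2',
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the package version, adding '.devN' when DEV_BUILD is set."""
    env = os.environ if env is None else env
    dev_build = env.get('DEV_BUILD')
    # An empty DEV_BUILD is falsy, matching the `if os.getenv(...)` check.
    optional_dev_tag = '.dev' + dev_build if dev_build else ''
    return base + optional_dev_tag
```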
```diff
@@ -1,6 +1,5 @@
-# can't use mem_limit in a 3.x docker-compose file in non swarm mode
-# see https://github.com/docker/compose/issues/4513
-version: "2.4"
+# Modern docker-compose format (v2+) does not require version specification
+# Memory limits are supported in Compose v2+ without version field
 
 services:
   whoogle-search:
```
```diff
@@ -30,5 +30,5 @@ h11>=0.16.0
 validators==0.35.0
 waitress==3.0.2
 wcwidth==0.2.14
-Werkzeug==3.1.3
+Werkzeug==3.1.4
 python-dotenv==1.1.1
```
```diff
@@ -72,9 +72,6 @@
 # Remove everything except basic result cards from all search queries
 #WHOOGLE_MINIMAL=0
 
-# Set the number of results per page
-#WHOOGLE_RESULTS_PER_PAGE=10
-
 # Controls visibility of autocomplete/search suggestions
 #WHOOGLE_AUTOCOMPLETE=1
 
```