mirror of
https://github.com/benbusby/whoogle-search.git
synced 2026-03-11 08:54:34 +00:00
commit
3924410503
18 changed files with 1337 additions and 60 deletions
12
.github/workflows/.pre-commit-config.yaml
vendored
Normal file
@@ -0,0 +1,12 @@
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.6.9
+    hooks:
+      - id: ruff
+        args: [--fix]
+      - id: ruff-format
+  - repo: https://github.com/psf/black
+    rev: 24.8.0
+    hooks:
+      - id: black
+        args: [--quiet]
2
.github/workflows/buildx.yml
vendored
@@ -88,4 +88,4 @@ jobs:
           --platform linux/amd64,linux/arm/v7,linux/arm64 .
         docker buildx build --push \
           --tag ghcr.io/benbusby/whoogle-search:${GITHUB_REF#refs/*/v}\
-          --platform linux/amd64,linux/arm/v7,linux/arm64 .
+          --platform linux/amd64,linux/arm/v7,linux/arm64 .
2
.github/workflows/pypi.yml
vendored
@@ -80,4 +80,4 @@ jobs:
       if: steps.check_tag.outputs.is_stable == 'true'
       uses: pypa/gh-action-pypi-publish@master
       with:
-        password: ${{ secrets.PYPI_API_TOKEN }}
+        password: ${{ secrets.PYPI_API_TOKEN }}
163
README.md
|
|
@ -1,12 +1,10 @@
|
|||
>[!WARNING]
|
||||
>
|
||||
>**Mullvad Leta Backend Now Available!**
|
||||
>Since 16 January, 2025, Google has been attacking the ability to perform search queries without JavaScript enabled. This is a fundamental part of how Whoogle
|
||||
>works -- Whoogle requests the JavaScript-free search results, then filters out garbage from the results page and proxies all external content for the user.
|
||||
>
|
||||
>As of 16 January, 2025, Google seemingly no longer supports performing search queries without JavaScript enabled. We have made multiple workarounds, but as of 2 October 2025, Google has killed off all remaining methods we had to retrieve results from them originally. While we work to rebuild and hopefully find new ways to continue on, we have released a stopgap which uses [Mullvad Leta](https://leta.mullvad.net) (an alternative privacy-focused search backend) as the default (but disable-able) backend leveraging their Google results.
|
||||
>
|
||||
>**Leta is now enabled by default**. It provides anonymous search results through Mullvad's infrastructure without requiring JavaScript. While Leta doesn't support image, video, news, or map searches, it provides privacy-focused web search results.
|
||||
>
|
||||
>To switch back to Google (if it becomes available again), you can disable Leta in the config settings or set `WHOOGLE_CONFIG_USE_LETA=0` in your environment variables. See [LETA_INTEGRATION.md](LETA_INTEGRATION.md) for more details.
|
||||
>This is possibly a breaking change that may mean the end for Whoogle. We'll continue fighting back and releasing workarounds until all workarounds are
|
||||
>exhausted or a better method is found.
|
||||
|
||||
___
|
||||
|
||||
|
|
@@ -71,7 +69,12 @@ Contents
 - POST request search and suggestion queries (when possible)
 - View images at full res without site redirect (currently mobile only)
 - Light/Dark/System theme modes (with support for [custom CSS theming](https://github.com/benbusby/whoogle-search/wiki/User-Contributed-CSS-Themes))
-- Randomly generated User Agent
+- Auto-generated Opera User Agents with random rotation
+  - 10 unique Opera-based UAs generated on startup from 115 language variants
+  - Randomly rotated for each search request to avoid detection patterns
+  - Cached across restarts with configurable refresh options
+  - Fallback to safe default UA if generation fails
+  - Optional display of current UA in search results footer
 - Easy to install/deploy
 - DDG-style bang (i.e. `!<tag> <query>`) searches
 - User-defined [custom bangs](#custom-bangs)
|
|
@@ -440,9 +443,12 @@ There are a few optional environment variables available for customizing a Whoog
 | WHOOGLE_PROXY_PASS | The password of the proxy server. |
 | WHOOGLE_PROXY_TYPE | The type of the proxy server. Can be "socks5", "socks4", or "http". |
 | WHOOGLE_PROXY_LOC | The location of the proxy server (host or ip). |
-| WHOOGLE_USER_AGENT | The desktop user agent to use. Defaults to a randomly generated one. |
-| WHOOGLE_USER_AGENT_MOBILE | The mobile user agent to use. Defaults to a randomly generated one. |
+| WHOOGLE_USER_AGENT | The desktop user agent to use when using 'env_conf' option. Leave empty to use auto-generated Opera UAs. |
+| WHOOGLE_USER_AGENT_MOBILE | The mobile user agent to use when using 'env_conf' option. Leave empty to use auto-generated Opera UAs. |
 | WHOOGLE_USE_CLIENT_USER_AGENT | Enable to use your own user agent for all requests. Defaults to false. |
+| WHOOGLE_UA_CACHE_PERSISTENT | Whether to persist auto-generated UAs across restarts. Set to '0' to regenerate on each startup. Default '1'. |
+| WHOOGLE_UA_CACHE_REFRESH_DAYS | Auto-refresh UA cache after N days. Set to '0' to never refresh (cache persists indefinitely). Default '0'. |
+| WHOOGLE_UA_LIST_FILE | Path to text file containing custom UA strings (one per line). When set, uses these instead of auto-generated UAs. |
 | WHOOGLE_REDIRECTS | Specify sites that should be redirected elsewhere. See [custom redirecting](#custom-redirecting). |
 | EXPOSE_PORT | The port where Whoogle will be exposed. |
 | HTTPS_ONLY | Enforce HTTPS. (See [here](https://github.com/benbusby/whoogle-search#https-enforcement)) |
|
|
@ -494,7 +500,7 @@ These environment variables allow setting default config values, but can be over
|
|||
| WHOOGLE_CONFIG_PREFERENCES_ENCRYPTED | Encrypt preferences token, requires preferences key |
|
||||
| WHOOGLE_CONFIG_PREFERENCES_KEY | Key to encrypt preferences in URL (REQUIRED to show url) |
|
||||
| WHOOGLE_CONFIG_ANON_VIEW | Include the "anonymous view" option for each search result |
|
||||
| WHOOGLE_CONFIG_USE_LETA | Use Mullvad Leta as search backend (default: enabled). Set to 0 to use Google instead |
|
||||
| WHOOGLE_CONFIG_SHOW_USER_AGENT | Display the User Agent string used for search in results footer |
|
||||
|
||||
## Usage
|
||||
Same as most search engines, with the exception of filtering by time range.
|
||||
|
|
@ -666,6 +672,141 @@ Whoogle can optionally serve a single bundled CSS and JS to reduce the number of
|
|||
- When disabled (default), templates load individual CSS/JS files for easier development.
|
||||
- Note: Theme CSS (`*-theme.css`) are still loaded separately to honor user theme selection.
|
||||
|
||||
## User Agent Generator Tool
|
||||
|
||||
A standalone command-line tool is available for generating Opera User Agent strings on demand:
|
||||
|
||||
```bash
|
||||
# Generate 10 User Agent strings (default)
|
||||
python misc/generate_uas.py
|
||||
|
||||
# Generate custom number of UAs
|
||||
python misc/generate_uas.py 20
|
||||
```
|
||||
|
||||
This tool is useful for:
|
||||
- Testing different UA strings
|
||||
- Generating UAs for other projects
|
||||
- Verifying UA generation patterns
|
||||
- Debugging UA-related issues
|
||||
|
||||
## Using Custom User Agent Lists
|
||||
|
||||
Instead of using auto-generated Opera UA strings, you can provide your own list of User Agent strings for Whoogle to use.
|
||||
|
||||
### Setup
|
||||
|
||||
1. Create a text file with your preferred UA strings (one per line):
|
||||
|
||||
```
|
||||
Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.13337/22.478; U; en) Presto/2.4.15 Version/10.00
|
||||
Opera/9.80 (Android; Linux; Opera Mobi/498; U; en) Presto/2.12.423 Version/10.1
|
||||
Opera/9.30 (Nintendo Wii; U; ; 3642; en)
|
||||
```
|
||||
|
||||
2. Set the `WHOOGLE_UA_LIST_FILE` environment variable to point to your file:
|
||||
|
||||
```bash
|
||||
# Docker
|
||||
docker run -e WHOOGLE_UA_LIST_FILE=/config/my_user_agents.txt ...
|
||||
|
||||
# Docker Compose
|
||||
environment:
|
||||
- WHOOGLE_UA_LIST_FILE=/config/my_user_agents.txt
|
||||
|
||||
# Manual/systemd
|
||||
export WHOOGLE_UA_LIST_FILE=/path/to/my_user_agents.txt
|
||||
```
|
||||
|
||||
### Priority Order
|
||||
|
||||
Whoogle uses the following priority when loading User Agent strings:
|
||||
|
||||
1. **Custom UA list file** (if `WHOOGLE_UA_LIST_FILE` is set and valid)
|
||||
2. **Cached auto-generated UAs** (if cache exists and is valid)
|
||||
3. **Newly generated UAs** (if no cache or cache expired)
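
The priority order above can be sketched roughly as follows. This is a minimal illustration, not the code in `app/utils/ua_generator.py`: the helper names `read_ua_file`, `cache_is_fresh`, and `choose_ua_source` are hypothetical, and the handling of `WHOOGLE_UA_CACHE_PERSISTENT` and `WHOOGLE_UA_CACHE_REFRESH_DAYS` is an assumption based only on the environment variable descriptions in this README. The cache layout (`generated_at` plus `user_agents`) follows what `save_ua_pool` writes.

```python
import json
import os
from datetime import datetime, timedelta
from typing import List, Optional


def read_ua_file(path: str) -> List[str]:
    """Return the non-blank lines of a UA list file, or [] if unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    except (OSError, UnicodeDecodeError):
        return []


def cache_is_fresh(generated_at: str, refresh_days: int,
                   now: Optional[datetime] = None) -> bool:
    """Assumed semantics: a refresh window of 0 means the cache never expires."""
    if refresh_days <= 0:
        return True
    now = now or datetime.now()
    return now - datetime.fromisoformat(generated_at) < timedelta(days=refresh_days)


def _refresh_days() -> int:
    """Read the refresh window, treating a non-numeric value as 0."""
    try:
        return int(os.environ.get("WHOOGLE_UA_CACHE_REFRESH_DAYS", "0"))
    except ValueError:
        return 0


def choose_ua_source(cache_path: str) -> str:
    """Report which source would win: 'custom', 'cache', or 'generate'."""
    # 1. A custom list wins if the variable is set and the file yields UAs.
    custom_path = os.environ.get("WHOOGLE_UA_LIST_FILE", "")
    if custom_path and read_ua_file(custom_path):
        return "custom"

    # 2. Otherwise use the cache if persistence is on and the cache is valid.
    persistent = os.environ.get("WHOOGLE_UA_CACHE_PERSISTENT", "1") != "0"
    if persistent and os.path.exists(cache_path):
        try:
            with open(cache_path, "r", encoding="utf-8") as f:
                cache = json.load(f)
            if cache.get("user_agents") and cache_is_fresh(
                    cache["generated_at"], _refresh_days()):
                return "cache"
        except (OSError, ValueError, KeyError, TypeError, AttributeError):
            pass  # unreadable or malformed cache: fall through

    # 3. No usable custom list or cache: a new pool has to be generated.
    return "generate"
```

Calling `choose_ua_source("ua_cache.json")` with none of the variables set and no cache file present returns `"generate"`.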
|
||||
|
||||
### Tips
|
||||
|
||||
- You can use the output from `misc/check_google_user_agents.py` as your custom UA list
|
||||
- Generate a list with `python misc/generate_uas.py 50 2>/dev/null > my_uas.txt`
|
||||
- Mix different UA types (Opera, Firefox, Chrome) for more variety
|
||||
- Keep the file readable by Whoogle (proper permissions)
|
||||
- One UA string per line, blank lines are ignored
|
||||
|
||||
### Example Workflow
|
||||
|
||||
```bash
|
||||
# Generate and test UAs, save working ones
|
||||
python misc/generate_uas.py 100 2>/dev/null > candidate_uas.txt
|
||||
python misc/check_google_user_agents.py candidate_uas.txt --output working_uas.txt
|
||||
|
||||
# Use the working UAs with Whoogle
|
||||
export WHOOGLE_UA_LIST_FILE=./working_uas.txt
|
||||
./run
|
||||
```
|
||||
|
||||
## User Agent Testing Tool
|
||||
|
||||
Whoogle now includes a comprehensive testing tool (`misc/check_google_user_agents.py`) to verify which User Agent strings successfully return Google search results without triggering blocks, JavaScript-only pages, or browser upgrade prompts.
|
||||
|
||||
### Usage
|
||||
|
||||
```bash
|
||||
# Test all UAs from a file
|
||||
python misc/check_google_user_agents.py UAs.txt
|
||||
|
||||
# Save working UAs to a file (appends incrementally)
|
||||
python misc/check_google_user_agents.py UAs.txt --output working_uas.txt
|
||||
|
||||
# Use a specific search query
|
||||
python misc/check_google_user_agents.py UAs.txt --query "python programming"
|
||||
|
||||
# Verbose mode to see detailed results
|
||||
python misc/check_google_user_agents.py UAs.txt --output working.txt --verbose
|
||||
|
||||
# Adjust delay between requests (default: 0.5 seconds)
|
||||
python misc/check_google_user_agents.py UAs.txt --delay 1.0
|
||||
|
||||
# Set request timeout (default: 10 seconds)
|
||||
python misc/check_google_user_agents.py UAs.txt --timeout 15.0
|
||||
```
|
||||
|
||||
### Features
|
||||
|
||||
- **Incremental Results**: Working UAs are saved immediately to the output file (append mode), so progress is preserved even if interrupted
|
||||
- **Duplicate Detection**: Automatically skips UAs already in the output file when resuming
|
||||
- **Random Query Cycling**: By default, cycles through diverse search queries to simulate realistic usage patterns
|
||||
- **Rate Limit Detection**: Detects and reports Google rate limiting with recovery instructions
|
||||
- **Comprehensive Validation**: Checks for:
|
||||
- HTTP status codes (blocks, server errors, rate limits)
|
||||
- Block markers (unusual traffic, upgrade browser messages)
|
||||
- Success markers (actual search result HTML elements)
|
||||
- JavaScript-only pages and redirects
|
||||
- Response size validation
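
The incremental-save and duplicate-detection behaviour described above could look something like the sketch below. It is an illustration under assumptions, not the code in `misc/check_google_user_agents.py`: `load_seen`, `record_working_uas`, and the `is_working` callback are hypothetical names, and the callback stands in for whatever request and validation logic the real tool performs.

```python
import os
from typing import Callable, Iterable, Set


def load_seen(output_path: str) -> Set[str]:
    """Collect UAs already written to the output file by an earlier run."""
    if not os.path.exists(output_path):
        return set()
    with open(output_path, "r", encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def record_working_uas(candidates: Iterable[str], output_path: str,
                       is_working: Callable[[str], bool]) -> int:
    """Test each candidate and append working ones immediately.

    Returns the number of UAs newly appended in this run.
    """
    seen = load_seen(output_path)
    saved = 0
    # Append mode keeps results from interrupted runs intact.
    with open(output_path, "a", encoding="utf-8") as out:
        for ua in candidates:
            ua = ua.strip()
            if not ua or ua in seen:
                continue  # blank line, or already saved earlier
            if is_working(ua):
                out.write(ua + "\n")
                out.flush()  # push each result to disk as soon as it is known
                seen.add(ua)
                saved += 1
    return saved
```

Because the already-saved set is read before any request is made, rerunning the same command after an interruption only spends requests on UAs that are not yet in the output file.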
|
||||
|
||||
### Testing Methodology
|
||||
|
||||
The tool evaluates UAs against multiple criteria:
|
||||
|
||||
1. **HTTP Status**: Rejects 4xx/5xx errors, detects 429 rate limits
|
||||
2. **Block Detection**: Searches for Google's block messages (CAPTCHA, unusual traffic, etc.)
|
||||
3. **JavaScript Detection**: Identifies JS-only pages and noscript redirects
|
||||
4. **Result Validation**: Confirms presence of actual search result HTML elements
|
||||
5. **Content Analysis**: Validates response size and structure
|
||||
|
||||
This tool was used to discover and validate the working Opera UA patterns that power Whoogle's auto-generation feature.
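
The five checks listed above can be sketched as a single classification function. This is a hedged illustration only: the marker strings, the size threshold, and the name `classify_response` are assumptions made for the example and are not taken from `misc/check_google_user_agents.py`.

```python
from typing import Tuple

# Hypothetical markers; the real tool's strings are not shown in this README.
BLOCK_MARKERS = ("unusual traffic", "captcha", "upgrade your browser")
JS_MARKERS = ("enable javascript", "<noscript><meta")
SUCCESS_MARKERS = ('<div class="g"', 'id="search"')
MIN_BODY_BYTES = 5000  # assumed lower bound for a real results page


def classify_response(status_code: int, body: str) -> Tuple[bool, str]:
    """Return (is_working, reason) for one response to a test query."""
    text = body.lower()

    # 1. HTTP status: rate limits are reported separately from other errors.
    if status_code == 429:
        return False, "rate_limited"
    if status_code >= 400:
        return False, f"http_{status_code}"

    # 2. Block detection.
    if any(marker in text for marker in BLOCK_MARKERS):
        return False, "blocked"

    # 3. JavaScript-only pages and noscript redirects.
    if any(marker in text for marker in JS_MARKERS):
        return False, "javascript_required"

    # 4. Result validation: at least one search result element must be present.
    if not any(marker in text for marker in SUCCESS_MARKERS):
        return False, "no_results"

    # 5. Content analysis: reject suspiciously small pages.
    if len(body.encode("utf-8")) < MIN_BODY_BYTES:
        return False, "too_small"

    return True, "ok"
```

Ordering the checks from cheapest and most specific to most general means a rate-limited or blocked response is reported with its real cause rather than as a generic "no results" failure.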
|
||||
|
||||
## Known Issues
|
||||
|
||||
### User Agent Strings and Image Search
|
||||
|
||||
**Issue**: Most, if not all, of the auto-generated Opera User Agent strings may fail when performing **image searches** on Google. This appears to be a limitation with how Google's image search validates User Agent strings.
|
||||
|
||||
**Impact**:
|
||||
- Regular web searches work correctly with generated UAs
|
||||
- Image search may return errors or no results
|
||||
|
||||
## Contributing
|
||||
|
||||
Under the hood, Whoogle is a basic Flask app with the following structure:
|
||||
|
|
@ -679,6 +820,7 @@ Under the hood, Whoogle is a basic Flask app with the following structure:
|
|||
- `results.py`: Utility functions for interpreting/modifying individual search results
|
||||
- `search.py`: Creates and handles new search queries
|
||||
- `session.py`: Miscellaneous methods related to user sessions
|
||||
- `ua_generator.py`: Auto-generates Opera User Agent strings with pattern-based randomization
|
||||
- `templates/`
|
||||
- `index.html`: The home page template
|
||||
- `display.html`: The search results template
|
||||
|
|
@ -753,6 +895,7 @@ A lot of the app currently piggybacks on Google's existing support for fetching
|
|||
| [https://whoogle.privacydev.net](https://whoogle.privacydev.net) | 🇫🇷 FR | English | |
|
||||
| [https://whoogle.lunar.icu](https://whoogle.lunar.icu) | 🇩🇪 DE | Multi-choice | ✅ |
|
||||
|
||||
|
||||
* A checkmark in the "Cloudflare" category here refers to the use of the reverse proxy, [Cloudflare](https://cloudflare.com). The checkmark will not be listed for a site which uses Cloudflare DNS but rather the proxying service which grants Cloudflare the ability to monitor traffic to the website.
|
||||
|
||||
#### Onion Instances
|
||||
|
|
|
|||
|
|
@ -3,6 +3,7 @@ from app.request import send_tor_signal
|
|||
from app.utils.session import generate_key
|
||||
from app.utils.bangs import gen_bangs_json, load_all_bangs
|
||||
from app.utils.misc import gen_file_hash, read_config_bool
|
||||
from app.utils.ua_generator import load_ua_pool
|
||||
from base64 import b64encode
|
||||
from bs4 import MarkupResemblesLocatorWarning
|
||||
from datetime import datetime, timedelta
|
||||
|
|
@ -107,6 +108,16 @@ if not os.path.exists(app.config['BANG_PATH']):
|
|||
if not os.path.exists(app.config['BUILD_FOLDER']):
|
||||
os.makedirs(app.config['BUILD_FOLDER'])
|
||||
|
||||
# Initialize User Agent pool
|
||||
app.config['UA_CACHE_PATH'] = os.path.join(app.config['CONFIG_PATH'], 'ua_cache.json')
|
||||
try:
|
||||
app.config['UA_POOL'] = load_ua_pool(app.config['UA_CACHE_PATH'], count=10)
|
||||
except Exception as e:
|
||||
# If UA pool loading fails, log warning and set empty pool
|
||||
# The gen_user_agent function will handle the fallback
|
||||
print(f"Warning: Could not initialize UA pool: {e}")
|
||||
app.config['UA_POOL'] = []
|
||||
|
||||
# Session values
|
||||
app_key_path = os.path.join(app.config['CONFIG_PATH'], 'whoogle.key')
|
||||
if os.path.exists(app_key_path):
|
||||
|
|
|
|||
|
|
@ -45,6 +45,7 @@ class Config:
|
|||
self.user_agent = kwargs.get('user_agent', default_ua_option)
|
||||
self.custom_user_agent = kwargs.get('custom_user_agent', '')
|
||||
self.use_custom_user_agent = kwargs.get('use_custom_user_agent', False)
|
||||
self.show_user_agent = read_config_bool('WHOOGLE_CONFIG_SHOW_USER_AGENT')
|
||||
|
||||
# Add user agent related keys to safe_keys
|
||||
self.safe_keys = [
|
||||
|
|
@@ -64,7 +65,7 @@ class Config:
             'user_agent',
             'custom_user_agent',
             'use_custom_user_agent',
-            'use_leta'
+            'show_user_agent'
         ]
 
         app_config = current_app.config
|
|
@@ -99,7 +100,10 @@ class Config:
         if kwargs:
             mutable_attrs = self.get_mutable_attrs()
             for attr in mutable_attrs:
-                if attr in kwargs.keys():
+                if attr == 'show_user_agent':
+                    # Handle show_user_agent as boolean
+                    self.show_user_agent = bool(kwargs.get(attr))
+                elif attr in kwargs.keys():
                     setattr(self, attr, kwargs[attr])
                 elif attr not in kwargs.keys() and mutable_attrs[attr] == bool:
                     # Only set to False if the attribute wasn't already set to True
|
|
|
|||
|
|
@ -1,9 +1,8 @@
|
|||
from app.models.config import Config
|
||||
from app.utils.misc import read_config_bool
|
||||
from app.services.provider import get_http_client
|
||||
from datetime import datetime
|
||||
from app.utils.ua_generator import load_ua_pool, get_random_ua, DEFAULT_FALLBACK_UA
|
||||
from defusedxml import ElementTree as ET
|
||||
import random
|
||||
import httpx
|
||||
import urllib.parse as urlparse
|
||||
import os
|
||||
|
|
@ -16,9 +15,6 @@ MAPS_URL = 'https://maps.google.com/maps'
|
|||
AUTOCOMPLETE_URL = ('https://suggestqueries.google.com/'
|
||||
'complete/search?client=toolbar&')
|
||||
|
||||
MOBILE_UA = '{}/5.0 (Android 0; Mobile; rv:54.0) Gecko/54.0 {}/59.0'
|
||||
DESKTOP_UA = '{}/5.0 (X11; {} x86_64; rv:75.0) Gecko/20100101 {}/75.0'
|
||||
|
||||
# Valid query params
|
||||
VALID_PARAMS = ['tbs', 'tbm', 'start', 'near', 'source', 'nfpr']
|
||||
|
||||
|
|
@ -73,9 +69,6 @@ def send_tor_signal(signal: Signal) -> bool:
|
|||
|
||||
|
||||
def gen_user_agent(config, is_mobile) -> str:
|
||||
# Define the default PlayStation Portable user agent (replaces Lynx)
|
||||
DEFAULT_UA = 'Mozilla/4.0 (PSP (PlayStation Portable); 2.00)'
|
||||
|
||||
# If using custom user agent, return the custom string
|
||||
if config.user_agent == 'custom' and config.custom_user_agent:
|
||||
return config.custom_user_agent
|
||||
|
|
@ -90,21 +83,37 @@ def gen_user_agent(config, is_mobile) -> str:
|
|||
env_ua = os.getenv('WHOOGLE_USER_AGENT', '')
|
||||
if env_ua:
|
||||
return env_ua
|
||||
# If env vars are not set, fall back to default
|
||||
return DEFAULT_UA
|
||||
# If env vars are not set, fall back to Opera UA
|
||||
return DEFAULT_FALLBACK_UA
|
||||
|
||||
# If using default user agent
|
||||
# If using default user agent - use auto-generated Opera UA pool
|
||||
if config.user_agent == 'default':
|
||||
return DEFAULT_UA
|
||||
try:
|
||||
# Try to load UA pool from cache (lazy loading if not in app.config)
|
||||
# First check if we have access to Flask app context
|
||||
try:
|
||||
from flask import current_app
|
||||
if hasattr(current_app, 'config') and 'UA_POOL' in current_app.config:
|
||||
ua_pool = current_app.config['UA_POOL']
|
||||
else:
|
||||
# Fall back to loading from disk
|
||||
raise ImportError("UA_POOL not in app config")
|
||||
except (ImportError, RuntimeError):
|
||||
# No Flask context available or UA_POOL not in config, load from disk
|
||||
config_path = os.environ.get('CONFIG_VOLUME',
|
||||
os.path.join(os.path.dirname(os.path.abspath(__file__)),
|
||||
'static', 'config'))
|
||||
cache_path = os.path.join(config_path, 'ua_cache.json')
|
||||
ua_pool = load_ua_pool(cache_path, count=10)
|
||||
|
||||
return get_random_ua(ua_pool)
|
||||
except Exception as e:
|
||||
# If anything goes wrong, fall back to default Opera UA
|
||||
print(f"Warning: Could not load UA pool, using fallback Opera UA: {e}")
|
||||
return DEFAULT_FALLBACK_UA
|
||||
|
||||
# If no custom user agent is set, generate a random one (for backwards compatibility)
|
||||
firefox = random.choice(['Choir', 'Squier', 'Higher', 'Wire']) + 'fox'
|
||||
linux = random.choice(['Win', 'Sin', 'Gin', 'Fin', 'Kin']) + 'ux'
|
||||
|
||||
if is_mobile:
|
||||
return MOBILE_UA.format("Mozilla", firefox)
|
||||
|
||||
return DESKTOP_UA.format("Mozilla", linux, firefox)
|
||||
# Fallback for backwards compatibility (old configs or invalid user_agent values)
|
||||
return DEFAULT_FALLBACK_UA
|
||||
|
||||
|
||||
def gen_query_leta(query, args, config) -> str:
|
||||
|
|
@ -399,23 +408,39 @@ class Request:
|
|||
modified_user_agent = self.modified_user_agent
|
||||
|
||||
headers = {
|
||||
'User-Agent': modified_user_agent
|
||||
'User-Agent': modified_user_agent,
|
||||
'Accept': ('text/html,application/xhtml+xml,application/xml;'
|
||||
'q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8'),
|
||||
'Accept-Language': 'en-US,en;q=0.9',
|
||||
'Accept-Encoding': 'gzip, deflate, br',
|
||||
'Connection': 'keep-alive',
|
||||
'Cache-Control': 'max-age=0',
|
||||
'Pragma': 'no-cache',
|
||||
'Upgrade-Insecure-Requests': '1',
|
||||
'Sec-Fetch-Site': 'none',
|
||||
'Sec-Fetch-Mode': 'navigate',
|
||||
'Sec-Fetch-User': '?1',
|
||||
'Sec-Fetch-Dest': 'document',
|
||||
'Sec-CH-UA': (
|
||||
'"Not/A)Brand";v="8", '
|
||||
'"Chromium";v="127", '
|
||||
'"Google Chrome";v="127"'
|
||||
),
|
||||
'Sec-CH-UA-Mobile': '?0',
|
||||
'Sec-CH-UA-Platform': '"macOS"'
|
||||
}
|
||||
|
||||
# Adding the Accept-Language to the Header if possible
|
||||
# Add Accept-Language header tied to the current config if requested
|
||||
if self.lang_interface:
|
||||
headers.update({'Accept-Language':
|
||||
self.lang_interface.replace('lang_', '')
|
||||
+ ';q=1.0'})
|
||||
headers['Accept-Language'] = (
|
||||
self.lang_interface.replace('lang_', '') + ';q=1.0'
|
||||
)
|
||||
|
||||
# view is suppressed correctly
|
||||
now = datetime.now()
|
||||
consent_cookie = 'CONSENT=PENDING+987; SOCS=CAESHAgBEhIaAB'
|
||||
# Prefer header-based cookies to avoid httpx per-request cookies deprecation
|
||||
if 'Cookie' in headers:
|
||||
headers['Cookie'] += '; ' + consent_cookie
|
||||
else:
|
||||
headers['Cookie'] = consent_cookie
|
||||
# Consent cookies keep Google from showing the interstitial consent wall
|
||||
consent_cookies = {
|
||||
'CONSENT': 'PENDING+987',
|
||||
'SOCS': 'CAESHAgBEhIaAB'
|
||||
}
|
||||
|
||||
# Validate Tor conn and request new identity if the last one failed
|
||||
if self.tor and not send_tor_signal(
|
||||
|
|
@ -446,7 +471,8 @@ class Request:
|
|||
try:
|
||||
response = self.http_client.get(
|
||||
(base_url or self.search_url) + query,
|
||||
headers=headers)
|
||||
headers=headers,
|
||||
cookies=consent_cookies)
|
||||
except httpx.HTTPError as e:
|
||||
raise
|
||||
|
||||
|
|
|
|||
|
|
@ -555,6 +555,13 @@ def search():
|
|||
'results': results
|
||||
})
|
||||
|
||||
# Get the user agent that was used for the search
|
||||
used_user_agent = ''
|
||||
if search_util.user_request:
|
||||
used_user_agent = search_util.user_request.modified_user_agent
|
||||
elif hasattr(g, 'user_request') and g.user_request:
|
||||
used_user_agent = g.user_request.modified_user_agent
|
||||
|
||||
return render_template(
|
||||
'display.html',
|
||||
has_update=app.config['HAS_UPDATE'],
|
||||
|
|
@ -576,6 +583,7 @@ def search():
|
|||
) and not search_util.search_type, # Standard search queries only
|
||||
response=cleanresponse,
|
||||
version_number=app.config['VERSION_NUMBER'],
|
||||
used_user_agent=used_user_agent,
|
||||
search_header=render_template(
|
||||
'header.html',
|
||||
home_url=home_url,
|
||||
|
|
|
|||
|
|
@@ -5,5 +5,8 @@
         {% if has_update %}
             || <span class="update_available">Update Available 🟢</span>
         {% endif %}
+        {% if config.show_user_agent and used_user_agent %}
+            <br><span class="user-agent-display" style="font-size: 0.85em; color: #666;">User Agent: {{ used_user_agent }}</span>
+        {% endif %}
     </p>
 </footer>
|
|
|
|||
|
|
@ -264,6 +264,11 @@
|
|||
<input type="checkbox" name="accept_language"
|
||||
id="config-accept-language" {{ 'checked' if config.accept_language else '' }}>
|
||||
</div>
|
||||
<div class="config-div config-div-show-user-agent">
|
||||
<label for="config-show-user-agent">Show User Agent in Footer: </label>
|
||||
<input type="checkbox" name="show_user_agent"
|
||||
id="config-show-user-agent" {{ 'checked' if config.show_user_agent else '' }}>
|
||||
</div>
|
||||
<div class="config-div config-div-root-url">
|
||||
<label for="config-url">{{ translation['config-url'] }}: </label>
|
||||
<input type="text" name="url" id="config-url" value="{{ config.url }}">
|
||||
|
|
|
|||
|
|
@ -36,18 +36,14 @@ def fetch_favicon(url: str) -> bytes:
|
|||
bytes - the favicon bytes, or a placeholder image if one
|
||||
was not returned
|
||||
"""
|
||||
try:
|
||||
response = httpx.get(f'{ddg_favicon_site}/{urlparse(url).netloc}.ico', timeout=2.0)
|
||||
response = httpx.get(f'{ddg_favicon_site}/{urlparse(url).netloc}.ico')
|
||||
|
||||
if response.status_code == 200 and len(response.content) > 0:
|
||||
tmp_mem = io.BytesIO()
|
||||
tmp_mem.write(response.content)
|
||||
tmp_mem.seek(0)
|
||||
if response.status_code == 200 and len(response.content) > 0:
|
||||
tmp_mem = io.BytesIO()
|
||||
tmp_mem.write(response.content)
|
||||
tmp_mem.seek(0)
|
||||
|
||||
return tmp_mem.read()
|
||||
except Exception:
|
||||
# If favicon fetch fails, return placeholder
|
||||
pass
|
||||
return tmp_mem.read()
|
||||
return placeholder_img
|
||||
|
||||
|
||||
|
|
|
|||
336
app/utils/ua_generator.py
Normal file
|
|
@ -0,0 +1,336 @@
|
|||
"""
|
||||
User Agent Generator for Opera-based UA strings.
|
||||
|
||||
This module generates realistic Opera User Agent strings based on patterns
|
||||
found in working UA strings that successfully bypass Google's restrictions.
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import random
|
||||
from datetime import datetime, timedelta
|
||||
from typing import List, Dict
|
||||
|
||||
|
||||
# Default fallback UA if generation fails
|
||||
DEFAULT_FALLBACK_UA = "Opera/9.80 (iPad; Opera Mini/5.0.17381/503; U; eu) Presto/2.6.35 Version/11.10)"
|
||||
|
||||
# Opera UA Pattern Templates
|
||||
OPERA_PATTERNS = [
|
||||
# Opera Mini (J2ME/MIDP)
|
||||
"Opera/9.80 (J2ME/MIDP; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
|
||||
# Opera Mobile (Android)
|
||||
"Opera/9.80 (Android; Linux; Opera Mobi/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
|
||||
# Opera Mobile (iPhone)
|
||||
"Opera/9.80 (iPhone; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
|
||||
# Opera Mobile (iPad)
|
||||
"Opera/9.80 (iPad; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
]
|
||||
|
||||
# Randomization pools based on working UAs
|
||||
OPERA_MINI_VERSIONS = [
|
||||
"4.0", "4.1.11321", "4.1.12965", "4.1.13573", "4.1.13907", "4.1.14287",
|
||||
"4.1.15082", "4.2.13057", "4.2.13221", "4.2.13265", "4.2.13337",
|
||||
"4.2.13400", "4.2.13918", "4.2.13943", "4.2.14320", "4.2.14409",
|
||||
"4.2.14753", "4.2.14881", "4.2.14885", "4.2.14912", "4.2.15066",
|
||||
"4.2.15410", "4.2.16007", "4.2.16320", "4.2.18887", "4.2.19634",
|
||||
"4.2.21465", "4.2.22228", "4.2.23453", "4.2.24721", "4.3.13337",
|
||||
"4.3.24214", "4.4.26736", "4.4.29476", "4.5.33867", "4.5.40312",
|
||||
"5.0.15650", "5.0.16823", "5.0.17381", "5.0.17443", "5.0.18635",
|
||||
"5.0.18741", "5.0.19683", "5.0.19693", "5.0.20873", "5.0.22349",
|
||||
"5.1.21051", "5.1.21126", "5.1.21214", "5.1.21415", "5.1.21594",
|
||||
"5.1.21595", "5.1.22296", "5.1.22303", "5.1.22396", "5.1.22460",
|
||||
"5.1.22783", "5.1.22784", "6.0.24095", "6.0.24212", "6.0.24455",
|
||||
"6.1.25375", "6.1.25378", "6.1.25759", "6.24093", "6.24096",
|
||||
"6.24209", "6.24288", "6.5.26955", "6.5.29702", "7.0.29952",
|
||||
"7.1.32052", "7.1.32444", "7.1.32694", "7.29530", "7.5.33361",
|
||||
"7.6.35766", "9.80", "36.2.2254"
|
||||
]
|
||||
|
||||
OPERA_MOBI_BUILDS = [
|
||||
"27", "49", "447", "498", "1181", "1209", "3730",
|
||||
"ADR-1011151731", "ADR-1012211514", "ADR-1012221546", "ADR-1012272315",
|
||||
"SYB-1103211396", "SYB-1104061449", "SYB-1107071606",
|
||||
"ADR-1111101157"
|
||||
]
|
||||
|
||||
BUILD_NUMBERS = [
|
||||
"18.678", "18.684", "18.738", "18.794", "19.892", "19.916",
|
||||
"20.2477", "20.2479", "20.2485", "20.2489", "21.529", "22.387",
|
||||
"22.394", "22.401", "22.414", "22.453", "22.478", "23.317",
|
||||
"23.333", "23.334", "23.377", "23.390", "24.741", "24.743",
|
||||
"24.746", "24.783", "24.838", "24.871", "24.899", "25.657",
|
||||
"25.677", "25.729", "25.872", "26.1305", "27.1366", "27.1407",
|
||||
"27.1573", "28.2075", "28.2555", "28.2647", "28.2766", "29.3594",
|
||||
"30.3316", "31.1350", "35.2883", "35.5706", "37.6584", "119.132",
|
||||
"170.51", "170.54", "764", "870", "886", "490", "503"
|
||||
]
|
||||
|
||||
PRESTO_VERSIONS = [
|
||||
"2.2.0", "2.4.15", "2.4.154.15", "2.4.18", "2.5.25", "2.5.28",
|
||||
"2.6.35", "2.7.60", "2.7.81", "2.8.119", "2.8.149", "2.8.191",
|
||||
"2.9.201", "2.12.423"
|
||||
]
|
||||
|
||||
FINAL_VERSIONS = [
|
||||
"10.00", "10.1", "10.5", "10.54", "10.5454", "11.00", "11.10",
|
||||
"12.02", "12.16", "13.00"
|
||||
]
|
||||
|
||||
LANGUAGES = [
|
||||
# English variants
|
||||
"en", "en-US", "en-GB", "en-CA", "en-AU", "en-NZ", "en-ZA", "en-IN", "en-SG",
|
||||
# Western European
|
||||
"de", "de-DE", "de-AT", "de-CH",
|
||||
"fr", "fr-FR", "fr-CA", "fr-BE", "fr-CH", "fr-LU",
|
||||
"es", "es-ES", "es-MX", "es-AR", "es-CO", "es-CL", "es-PE", "es-VE", "es-LA",
|
||||
"it", "it-IT", "it-CH",
|
||||
"pt", "pt-PT", "pt-BR",
|
||||
"nl", "nl-NL", "nl-BE",
|
||||
# Nordic languages
|
||||
"da", "da-DK",
|
||||
"sv", "sv-SE",
|
||||
"no", "no-NO", "nb", "nn",
|
||||
"fi", "fi-FI",
|
||||
"is", "is-IS",
|
||||
# Eastern European
|
||||
"pl", "pl-PL",
|
||||
"cs", "cs-CZ",
|
||||
"sk", "sk-SK",
|
||||
"hu", "hu-HU",
|
||||
"ro", "ro-RO",
|
||||
"bg", "bg-BG",
|
||||
"hr", "hr-HR",
|
||||
"sr", "sr-RS",
|
||||
"sl", "sl-SI",
|
||||
"uk", "uk-UA",
|
||||
"ru", "ru-RU",
|
||||
# Asian languages
|
||||
"zh", "zh-CN", "zh-TW", "zh-HK",
|
||||
"ja", "ja-JP",
|
||||
"ko", "ko-KR",
|
||||
"th", "th-TH",
|
||||
"vi", "vi-VN",
|
||||
"id", "id-ID",
|
||||
"ms", "ms-MY",
|
||||
"fil", "tl",
|
||||
# Middle Eastern
|
||||
"tr", "tr-TR",
|
||||
"ar", "ar-SA", "ar-AE", "ar-EG",
|
||||
"he", "he-IL",
|
||||
"fa", "fa-IR",
|
||||
# Other
|
||||
"hi", "hi-IN",
|
||||
"bn", "bn-IN",
|
||||
"ta", "ta-IN",
|
||||
"te", "te-IN",
|
||||
"mr", "mr-IN",
|
||||
"el", "el-GR",
|
||||
"ca", "ca-ES",
|
||||
"eu", "eu-ES"
|
||||
]
|
||||
|
||||
|
||||
|
||||
def generate_opera_ua() -> str:
|
||||
"""
|
||||
Generate a single random Opera User Agent string.
|
||||
|
||||
Returns:
|
||||
str: A randomly generated Opera UA string
|
||||
"""
|
||||
pattern = random.choice(OPERA_PATTERNS)
|
||||
|
||||
# Determine which parameters to use based on the pattern
|
||||
params = {
|
||||
'lang': random.choice(LANGUAGES)
|
||||
}
|
||||
|
||||
if '{version}' in pattern:
|
||||
params['version'] = random.choice(OPERA_MINI_VERSIONS)
|
||||
|
||||
if '{build}' in pattern:
|
||||
# Use MOBI build for "Opera Mobi", regular build for "Opera Mini"
|
||||
if "Opera Mobi" in pattern:
|
||||
params['build'] = random.choice(OPERA_MOBI_BUILDS)
|
||||
else:
|
||||
params['build'] = random.choice(BUILD_NUMBERS)
|
||||
|
||||
if '{presto}' in pattern:
|
||||
params['presto'] = random.choice(PRESTO_VERSIONS)
|
||||
|
||||
if '{final}' in pattern:
|
||||
params['final'] = random.choice(FINAL_VERSIONS)
|
||||
|
||||
return pattern.format(**params)
|
||||
|
||||
|
||||
def generate_ua_pool(count: int = 10) -> List[str]:
    """
    Generate a pool of unique Opera User Agent strings.

    Args:
        count: Number of UA strings to generate (default: 10)

    Returns:
        List[str]: List of unique UA strings
    """
    ua_pool = set()

    # Keep generating until we have enough unique UAs
    # Add safety limit to prevent infinite loop
    max_attempts = count * 100
    attempts = 0

    try:
        while len(ua_pool) < count and attempts < max_attempts:
            ua = generate_opera_ua()
            ua_pool.add(ua)
            attempts += 1
    except Exception:
        # If generation fails entirely, return at least the default fallback
        if not ua_pool:
            return [DEFAULT_FALLBACK_UA]

    # If we couldn't generate enough, fill remaining with default
    result = list(ua_pool)
    while len(result) < count:
        result.append(DEFAULT_FALLBACK_UA)

    return result


def save_ua_pool(uas: List[str], cache_path: str) -> None:
    """
    Save UA pool to cache file.

    Args:
        uas: List of UA strings to save
        cache_path: Path to cache file
    """
    cache_data = {
        'generated_at': datetime.now().isoformat(),
        'user_agents': uas
    }

    # Ensure directory exists
    cache_dir = os.path.dirname(cache_path)
    if cache_dir and not os.path.exists(cache_dir):
        os.makedirs(cache_dir, exist_ok=True)

    with open(cache_path, 'w', encoding='utf-8') as f:
        json.dump(cache_data, f, indent=2)


def load_custom_ua_list(file_path: str) -> List[str]:
    """
    Load custom UA list from a text file.

    Args:
        file_path: Path to text file containing UA strings (one per line)

    Returns:
        List[str]: List of UA strings, or empty list if file is invalid
    """
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            uas = [line.strip() for line in f if line.strip()]

        # Validate that we have at least one UA
        if not uas:
            return []

        return uas
    except (FileNotFoundError, PermissionError, UnicodeDecodeError):
        return []


def load_ua_pool(cache_path: str, count: int = 10) -> List[str]:
    """
    Load UA pool from custom list file, cache, or generate new one.

    Priority order:
    1. Custom UA list file (if WHOOGLE_UA_LIST_FILE is set)
    2. Cached auto-generated UAs
    3. Newly generated UAs

    Args:
        cache_path: Path to cache file
        count: Number of UAs to generate if cache is invalid (default: 10)

    Returns:
        List[str]: List of UA strings
    """
    # Check for custom UA list file first (highest priority)
    custom_ua_file = os.environ.get('WHOOGLE_UA_LIST_FILE', '').strip()
    if custom_ua_file:
        custom_uas = load_custom_ua_list(custom_ua_file)
        if custom_uas:
            # Custom list loaded successfully
            return custom_uas
        else:
            # Custom file specified but invalid, log warning and fall back
            print(f"Warning: Custom UA list file '{custom_ua_file}' not found or invalid, falling back to auto-generated UAs")

    # Check if we should use cache
    use_cache = os.environ.get('WHOOGLE_UA_CACHE_PERSISTENT', '1') == '1'
    refresh_days = int(os.environ.get('WHOOGLE_UA_CACHE_REFRESH_DAYS', '0'))

    # If cache disabled, always generate new
    if not use_cache:
        uas = generate_ua_pool(count)
        save_ua_pool(uas, cache_path)
        return uas

    # Try to load from cache
    if os.path.exists(cache_path):
        try:
            with open(cache_path, 'r', encoding='utf-8') as f:
                cache_data = json.load(f)

            # Check if cache is expired (if refresh_days > 0)
            if refresh_days > 0:
                generated_at = datetime.fromisoformat(cache_data['generated_at'])
                age_days = (datetime.now() - generated_at).days

                if age_days >= refresh_days:
                    # Cache expired, generate new
                    uas = generate_ua_pool(count)
                    save_ua_pool(uas, cache_path)
                    return uas

            # Cache is valid, return it
            return cache_data['user_agents']
        except (json.JSONDecodeError, KeyError, ValueError):
            # Cache file is corrupted, generate new
            pass

    # No valid cache, generate new
    uas = generate_ua_pool(count)
    save_ua_pool(uas, cache_path)
    return uas


def get_random_ua(ua_pool: List[str]) -> str:
    """
    Get a random UA from the pool.

    Args:
        ua_pool: List of UA strings

    Returns:
        str: Random UA string from the pool
    """
    if not ua_pool:
        # Fallback to generating one if pool is empty
        try:
            return generate_opera_ua()
        except Exception:
            # If generation fails, use default fallback
            return DEFAULT_FALLBACK_UA

    return random.choice(ua_pool)

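The cache-expiry rule used by `load_ua_pool` can be tried in isolation. The following is a minimal sketch, not the commit's code: `cache_is_expired` and `write_cache` are hypothetical helpers introduced only for illustration, and the cache file is assumed to have the same `generated_at` / `user_agents` shape that `save_ua_pool` writes.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta


def write_cache(cache_path: str, uas, generated_at: datetime) -> None:
    """Hypothetical helper: write a cache file shaped like save_ua_pool's output."""
    with open(cache_path, 'w', encoding='utf-8') as f:
        json.dump({'generated_at': generated_at.isoformat(),
                   'user_agents': uas}, f, indent=2)


def cache_is_expired(cache_path: str, refresh_days: int) -> bool:
    """Hypothetical helper: True when the cache is at least refresh_days old.

    As in load_ua_pool, refresh_days <= 0 means the cache never expires.
    """
    if refresh_days <= 0:
        return False
    with open(cache_path, 'r', encoding='utf-8') as f:
        cache_data = json.load(f)
    generated_at = datetime.fromisoformat(cache_data['generated_at'])
    # timedelta.days truncates, so a cache 8 days and 3 hours old counts as 8.
    return (datetime.now() - generated_at).days >= refresh_days


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'ua_cache.json')
    # Pretend the pool was generated eight days ago.
    write_cache(path, ['Opera/9.30 (Nintendo Wii; U; ; 3642; en)'],
                datetime.now() - timedelta(days=8))
    expired_weekly = cache_is_expired(path, refresh_days=7)
    expired_monthly = cache_is_expired(path, refresh_days=30)
    expired_never = cache_is_expired(path, refresh_days=0)
```

With an eight-day-old cache, a 7-day refresh interval triggers regeneration, while a 30-day interval or the default of `0` keeps the cached pool.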
@@ -4,4 +4,5 @@ optional_dev_tag = ''
 if os.getenv('DEV_BUILD'):
     optional_dev_tag = '.dev' + os.getenv('DEV_BUILD')

-__version__ = '1.1.0' + optional_dev_tag
+__version__ = '1.1.1' + optional_dev_tag

363
misc/check_google_user_agents.py
Executable file
@@ -0,0 +1,363 @@
#!/usr/bin/env python3
"""
Test User Agent strings against Google to find which ones return actual search results
instead of JavaScript pages or upgrade browser messages.

Usage:
    python misc/check_google_user_agents.py <user_agent_file> [--output <output_file>] [--query <search_query>]
"""

import argparse
import random
import sys
import time
from typing import List, Tuple
import requests

# Common search queries to cycle through for more realistic testing
DEFAULT_SEARCH_QUERIES = [
    "python programming",
    "weather today",
    "news",
    "how to cook pasta",
    "best movies 2025",
    "restaurants near me",
    "translate hello",
    "calculator",
    "time",
    "maps",
    "images",
    "videos",
    "shopping",
    "travel",
    "sports scores",
    "stock market",
    "recipes",
    "music",
    "books",
    "technology",
    "AI",
    "AI programming",
    "Why does google hate users?"
]

# Markers that indicate blocked/JS pages
BLOCK_MARKERS = [
    "unusual traffic",
    "sorry but your computer",
    "solve the captcha",
    "request looks automated",
    "g-recaptcha",
    "upgrade your browser",
    "browser is not supported",
    "please upgrade",
    "isn't supported",
    "isn\"t supported",  # With escaped quote
    "upgrade to a recent version",
    "update your browser",
    "your browser isn't supported",
]

# Markers that indicate actual search results
SUCCESS_MARKERS = [
    '<div class="g"',  # Google search result container
    '<div id="search"',  # Search results container
    '<div class="rc"',  # Result container
    'class="yuRUbf"',  # Result link container
    'class="LC20lb"',  # Result title
    '- Google Search</title>',  # Page title indicator
    'id="rso"',  # Results container
    'class="g"',  # Result class (without div tag)
]


def read_user_agents(file_path: str) -> List[str]:
    """Read user agent strings from a file, one per line."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            user_agents = [line.strip() for line in f if line.strip()]
        return user_agents
    except FileNotFoundError:
        print(f"Error: File '{file_path}' not found.", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Error reading file: {e}", file=sys.stderr)
        sys.exit(1)


def test_user_agent(user_agent: str, query: str = "test", timeout: float = 10.0) -> Tuple[bool, str]:
    """
    Test a user agent against Google search.

    Returns:
        Tuple of (is_working: bool, reason: str)
    """
    url = "https://www.google.com/search"
    params = {"q": query, "gbv": "1", "num": "10"}

    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
    }

    try:
        response = requests.get(url, params=params, headers=headers, timeout=timeout)

        # Check HTTP status
        if response.status_code == 429:
            # Rate limited - raise this so we can handle it specially
            raise Exception("Rate limited (429)")
        if response.status_code >= 500:
            return False, f"Server error ({response.status_code})"
        if response.status_code == 403:
            return False, f"Blocked ({response.status_code})"
        if response.status_code >= 400:
            return False, f"HTTP {response.status_code}"

        body_lower = response.text.lower()

        # Check for block markers
        for marker in BLOCK_MARKERS:
            if marker.lower() in body_lower:
                return False, f"Blocked: {marker}"

        # Check for redirect indicators first - these indicate non-working responses
        has_redirect = ("window.location" in body_lower or "location.href" in body_lower) and "google.com" not in body_lower
        if has_redirect:
            return False, "JavaScript redirect detected"

        # Check for noscript redirect (another indicator of JS-only page)
        if 'noscript' in body_lower and 'http-equiv="refresh"' in body_lower:
            return False, "NoScript redirect page"

        # Check for success markers (actual search results)
        # We need at least one strong indicator of search results
        has_results = any(marker in response.text for marker in SUCCESS_MARKERS)

        if has_results:
            return True, "OK - Has search results"
        else:
            # Check for very short responses (likely error pages)
            if len(response.text) < 1000:
                return False, "Response too short (likely error page)"
            # If we don't have success markers, it's not a working response
            # Even if it's substantial and doesn't have block markers, it might be a JS-only page
            return False, "No search results found"

    except requests.Timeout:
        return False, "Request timeout"
    except requests.HTTPError as e:
        # Compare against None: a Response with a 4xx/5xx status is falsy
        if e.response is not None and e.response.status_code == 429:
            # Rate limited - raise this so we can handle it specially
            raise Exception(f"Rate limited (429) - {str(e)}")
        return False, f"HTTP error: {str(e)}"
    except requests.RequestException as e:
        # Check if it's a 429 in the response
        if hasattr(e, 'response') and e.response is not None and e.response.status_code == 429:
            raise Exception(f"Rate limited (429) - {str(e)}")
        return False, f"Request error: {str(e)}"


def main():
    parser = argparse.ArgumentParser(
        description="Test User Agent strings against Google to find working ones.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python misc/check_google_user_agents.py UAs.txt
  python misc/check_google_user_agents.py UAs.txt --output working_uas.txt
  python misc/check_google_user_agents.py UAs.txt --query "python programming"
        """
    )
    parser.add_argument(
        "user_agent_file",
        help="Path to file containing user agent strings (one per line)"
    )
    parser.add_argument(
        "--output", "-o",
        help="Output file to write working user agents (default: stdout)"
    )
    parser.add_argument(
        "--query", "-q",
        default=None,
        help="Search query to use for testing (default: cycles through random queries)"
    )
    parser.add_argument(
        "--random-queries", "-r",
        action="store_true",
        help="Use random queries from a predefined list (default: True if --query not specified)"
    )
    parser.add_argument(
        "--timeout", "-t",
        type=float,
        default=10.0,
        help="Request timeout in seconds (default: 10.0)"
    )
    parser.add_argument(
        "--delay", "-d",
        type=float,
        default=0.5,
        help="Delay between requests in seconds (default: 0.5)"
    )
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Show detailed results for each user agent"
    )

    args = parser.parse_args()

    # Determine query strategy
    use_random_queries = args.random_queries or (args.query is None)
    if use_random_queries:
        search_queries = DEFAULT_SEARCH_QUERIES.copy()
        random.shuffle(search_queries)  # Shuffle for variety
        current_query_idx = 0
        query_display = f"cycling through {len(search_queries)} random queries"
    else:
        search_queries = [args.query]
        query_display = f"'{args.query}'"

    # Read user agents
    user_agents = read_user_agents(args.user_agent_file)
    if not user_agents:
        print("No user agents found in file.", file=sys.stderr)
        sys.exit(1)

    print(f"Testing {len(user_agents)} user agents against Google...", file=sys.stderr)
    print(f"Query: {query_display}", file=sys.stderr)
    if args.output:
        print(f"Output file: {args.output} (appending results incrementally)", file=sys.stderr)
    print(file=sys.stderr)

    # Load existing working user agents from output file to avoid duplicates
    existing_working = set()
    if args.output:
        try:
            with open(args.output, 'r', encoding='utf-8') as f:
                existing_working = {line.strip() for line in f if line.strip()}
            if existing_working:
                print(f"Found {len(existing_working)} existing user agents in output file", file=sys.stderr)
        except FileNotFoundError:
            # File doesn't exist yet, that's fine
            pass
        except Exception as e:
            print(f"Warning: Could not read existing output file: {e}", file=sys.stderr)

    # Open output file for incremental writing if specified (append mode)
    output_file = None
    if args.output:
        try:
            output_file = open(args.output, 'a', encoding='utf-8')
        except Exception as e:
            print(f"Error opening output file: {e}", file=sys.stderr)
            sys.exit(1)

    working_agents = []
    failed_count = 0
    skipped_count = 0
    last_successful_idx = 0

    try:
        for idx, ua in enumerate(user_agents, 1):
            # Skip testing if this UA is already in the working file
            if args.output and ua in existing_working:
                skipped_count += 1
                if args.verbose:
                    print(f"[{idx}/{len(user_agents)}] ⊘ SKIPPED - Already in working file", file=sys.stderr)
                last_successful_idx = idx
                continue

            try:
                # Get the next query (cycle through if using random queries)
                if use_random_queries:
                    query = search_queries[current_query_idx % len(search_queries)]
                    current_query_idx += 1
                else:
                    query = args.query

                is_working, reason = test_user_agent(ua, query, args.timeout)

                if is_working:
                    working_agents.append(ua)
                    status = "✓"
                    # Write immediately to output file if specified (skip if duplicate)
                    if output_file:
                        if ua not in existing_working:
                            output_file.write(ua + '\n')
                            output_file.flush()  # Ensure it's written to disk
                            existing_working.add(ua)  # Track it to avoid duplicates
                        else:
                            if args.verbose:
                                print(f"[{idx}/{len(user_agents)}] {status} WORKING (duplicate, skipped) - {reason}", file=sys.stderr)
                    # Also print to stdout if no output file
                    if not args.output:
                        print(ua)

                    if args.verbose:
                        print(f"[{idx}/{len(user_agents)}] {status} WORKING - {reason}", file=sys.stderr)
                else:
                    failed_count += 1
                    status = "✗"
                    if args.verbose:
                        print(f"[{idx}/{len(user_agents)}] {status} FAILED - {reason}", file=sys.stderr)

                last_successful_idx = idx

                # Progress indicator for non-verbose mode
                if not args.verbose and idx % 10 == 0:
                    print(f"Progress: {idx}/{len(user_agents)} tested ({len(working_agents)} working, {failed_count} failed)", file=sys.stderr)

                # Delay between requests to avoid rate limiting
                if idx < len(user_agents):
                    time.sleep(args.delay)

            except KeyboardInterrupt:
                print(file=sys.stderr)
                print(f"\nInterrupted by user at index {idx}/{len(user_agents)}", file=sys.stderr)
                print(f"Last successful test: {last_successful_idx}/{len(user_agents)}", file=sys.stderr)
                break
            except Exception as e:
                # Handle unexpected errors (like network issues or rate limits)
                error_msg = str(e)
                if "429" in error_msg or "Rate limited" in error_msg:
                    print(file=sys.stderr)
                    print(f"\n⚠️ RATE LIMIT DETECTED at index {idx}/{len(user_agents)}", file=sys.stderr)
                    print(f"Last successful test: {last_successful_idx}/{len(user_agents)}", file=sys.stderr)
                    print(f"Working user agents found so far: {len(working_agents)}", file=sys.stderr)
                    if args.output:
                        print(f"Results saved to: {args.output}", file=sys.stderr)
                    print(f"\nTo resume later, you can skip the first {last_successful_idx} user agents.", file=sys.stderr)
                    raise  # Re-raise to exit the loop
                else:
                    print(f"[{idx}/{len(user_agents)}] ERROR - {error_msg}", file=sys.stderr)
                    failed_count += 1
                    last_successful_idx = idx
                    if idx < len(user_agents):
                        time.sleep(args.delay)
                    continue

    finally:
        # Close output file if opened
        if output_file:
            output_file.close()

    # Summary
    print(file=sys.stderr)
    tested_count = last_successful_idx - skipped_count
    print(f"Summary: {len(working_agents)} working, {failed_count} failed, {skipped_count} skipped out of {last_successful_idx} processed (of {len(user_agents)} total)", file=sys.stderr)
    if last_successful_idx < len(user_agents):
        print(f"Note: Processing stopped at index {last_successful_idx}. {len(user_agents) - last_successful_idx} user agents not processed.", file=sys.stderr)
    if args.output:
        print(f"Results saved to: {args.output}", file=sys.stderr)

    return 0 if working_agents else 1


if __name__ == "__main__":
    sys.exit(main())

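The marker-based classification in `test_user_agent` can be checked offline, without sending requests to Google. The sketch below is an illustration under stated assumptions: `classify_body` is a hypothetical helper (not part of the commit), the marker lists are shortened copies of `BLOCK_MARKERS` and `SUCCESS_MARKERS`, and the redirect checks are left out for brevity.

```python
from typing import Tuple

# Shortened marker lists, for illustration only.
BLOCK_MARKERS = ["unusual traffic", "g-recaptcha", "upgrade your browser"]
SUCCESS_MARKERS = ['<div id="search"', 'id="rso"', 'class="g"']


def classify_body(body: str) -> Tuple[bool, str]:
    """Hypothetical helper: apply the block/success marker checks to raw HTML."""
    body_lower = body.lower()

    # Block markers win over everything else.
    for marker in BLOCK_MARKERS:
        if marker in body_lower:
            return False, f"Blocked: {marker}"

    # At least one result marker must be present for a page to count as working.
    if any(marker in body for marker in SUCCESS_MARKERS):
        return True, "OK - Has search results"

    if len(body) < 1000:
        return False, "Response too short (likely error page)"
    return False, "No search results found"


blocked = classify_body('<html>Our systems have detected unusual traffic</html>')
working = classify_body('<html><div id="search"><div class="g">hit</div></div></html>')
short = classify_body('<html></html>')
```

Checking the block markers before the success markers matters: a CAPTCHA page can still contain generic class names, so the order keeps such pages from being reported as working.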
198
misc/generate_uas.py
Executable file
@@ -0,0 +1,198 @@
#!/usr/bin/env python3
"""
Standalone Opera User Agent String Generator

This tool generates Opera-based User Agent strings that can be used with Whoogle.
It can be run independently to generate and display UA strings on demand.

Usage:
    python misc/generate_uas.py [count]

Arguments:
    count: Number of UA strings to generate (default: 10)

Examples:
    python misc/generate_uas.py           # Generate 10 UAs
    python misc/generate_uas.py 20        # Generate 20 UAs
"""

import sys
import os

# Default fallback UA if generation fails
DEFAULT_FALLBACK_UA = "Opera/9.30 (Nintendo Wii; U; ; 3642; en)"

# Try to import from the app module if available
try:
    # Add parent directory to path to allow imports
    sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
    from app.utils.ua_generator import generate_ua_pool
    USE_APP_MODULE = True
except ImportError:
    USE_APP_MODULE = False
    # Self-contained version if app module is not available
    import random

    # Opera UA Pattern Templates
    OPERA_PATTERNS = [
        "Opera/9.80 (J2ME/MIDP; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
        "Opera/9.80 (Android; Linux; Opera Mobi/{build}; U; {lang}) Presto/{presto} Version/{final}",
        "Opera/9.80 (iPhone; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
        "Opera/9.80 (iPad; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
    ]

    OPERA_MINI_VERSIONS = [
        "4.0", "4.1.11321", "4.2.13337", "4.2.14912", "4.2.15410", "4.3.24214",
        "5.0.18741", "5.1.22296", "5.1.22783", "6.0.24095", "6.24093", "7.1.32444",
        "7.6.35766", "36.2.2254"
    ]

    OPERA_MOBI_BUILDS = [
        "27", "49", "447", "1209", "3730", "ADR-1012221546", "SYB-1107071606"
    ]

    BUILD_NUMBERS = [
        "22.387", "22.478", "23.334", "23.377", "24.746", "24.783", "25.657",
        "27.1407", "28.2647", "35.5706", "119.132", "870", "886"
    ]

    PRESTO_VERSIONS = [
        "2.4.15", "2.4.18", "2.5.25", "2.8.119", "2.12.423"
    ]

    FINAL_VERSIONS = [
        "10.00", "10.1", "10.54", "11.10", "12.16", "13.00"
    ]

    LANGUAGES = [
        # English variants
        "en", "en-US", "en-GB", "en-CA", "en-AU", "en-NZ", "en-ZA", "en-IN", "en-SG",
        # Western European
        "de", "de-DE", "de-AT", "de-CH",
        "fr", "fr-FR", "fr-CA", "fr-BE", "fr-CH", "fr-LU",
        "es", "es-ES", "es-MX", "es-AR", "es-CO", "es-CL", "es-PE", "es-VE", "es-LA",
        "it", "it-IT", "it-CH",
        "pt", "pt-PT", "pt-BR",
        "nl", "nl-NL", "nl-BE",
        # Nordic languages
        "da", "da-DK",
        "sv", "sv-SE",
        "no", "no-NO", "nb", "nn",
        "fi", "fi-FI",
        "is", "is-IS",
        # Eastern European
        "pl", "pl-PL",
        "cs", "cs-CZ",
        "sk", "sk-SK",
        "hu", "hu-HU",
        "ro", "ro-RO",
        "bg", "bg-BG",
        "hr", "hr-HR",
        "sr", "sr-RS",
        "sl", "sl-SI",
        "uk", "uk-UA",
        "ru", "ru-RU",
        # Asian languages
        "zh", "zh-CN", "zh-TW", "zh-HK",
        "ja", "ja-JP",
        "ko", "ko-KR",
        "th", "th-TH",
        "vi", "vi-VN",
        "id", "id-ID",
        "ms", "ms-MY",
        "fil", "tl",
        # Middle Eastern
        "tr", "tr-TR",
        "ar", "ar-SA", "ar-AE", "ar-EG",
        "he", "he-IL",
        "fa", "fa-IR",
        # Other
        "hi", "hi-IN",
        "bn", "bn-IN",
        "ta", "ta-IN",
        "te", "te-IN",
        "mr", "mr-IN",
        "el", "el-GR",
        "ca", "ca-ES",
        "eu", "eu-ES"
    ]

    def generate_opera_ua():
        """Generate a single random Opera User Agent string."""
        pattern = random.choice(OPERA_PATTERNS)
        params = {'lang': random.choice(LANGUAGES)}

        if '{version}' in pattern:
            params['version'] = random.choice(OPERA_MINI_VERSIONS)
        if '{build}' in pattern:
            if "Opera Mobi" in pattern:
                params['build'] = random.choice(OPERA_MOBI_BUILDS)
            else:
                params['build'] = random.choice(BUILD_NUMBERS)
        if '{presto}' in pattern:
            params['presto'] = random.choice(PRESTO_VERSIONS)
        if '{final}' in pattern:
            params['final'] = random.choice(FINAL_VERSIONS)

        return pattern.format(**params)

    def generate_ua_pool(count=10):
        """Generate a pool of unique Opera User Agent strings."""
        ua_pool = set()
        max_attempts = count * 100
        attempts = 0

        try:
            while len(ua_pool) < count and attempts < max_attempts:
                ua = generate_opera_ua()
                ua_pool.add(ua)
                attempts += 1
        except Exception:
            # If generation fails entirely, return at least the default fallback
            if not ua_pool:
                return [DEFAULT_FALLBACK_UA]

        # If we couldn't generate enough, fill remaining with default
        result = list(ua_pool)
        while len(result) < count:
            result.append(DEFAULT_FALLBACK_UA)

        return result


def main():
    """Main function to generate and display UA strings."""
    # Parse command line argument
    count = 10  # Default
    if len(sys.argv) > 1:
        try:
            count = int(sys.argv[1])
            if count < 1:
                print("Error: Count must be a positive integer", file=sys.stderr)
                sys.exit(1)
        except ValueError:
            print(f"Error: Invalid count '{sys.argv[1]}'. Must be an integer.", file=sys.stderr)
            sys.exit(1)

    # Show which mode we're using (to stderr so it doesn't interfere with output)
    if USE_APP_MODULE:
        print("# Using app.utils.ua_generator module", file=sys.stderr)
    else:
        print("# Using standalone generator (app module not available)", file=sys.stderr)

    print(f"# Generating {count} Opera User Agent strings...\n", file=sys.stderr)

    # Generate UAs
    uas = generate_ua_pool(count)

    # Display them (one per line, no numbering)
    for ua in uas:
        print(ua)

    # Summary to stderr so it doesn't interfere with piping
    print(f"\n# Generated {len(uas)} unique User Agent strings", file=sys.stderr)


if __name__ == '__main__':
    main()

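The generator's two core ideas, filling only the placeholders a template actually contains and collecting unique strings in a bounded loop, can be sketched with much smaller tables. Everything below is illustrative: `build_ua`, `build_pool`, `PATTERNS`, and `OPTIONS` are hypothetical names, the option lists are reduced, and the sketch uses a single build list instead of the separate Opera Mobi and Opera Mini build lists.

```python
import random

# Reduced option tables, for illustration only (not the commit's full lists).
PATTERNS = [
    "Opera/9.80 (Android; Linux; Opera Mobi/{build}; U; {lang}) Presto/{presto} Version/{final}",
    "Opera/9.80 (iPad; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
]
OPTIONS = {
    'version': ["4.0", "7.6.35766"],
    'build': ["27", "870", "22.387"],
    'lang': ["en", "de-DE", "ja-JP"],
    'presto': ["2.8.119", "2.12.423"],
    'final': ["11.10", "12.16"],
}
FALLBACK_UA = "Opera/9.30 (Nintendo Wii; U; ; 3642; en)"


def build_ua(rng: random.Random) -> str:
    """Pick a template and fill only the fields that template uses."""
    pattern = rng.choice(PATTERNS)
    params = {key: rng.choice(values)
              for key, values in OPTIONS.items()
              if '{' + key + '}' in pattern}
    return pattern.format(**params)


def build_pool(count: int, rng: random.Random) -> list:
    """Collect unique UAs; the attempt cap keeps the loop from running forever."""
    pool = set()
    attempts = 0
    while len(pool) < count and attempts < count * 100:
        pool.add(build_ua(rng))
        attempts += 1
    result = sorted(pool)
    # Pad with the fallback if the tables cannot yield enough unique strings.
    result.extend([FALLBACK_UA] * (count - len(result)))
    return result


pool = build_pool(5, random.Random(0))
```

The attempt cap is what makes requesting more strings than the tables can produce safe: the loop stops after `count * 100` tries and the remainder is padded with the fallback UA.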
@@ -21,7 +21,7 @@ pycparser==2.22
 pyOpenSSL==19.1.0; platform_machine == 'armv7l'
 pyOpenSSL==25.3.0; platform_machine != 'armv7l'
 pyparsing==3.2.5
-pytest==7.2.1
+pytest==8.3.3
 python-dateutil==2.9.0.post0
 httpx[http2,socks]==0.28.1
 cachetools==6.2.0


@@ -1,5 +1,8 @@
 from app import app
+from app.request import Request
 from app.utils.session import generate_key
+from test.mock_google import build_mock_response
+import httpx
 import pytest
 import random

@@ -13,6 +16,38 @@ demo_config = {
 }


+@pytest.fixture(autouse=True)
+def mock_google(monkeypatch):
+    original_send = Request.send
+
+    def fake_send(self, base_url='', query='', attempt=0,
+                  force_mobile=False, user_agent=''):
+        use_mock = not base_url or 'google.com/search' in base_url
+        if not use_mock:
+            return original_send(self, base_url, query, attempt,
+                                 force_mobile, user_agent)
+
+        html = build_mock_response(query, getattr(self, 'language', ''), getattr(self, 'country', ''))
+        request_url = (base_url or self.search_url) + query
+        request = httpx.Request('GET', request_url)
+        return httpx.Response(200, request=request, text=html)
+
+    def fake_autocomplete(self, q):
+        normalized = q.replace('+', ' ').lower()
+        suggestions = []
+        if 'green eggs and' in normalized:
+            suggestions.append('green eggs and ham')
+        if 'the cat in the' in normalized:
+            suggestions.append('the cat in the hat')
+        if normalized.startswith('who'):
+            suggestions.extend(['whoogle', 'whoogle search'])
+        return suggestions
+
+    monkeypatch.setattr(Request, 'send', fake_send)
+    monkeypatch.setattr(Request, 'autocomplete', fake_autocomplete)
+    yield
+
+
 @pytest.fixture
 def client():
     with app.test_client() as client:

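The fixture above swaps `Request.send` for a fake that answers Google search URLs locally and passes everything else through. The same patch-and-restore idea can be sketched with only the standard library. In this sketch the `Request` class is a hypothetical stand-in rather than `app.request.Request`, the fake returns plain strings instead of `httpx.Response` objects, and `unittest.mock.patch.object` is used in place of pytest's `monkeypatch`.

```python
from unittest import mock


class Request:
    """Stand-in for app.request.Request (hypothetical, for illustration only)."""
    search_url = 'https://www.google.com/search?q='

    def send(self, base_url='', query=''):
        raise RuntimeError('network access is not allowed in this sketch')


def fake_send(self, base_url='', query=''):
    # Serve canned HTML only for the default or Google search URL.
    if base_url and 'google.com/search' not in base_url:
        return f'passthrough:{base_url}'
    return f'<html><div id="main">mock results for {query}</div></html>'


with mock.patch.object(Request, 'send', fake_send):
    mocked_html = Request().send(query='whoogle')
    passthrough = Request().send(base_url='https://example.com/', query='x')

# Outside the context manager the original method is back in place.
try:
    Request().send(query='whoogle')
    restored = False
except RuntimeError:
    restored = True
```

Patching at the class level, as the fixture does, means every `Request` instance created during a test sees the fake, including instances built deep inside the application code.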
136
test/mock_google.py
Normal file
@@ -0,0 +1,136 @@
from urllib.parse import parse_qs, unquote, quote

from app.models.config import Config

DEFAULT_RESULTS = [
    ('Example Domain', 'https://example.com/{slug}', 'Example information about {term}.'),
    ('Whoogle Search', 'https://github.com/benbusby/whoogle-search', 'Private self-hosted Google proxy'),
    ('Wikipedia', 'https://en.wikipedia.org/wiki/{title}', '{title} – encyclopedia entry.'),
]


def _result_block(title, href, snippet):
    encoded_href = quote(href, safe=':/')
    return (
        f'<div class="ZINbbc xpd O9g5cc uUPGi">'
        f'<div class="kCrYT">'
        f'<a href="/url?q={encoded_href}&sa=U&ved=2ahUKE">'
        f'<h3 class="BNeawe vvjwJb AP7Wnd">{title}</h3>'
        f'<span class="CVA68e">{title}</span>'
        f'</a>'
        f'<div class="VwiC3b">{snippet}</div>'
        f'</div>'
        f'</div>'
    )


def _main_results(query, params, language='', country=''):
    term = query.lower()
    slug = query.replace(' ', '-')
    results = []

    pref_lang = ''
    pref_country = ''
    if 'preferences' in params:
        try:
            pref_data = Config(**{})._decode_preferences(params['preferences'][0])
            pref_lang = str(pref_data.get('lang_interface', '') or '').lower()
            pref_country = str(pref_data.get('country', '') or '').lower()
        except Exception:
            pref_lang = pref_country = ''
    else:
        pref_lang = pref_country = ''

    if 'wikipedia' in term:
        hl = str(params.get('hl', [''])[0] or '').lower()
        gl = str(params.get('gl', [''])[0] or '').lower()
        lr = str(params.get('lr', [''])[0] or '').lower()
        language_code = str(language or '').lower()
        country_code = str(country or '').lower()
        is_japanese = (
            hl.startswith('ja') or
            gl.startswith('jp') or
            lr.endswith('lang_ja') or
            language_code.endswith('lang_ja') or
            country_code.startswith('jp') or
            pref_lang.endswith('lang_ja') or
            pref_country.startswith('jp')
        )
        if is_japanese:
            results.append((
                'ウィキペディア',
                'https://ja.wikipedia.org/wiki/ウィキペディア',
                '日本語版ウィキペディアの記事です。'
            ))
        else:
            results.append((
                'Wikipedia',
                'https://www.wikipedia.org/wiki/Wikipedia',
                'Wikipedia is a free online encyclopedia.'
            ))

    if 'pinterest' in term:
        results.append((
            'Pinterest',
            'https://www.pinterest.com/ideas/',
            'Discover recipes, home ideas, style inspiration and other ideas.'
        ))

    if 'whoogle' in term:
        results.append((
            'Whoogle Search GitHub',
            'https://github.com/benbusby/whoogle-search',
            'Source code for Whoogle Search.'
        ))

    if 'github' in term:
        results.append((
            'GitHub',
            f'https://github.com/search?q={slug}',
            'GitHub is a development platform to host and review code.'
        ))

    for title, url, snippet in DEFAULT_RESULTS:
        formatted_url = url.format(slug=slug, term=term, title=title.replace(' ', '_'))
        formatted_snippet = snippet.format(term=query, title=title)
        results.append((title, formatted_url, formatted_snippet))

    unique = []
    seen = set()
    for entry in results:
        if entry[1] in seen:
            continue
        seen.add(entry[1])
        unique.append(entry)

    return ''.join(_result_block(*entry) for entry in unique)


def build_mock_response(raw_query, language='', country=''):
    if '&' in raw_query:
        q_part, extra = raw_query.split('&', 1)
    else:
        q_part, extra = raw_query, ''

    query = unquote(q_part)
    params = parse_qs(extra)

    results_html = _main_results(query, params, language, country)
    safe_query = query.replace('"', '')
    pagination = (
        f'<a href="/search?q={q_part}&start=10">Next</a>'
        f'<a href="/search?q={q_part}&start=20">More</a>'
    )

    return (
        '<html>'
        '<head><title>Mock Google Results</title></head>'
        '<body>'
        f'<div id="main">{results_html}</div>'
        f'<form action="/search" method="GET">'
        f'<input name="q" value="{safe_query}">'
        '</form>'
        f'<footer class="TuS8Ad">{pagination}</footer>'
        '</body>'
        '</html>'
    )
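`build_mock_response` receives the query and its extra parameters as one raw string and separates them at the first `&`. A minimal sketch of that step follows, using a hypothetical `split_raw_query` helper (not part of the commit) so the parsing can be checked without importing the app.

```python
from urllib.parse import parse_qs, unquote


def split_raw_query(raw_query: str):
    """Hypothetical helper: split 'term&k=v&...' into (decoded term, params dict)."""
    if '&' in raw_query:
        q_part, extra = raw_query.split('&', 1)
    else:
        q_part, extra = raw_query, ''
    # unquote decodes %XX escapes; parse_qs maps each key to a list of values.
    return unquote(q_part), parse_qs(extra)


with_params = split_raw_query('green%20eggs&hl=ja&gl=jp')
without_params = split_raw_query('whoogle')
```

Because `parse_qs` returns lists, lookups take the first element, which is why the mock reads values with expressions such as `params.get('hl', [''])[0]`.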