mirror of
https://github.com/benbusby/whoogle-search.git
synced 2026-03-11 08:54:34 +00:00
Update README and codebase to enhance User Agent handling
- Revised README to reflect changes in Google search behavior and Whoogle's response strategies.
- Implemented a User Agent pool for improved request handling, including fallback mechanisms.
- Added configuration options for displaying the User Agent in search results.
- Introduced a command-line tool for generating custom User Agent strings.
- Enhanced request headers to include additional parameters for better compatibility with Google services.
parent 5f17b82735
commit 9b3a6ce550
13 changed files with 1379 additions and 36 deletions
157
README.md
|
|
@ -1,10 +1,10 @@
|
|||
>[!WARNING]
|
||||
>
|
||||
>As of 16 January, 2025, Google seemingly no longer supports performing search queries without JavaScript enabled. This is a fundamental part of how Whoogle
|
||||
>Since 16 January, 2025, Google has been attacking the ability to perform search queries without JavaScript enabled. This is a fundamental part of how Whoogle
|
||||
>works -- Whoogle requests the JavaScript-free search results, then filters out garbage from the results page and proxies all external content for the user.
|
||||
>
|
||||
>This is possibly a breaking change that will mean the end for Whoogle. I'll continue monitoring the status of their JS-free results and looking into workarounds,
|
||||
>and will make another post if a solution is found (or not).
|
||||
>This is possibly a breaking change that may mean the end for Whoogle. We'll continue fighting back and releasing workarounds until all workarounds are
|
||||
>exhausted or a better method is found.
|
||||
|
||||
___
|
||||
|
||||
|
|
@ -68,7 +68,12 @@ Contents
|
|||
- POST request search and suggestion queries (when possible)
|
||||
- View images at full res without site redirect (currently mobile only)
|
||||
- Light/Dark/System theme modes (with support for [custom CSS theming](https://github.com/benbusby/whoogle-search/wiki/User-Contributed-CSS-Themes))
|
||||
- Randomly generated User Agent
|
||||
- Auto-generated Opera User Agents with random rotation
|
||||
- 10 unique Opera-based UAs generated on startup from 115 language variants
|
||||
- Randomly rotated for each search request to avoid detection patterns
|
||||
- Cached across restarts with configurable refresh options
|
||||
- Fallback to safe default UA if generation fails
|
||||
- Optional display of current UA in search results footer
|
||||
- Easy to install/deploy
|
||||
- DDG-style bang (i.e. `!<tag> <query>`) searches
|
||||
- User-defined [custom bangs](#custom-bangs)
|
||||
|
|
@ -437,9 +442,12 @@ There are a few optional environment variables available for customizing a Whoog
|
|||
| WHOOGLE_PROXY_PASS | The password of the proxy server. |
|
||||
| WHOOGLE_PROXY_TYPE | The type of the proxy server. Can be "socks5", "socks4", or "http". |
|
||||
| WHOOGLE_PROXY_LOC | The location of the proxy server (host or ip). |
|
||||
| WHOOGLE_USER_AGENT | The desktop user agent to use. Defaults to a randomly generated one. |
|
||||
| WHOOGLE_USER_AGENT_MOBILE | The mobile user agent to use. Defaults to a randomly generated one. |
|
||||
| WHOOGLE_USER_AGENT | The desktop user agent to use when the 'env_conf' option is selected. Leave empty to use auto-generated Opera UAs. |
|
||||
| WHOOGLE_USER_AGENT_MOBILE | The mobile user agent to use when the 'env_conf' option is selected. Leave empty to use auto-generated Opera UAs. |
|
||||
| WHOOGLE_USE_CLIENT_USER_AGENT | Enable to use your own user agent for all requests. Defaults to false. |
|
||||
| WHOOGLE_UA_CACHE_PERSISTENT | Whether to persist auto-generated UAs across restarts. Set to '0' to regenerate on each startup. Default '1'. |
|
||||
| WHOOGLE_UA_CACHE_REFRESH_DAYS | Auto-refresh UA cache after N days. Set to '0' to never refresh (cache persists indefinitely). Default '0'. |
|
||||
| WHOOGLE_UA_LIST_FILE | Path to text file containing custom UA strings (one per line). When set, uses these instead of auto-generated UAs. |
|
||||
| WHOOGLE_REDIRECTS | Specify sites that should be redirected elsewhere. See [custom redirecting](#custom-redirecting). |
|
||||
| EXPOSE_PORT | The port where Whoogle will be exposed. |
|
||||
| HTTPS_ONLY | Enforce HTTPS. (See [here](https://github.com/benbusby/whoogle-search#https-enforcement)) |
|
||||
|
|
@ -491,6 +499,7 @@ These environment variables allow setting default config values, but can be over
|
|||
| WHOOGLE_CONFIG_PREFERENCES_ENCRYPTED | Encrypt preferences token, requires preferences key |
|
||||
| WHOOGLE_CONFIG_PREFERENCES_KEY | Key to encrypt preferences in URL (REQUIRED to show url) |
|
||||
| WHOOGLE_CONFIG_ANON_VIEW | Include the "anonymous view" option for each search result |
|
||||
| WHOOGLE_CONFIG_SHOW_USER_AGENT | Display the User Agent string used for search in results footer |
|
||||
|
||||
## Usage
|
||||
Same as most search engines, with the exception of filtering by time range.
|
||||
|
|
@ -662,6 +671,141 @@ Whoogle can optionally serve a single bundled CSS and JS to reduce the number of
|
|||
- When disabled (default), templates load individual CSS/JS files for easier development.
|
||||
- Note: Theme CSS (`*-theme.css`) are still loaded separately to honor user theme selection.
|
||||
## User Agent Generator Tool

A standalone command-line tool is available for generating Opera User Agent strings on demand:

```bash
# Generate 10 User Agent strings (default)
python misc/generate_uas.py

# Generate custom number of UAs
python misc/generate_uas.py 20
```

This tool is useful for:
- Testing different UA strings
- Generating UAs for other projects
- Verifying UA generation patterns
- Debugging UA-related issues
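As a rough illustration of what pattern-based generation can look like, here is a minimal sketch. It assumes a template-plus-value-pool approach similar to the one in `app/utils/ua_generator.py`. The names `PATTERNS`, `POOLS`, `generate_ua`, and `generate_pool` are hypothetical, and the pools are a tiny subset chosen for readability, so this should not be read as the tool's actual code.

```python
import random
from typing import List

# Two templates and deliberately tiny value pools (illustrative subset only).
PATTERNS = [
    "Opera/9.80 (J2ME/MIDP; Opera Mini/{version}/{build}; U; {lang}) "
    "Presto/{presto} Version/{final}",
    "Opera/9.30 (Nintendo Wii; U; ; {code}; {lang})",
]
POOLS = {
    "version": ["4.2.13337", "5.1.21214"],
    "build": ["22.478", "24.838"],
    "lang": ["en", "de", "fr"],
    "presto": ["2.4.15", "2.12.423"],
    "final": ["10.00", "11.10"],
    "code": ["3642", "2071"],
}


def generate_ua(rng: random.Random) -> str:
    """Fill one randomly chosen template with randomly chosen pool values."""
    pattern = rng.choice(PATTERNS)
    values = {name: rng.choice(options) for name, options in POOLS.items()}
    # str.format ignores keyword arguments the template does not reference.
    return pattern.format(**values)


def generate_pool(count: int, rng: random.Random) -> List[str]:
    """Collect up to `count` unique UAs, with a cap on the number of attempts."""
    pool = set()
    attempts = 0
    while len(pool) < count and attempts < count * 100:
        pool.add(generate_ua(rng))
        attempts += 1
    return sorted(pool)


pool = generate_pool(10, random.Random(0))
for ua in pool:
    print(ua)
```

Collecting into a set keeps the pool free of duplicates, and the attempt cap guards against looping forever when the templates cannot produce enough distinct strings.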
## Using Custom User Agent Lists

Instead of using auto-generated Opera UA strings, you can provide your own list of User Agent strings for Whoogle to use.

### Setup

1. Create a text file with your preferred UA strings (one per line):

```
Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.13337/22.478; U; en) Presto/2.4.15 Version/10.00
Opera/9.80 (Android; Linux; Opera Mobi/498; U; en) Presto/2.12.423 Version/10.1
Opera/9.30 (Nintendo Wii; U; ; 3642; en)
```

2. Set the `WHOOGLE_UA_LIST_FILE` environment variable to point to your file:

```bash
# Docker
docker run -e WHOOGLE_UA_LIST_FILE=/config/my_user_agents.txt ...

# Docker Compose
environment:
  - WHOOGLE_UA_LIST_FILE=/config/my_user_agents.txt

# Manual/systemd
export WHOOGLE_UA_LIST_FILE=/path/to/my_user_agents.txt
```
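To check a list file before pointing Whoogle at it, something like the following sketch may help. It mirrors the one-UA-per-line, blank-lines-ignored behavior described here; `read_ua_file` is a hypothetical helper name, and catching `OSError` broadly is a simplification rather than Whoogle's exact error handling.

```python
import os
import tempfile
from typing import List


def read_ua_file(path: str) -> List[str]:
    """Return the non-blank lines of a UA list file, or [] if it is unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    except (OSError, UnicodeDecodeError):
        # Missing file, bad permissions, or non-UTF-8 content: treat as invalid.
        return []


# Write a small sample list, including a blank line that should be skipped.
sample = (
    "Opera/9.80 (Android; Linux; Opera Mobi/498; U; en) Presto/2.12.423 Version/10.1\n"
    "\n"
    "Opera/9.30 (Nintendo Wii; U; ; 3642; en)\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

uas = read_ua_file(tmp.name)
os.unlink(tmp.name)
print(f"{len(uas)} usable UA strings")
```

An empty result is a signal that the file path or permissions need attention before setting `WHOOGLE_UA_LIST_FILE`.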
### Priority Order

Whoogle uses the following priority when loading User Agent strings:

1. **Custom UA list file** (if `WHOOGLE_UA_LIST_FILE` is set and valid)
2. **Cached auto-generated UAs** (if cache exists and is valid)
3. **Newly generated UAs** (if no cache or cache expired)
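The priority order above can be sketched roughly as follows. This is an illustration under stated assumptions, not Whoogle's implementation: `resolve_ua_pool` and `cache_is_fresh` are hypothetical names, the cache is assumed to be a dict with `generated_at` (ISO timestamp) and `user_agents` keys, and a refresh interval of `0` is taken to mean the cache never expires, as described for `WHOOGLE_UA_CACHE_REFRESH_DAYS`.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Optional


def cache_is_fresh(generated_at: str, refresh_days: int,
                   now: Optional[datetime] = None) -> bool:
    """Return True if the cache is still usable; 0 days means it never expires."""
    if refresh_days <= 0:
        return True
    now = now or datetime.now()
    age = now - datetime.fromisoformat(generated_at)
    return age < timedelta(days=refresh_days)


def resolve_ua_pool(custom: List[str], cache: Optional[dict], refresh_days: int,
                    generate: Callable[[], List[str]],
                    now: Optional[datetime] = None) -> List[str]:
    """Pick a UA pool: custom list first, then a fresh cache, then new UAs."""
    if custom:
        return custom
    if (cache and cache.get("user_agents")
            and cache_is_fresh(cache["generated_at"], refresh_days, now)):
        return cache["user_agents"]
    return generate()


# Example with a fixed "current" time so the outcome does not depend on the clock.
now = datetime(2025, 1, 31)
cache = {"generated_at": "2025-01-01T00:00:00", "user_agents": ["cached-ua"]}


def fresh() -> List[str]:
    return ["fresh-ua"]


print(resolve_ua_pool(["custom-ua"], cache, 0, fresh, now))
print(resolve_ua_pool([], cache, 0, fresh, now))
print(resolve_ua_pool([], cache, 7, fresh, now))
```

Passing the generator as a callable keeps the sketch independent of how new UAs are actually produced.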
### Tips

- You can use the output from `misc/check_google_user_agents.py` as your custom UA list
- Generate a list with `python misc/generate_uas.py 50 2>/dev/null > my_uas.txt`
- Mix different UA types (Opera, Firefox, Chrome) for more variety
- Keep the file readable by Whoogle (proper permissions)
- Use one UA string per line; blank lines are ignored

### Example Workflow

```bash
# Generate and test UAs, save working ones
python misc/generate_uas.py 100 2>/dev/null > candidate_uas.txt
python misc/check_google_user_agents.py candidate_uas.txt --output working_uas.txt

# Use the working UAs with Whoogle
export WHOOGLE_UA_LIST_FILE=./working_uas.txt
./run
```
## User Agent Testing Tool

Whoogle now includes a comprehensive testing tool (`misc/check_google_user_agents.py`) to verify which User Agent strings successfully return Google search results without triggering blocks, JavaScript-only pages, or browser upgrade prompts.

### Usage

```bash
# Test all UAs from a file
python misc/check_google_user_agents.py UAs.txt

# Save working UAs to a file (appends incrementally)
python misc/check_google_user_agents.py UAs.txt --output working_uas.txt

# Use a specific search query
python misc/check_google_user_agents.py UAs.txt --query "python programming"

# Verbose mode to see detailed results
python misc/check_google_user_agents.py UAs.txt --output working.txt --verbose

# Adjust delay between requests (default: 0.5 seconds)
python misc/check_google_user_agents.py UAs.txt --delay 1.0

# Set request timeout (default: 10 seconds)
python misc/check_google_user_agents.py UAs.txt --timeout 15.0
```

### Features

- **Incremental Results**: Working UAs are saved immediately to the output file (append mode), so progress is preserved even if interrupted
- **Duplicate Detection**: Automatically skips UAs already in the output file when resuming
- **Random Query Cycling**: By default, cycles through diverse search queries to simulate realistic usage patterns
- **Rate Limit Detection**: Detects and reports Google rate limiting with recovery instructions
- **Comprehensive Validation**: Checks for:
  - HTTP status codes (blocks, server errors, rate limits)
  - Block markers (unusual traffic, upgrade browser messages)
  - Success markers (actual search result HTML elements)
  - JavaScript-only pages and redirects
  - Response size validation

### Testing Methodology

The tool evaluates UAs against multiple criteria:

1. **HTTP Status**: Rejects 4xx/5xx errors, detects 429 rate limits
2. **Block Detection**: Searches for Google's block messages (CAPTCHA, unusual traffic, etc.)
3. **JavaScript Detection**: Identifies JS-only pages and noscript redirects
4. **Result Validation**: Confirms presence of actual search result HTML elements
5. **Content Analysis**: Validates response size and structure

This tool was used to discover and validate the working Opera UA patterns that power Whoogle's auto-generation feature.
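The testing criteria listed in this section could be combined along the lines of the sketch below. Everything specific in it is an assumption made for illustration: the marker strings, the `MIN_RESPONSE_CHARS` threshold, and the `classify_response` function are hypothetical and are not taken from `misc/check_google_user_agents.py`.

```python
from typing import Tuple

# Hypothetical marker strings and threshold, chosen only for illustration.
BLOCK_MARKERS = ("unusual traffic", "captcha", "update your browser")
JS_ONLY_MARKERS = ("enablejs", "please enable javascript")
SUCCESS_MARKERS = ('id="search"', 'class="g"')
MIN_RESPONSE_CHARS = 5000


def classify_response(status: int, body: str) -> Tuple[bool, str]:
    """Return (works, reason) for one response fetched with a candidate UA."""
    if status == 429:
        return False, "rate_limited"
    if status >= 400:
        return False, f"http_{status}"

    lowered = body.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return False, "blocked"
    if any(marker in lowered for marker in JS_ONLY_MARKERS):
        return False, "js_only"
    if len(body) < MIN_RESPONSE_CHARS:
        return False, "too_small"
    if not any(marker in lowered for marker in SUCCESS_MARKERS):
        return False, "no_results"
    return True, "ok"


good_page = '<div id="search">' + "result " * 1000
print(classify_response(200, good_page))
print(classify_response(200, "Our systems have detected unusual traffic"))
print(classify_response(429, ""))
```

Checking the status code first avoids scanning error pages, and checking block markers before success markers avoids accepting a block page that happens to contain result-like markup.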
## Known Issues

### User Agent Strings and Image Search

**Issue**: Most, if not all, of the auto-generated Opera User Agent strings may fail when performing **image searches** on Google. This appears to be a limitation with how Google's image search validates User Agent strings.

**Impact**:
- Regular web searches work correctly with generated UAs
- Image search may return errors or no results
|
||||
## Contributing
|
||||
|
||||
Under the hood, Whoogle is a basic Flask app with the following structure:
|
||||
|
|
@ -675,6 +819,7 @@ Under the hood, Whoogle is a basic Flask app with the following structure:
|
|||
- `results.py`: Utility functions for interpreting/modifying individual search results
|
||||
- `search.py`: Creates and handles new search queries
|
||||
- `session.py`: Miscellaneous methods related to user sessions
|
||||
- `ua_generator.py`: Auto-generates Opera User Agent strings with pattern-based randomization
|
||||
- `templates/`
|
||||
- `index.html`: The home page template
|
||||
- `display.html`: The search results template
|
||||
|
|
|
|||
|
|
@ -3,6 +3,7 @@ from app.request import send_tor_signal
|
|||
from app.utils.session import generate_key
|
||||
from app.utils.bangs import gen_bangs_json, load_all_bangs
|
||||
from app.utils.misc import gen_file_hash, read_config_bool
|
||||
from app.utils.ua_generator import load_ua_pool
|
||||
from base64 import b64encode
|
||||
from bs4 import MarkupResemblesLocatorWarning
|
||||
from datetime import datetime, timedelta
|
||||
|
|
@ -107,6 +108,16 @@ if not os.path.exists(app.config['BANG_PATH']):
|
|||
if not os.path.exists(app.config['BUILD_FOLDER']):
|
||||
os.makedirs(app.config['BUILD_FOLDER'])
|
||||
|
||||
# Initialize User Agent pool
|
||||
app.config['UA_CACHE_PATH'] = os.path.join(app.config['CONFIG_PATH'], 'ua_cache.json')
|
||||
try:
|
||||
app.config['UA_POOL'] = load_ua_pool(app.config['UA_CACHE_PATH'], count=10)
|
||||
except Exception as e:
|
||||
# If UA pool loading fails, log warning and set empty pool
|
||||
# The gen_user_agent function will handle the fallback
|
||||
print(f"Warning: Could not initialize UA pool: {e}")
|
||||
app.config['UA_POOL'] = []
|
||||
|
||||
# Session values
|
||||
app_key_path = os.path.join(app.config['CONFIG_PATH'], 'whoogle.key')
|
||||
if os.path.exists(app_key_path):
|
||||
|
|
|
|||
|
|
@ -45,6 +45,7 @@ class Config:
|
|||
self.user_agent = kwargs.get('user_agent', default_ua_option)
|
||||
self.custom_user_agent = kwargs.get('custom_user_agent', '')
|
||||
self.use_custom_user_agent = kwargs.get('use_custom_user_agent', False)
|
||||
self.show_user_agent = read_config_bool('WHOOGLE_CONFIG_SHOW_USER_AGENT')
|
||||
|
||||
# Add user agent related keys to safe_keys
|
||||
self.safe_keys = [
|
||||
|
|
@ -63,7 +64,8 @@ class Config:
|
|||
'tbs',
|
||||
'user_agent',
|
||||
'custom_user_agent',
|
||||
'use_custom_user_agent'
|
||||
'use_custom_user_agent',
|
||||
'show_user_agent'
|
||||
]
|
||||
|
||||
app_config = current_app.config
|
||||
|
|
@ -97,7 +99,10 @@ class Config:
|
|||
if kwargs:
|
||||
mutable_attrs = self.get_mutable_attrs()
|
||||
for attr in mutable_attrs:
|
||||
if attr in kwargs.keys():
|
||||
if attr == 'show_user_agent':
|
||||
# Handle show_user_agent as boolean
|
||||
self.show_user_agent = bool(kwargs.get(attr))
|
||||
elif attr in kwargs.keys():
|
||||
setattr(self, attr, kwargs[attr])
|
||||
elif attr not in kwargs.keys() and mutable_attrs[attr] == bool:
|
||||
setattr(self, attr, False)
|
||||
|
|
|
|||
117
app/request.py
|
|
@ -1,6 +1,7 @@
|
|||
from app.models.config import Config
|
||||
from app.utils.misc import read_config_bool
|
||||
from app.services.provider import get_http_client
|
||||
from app.utils.ua_generator import load_ua_pool, get_random_ua, DEFAULT_FALLBACK_UA
|
||||
from datetime import datetime
|
||||
from defusedxml import ElementTree as ET
|
||||
import random
|
||||
|
|
@ -16,8 +17,32 @@ MAPS_URL = 'https://maps.google.com/maps'
|
|||
AUTOCOMPLETE_URL = ('https://suggestqueries.google.com/'
|
||||
'complete/search?client=toolbar&')
|
||||
|
||||
MOBILE_UA = '{}/5.0 (Android 0; Mobile; rv:54.0) Gecko/54.0 {}/59.0'
|
||||
DESKTOP_UA = '{}/5.0 (X11; {} x86_64; rv:75.0) Gecko/20100101 {}/75.0'
|
||||
DEFAULT_DESKTOP_UA = (
|
||||
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:131.0) '
|
||||
'Gecko/20100101 Firefox/131.0'
|
||||
)
|
||||
DEFAULT_MOBILE_UA = (
|
||||
'Mozilla/5.0 (Linux; Android 14; Pixel 8 Pro) '
|
||||
'AppleWebKit/537.36 (KHTML, like Gecko) '
|
||||
'Chrome/127.0.0.0 Mobile Safari/537.36'
|
||||
)
|
||||
|
||||
DESKTOP_UAS = [
|
||||
DEFAULT_DESKTOP_UA,
|
||||
'Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0',
|
||||
'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) '
|
||||
'AppleWebKit/537.36 (KHTML, like Gecko) '
|
||||
'Chrome/127.0.0.0 Safari/537.36'
|
||||
]
|
||||
MOBILE_UAS = [
|
||||
DEFAULT_MOBILE_UA,
|
||||
'Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) '
|
||||
'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 '
|
||||
'Mobile/15E148 Safari/604.1',
|
||||
'Mozilla/5.0 (Linux; Android 13; SM-S918B) '
|
||||
'AppleWebKit/537.36 (KHTML, like Gecko) '
|
||||
'Chrome/125.0.0.0 Mobile Safari/537.36'
|
||||
]
|
||||
|
||||
# Valid query params
|
||||
VALID_PARAMS = ['tbs', 'tbm', 'start', 'near', 'source', 'nfpr']
|
||||
|
|
@ -73,8 +98,8 @@ def send_tor_signal(signal: Signal) -> bool:
|
|||
|
||||
|
||||
def gen_user_agent(config, is_mobile) -> str:
|
||||
# Define the default PlayStation Portable user agent (replaces Lynx)
|
||||
DEFAULT_UA = 'Mozilla/4.0 (PSP (PlayStation Portable); 2.00)'
|
||||
# Modern defaults mimic widely-used browsers so Google returns full results.
|
||||
default_ua = DEFAULT_MOBILE_UA if is_mobile else DEFAULT_DESKTOP_UA
|
||||
|
||||
# If using custom user agent, return the custom string
|
||||
if config.user_agent == 'custom' and config.custom_user_agent:
|
||||
|
|
@ -93,18 +118,39 @@ def gen_user_agent(config, is_mobile) -> str:
|
|||
# If env vars are not set, fall back to default
|
||||
return default_ua
|
||||
|
||||
# If using default user agent
|
||||
# If using default user agent - use auto-generated Opera UA pool
|
||||
if config.user_agent == 'default':
|
||||
return DEFAULT_UA
|
||||
try:
|
||||
# Try to load UA pool from cache (lazy loading if not in app.config)
|
||||
# First check if we have access to Flask app context
|
||||
try:
|
||||
from flask import current_app
|
||||
if hasattr(current_app, 'config') and 'UA_POOL' in current_app.config:
|
||||
ua_pool = current_app.config['UA_POOL']
|
||||
else:
|
||||
# Fall back to loading from disk
|
||||
config_path = os.environ.get('CONFIG_VOLUME',
|
||||
os.path.join(os.path.dirname(os.path.abspath(__file__)),
|
||||
'static', 'config'))
|
||||
cache_path = os.path.join(config_path, 'ua_cache.json')
|
||||
ua_pool = load_ua_pool(cache_path, count=10)
|
||||
except (ImportError, RuntimeError):
|
||||
# No Flask context available, load from disk
|
||||
config_path = os.environ.get('CONFIG_VOLUME',
|
||||
os.path.join(os.path.dirname(os.path.abspath(__file__)),
|
||||
'static', 'config'))
|
||||
cache_path = os.path.join(config_path, 'ua_cache.json')
|
||||
ua_pool = load_ua_pool(cache_path, count=10)
|
||||
|
||||
return get_random_ua(ua_pool)
|
||||
except Exception as e:
|
||||
# If anything goes wrong, fall back to default Opera UA
|
||||
print(f"Warning: Could not load UA pool, using fallback Opera UA: {e}")
|
||||
return DEFAULT_FALLBACK_UA
|
||||
|
||||
# If no custom user agent is set, generate a random one (for backwards compatibility)
|
||||
firefox = random.choice(['Choir', 'Squier', 'Higher', 'Wire']) + 'fox'
|
||||
linux = random.choice(['Win', 'Sin', 'Gin', 'Fin', 'Kin']) + 'ux'
|
||||
|
||||
if is_mobile:
|
||||
return MOBILE_UA.format("Mozilla", firefox)
|
||||
|
||||
return DESKTOP_UA.format("Mozilla", linux, firefox)
|
||||
candidates = MOBILE_UAS if is_mobile else DESKTOP_UAS
|
||||
return random.choice(candidates)
|
||||
|
||||
|
||||
def gen_query(query, args, config) -> str:
|
||||
|
|
@ -324,23 +370,39 @@ class Request:
|
|||
modified_user_agent = self.modified_user_agent
|
||||
|
||||
headers = {
|
||||
'User-Agent': modified_user_agent
|
||||
'User-Agent': modified_user_agent,
|
||||
'Accept': ('text/html,application/xhtml+xml,application/xml;'
|
||||
'q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8'),
|
||||
'Accept-Language': 'en-US,en;q=0.9',
|
||||
'Accept-Encoding': 'gzip, deflate, br',
|
||||
'Connection': 'keep-alive',
|
||||
'Cache-Control': 'max-age=0',
|
||||
'Pragma': 'no-cache',
|
||||
'Upgrade-Insecure-Requests': '1',
|
||||
'Sec-Fetch-Site': 'none',
|
||||
'Sec-Fetch-Mode': 'navigate',
|
||||
'Sec-Fetch-User': '?1',
|
||||
'Sec-Fetch-Dest': 'document',
|
||||
'Sec-CH-UA': (
|
||||
'"Not/A)Brand";v="8", '
|
||||
'"Chromium";v="127", '
|
||||
'"Google Chrome";v="127"'
|
||||
),
|
||||
'Sec-CH-UA-Mobile': '?0',
|
||||
'Sec-CH-UA-Platform': '"macOS"'
|
||||
}
|
||||
|
||||
# Adding the Accept-Language to the Header if possible
|
||||
# Add Accept-Language header tied to the current config if requested
|
||||
if self.lang_interface:
|
||||
headers.update({'Accept-Language':
|
||||
self.lang_interface.replace('lang_', '')
|
||||
+ ';q=1.0'})
|
||||
headers['Accept-Language'] = (
|
||||
self.lang_interface.replace('lang_', '') + ';q=1.0'
|
||||
)
|
||||
|
||||
# view is suppressed correctly
|
||||
now = datetime.now()
|
||||
consent_cookie = 'CONSENT=PENDING+987; SOCS=CAESHAgBEhIaAB'
|
||||
# Prefer header-based cookies to avoid httpx per-request cookies deprecation
|
||||
if 'Cookie' in headers:
|
||||
headers['Cookie'] += '; ' + consent_cookie
|
||||
else:
|
||||
headers['Cookie'] = consent_cookie
|
||||
# Consent cookies keep Google from showing the interstitial consent wall
|
||||
consent_cookies = {
|
||||
'CONSENT': 'PENDING+987',
|
||||
'SOCS': 'CAESHAgBEhIaAB'
|
||||
}
|
||||
|
||||
# Validate Tor conn and request new identity if the last one failed
|
||||
if self.tor and not send_tor_signal(
|
||||
|
|
@ -371,7 +433,8 @@ class Request:
|
|||
try:
|
||||
response = self.http_client.get(
|
||||
(base_url or self.search_url) + query,
|
||||
headers=headers)
|
||||
headers=headers,
|
||||
cookies=consent_cookies)
|
||||
except httpx.HTTPError as e:
|
||||
raise
|
||||
|
||||
|
|
|
|||
|
|
@ -544,6 +544,13 @@ def search():
|
|||
'results': results
|
||||
})
|
||||
|
||||
# Get the user agent that was used for the search
|
||||
used_user_agent = ''
|
||||
if search_util.user_request:
|
||||
used_user_agent = search_util.user_request.modified_user_agent
|
||||
elif hasattr(g, 'user_request') and g.user_request:
|
||||
used_user_agent = g.user_request.modified_user_agent
|
||||
|
||||
return render_template(
|
||||
'display.html',
|
||||
has_update=app.config['HAS_UPDATE'],
|
||||
|
|
@ -565,6 +572,7 @@ def search():
|
|||
) and not search_util.search_type, # Standard search queries only
|
||||
response=cleanresponse,
|
||||
version_number=app.config['VERSION_NUMBER'],
|
||||
used_user_agent=used_user_agent,
|
||||
search_header=render_template(
|
||||
'header.html',
|
||||
home_url=home_url,
|
||||
|
|
|
|||
|
|
@ -5,5 +5,8 @@
|
|||
{% if has_update %}
|
||||
|| <span class="update_available">Update Available 🟢</span>
|
||||
{% endif %}
|
||||
{% if config.show_user_agent and used_user_agent %}
|
||||
<br><span class="user-agent-display" style="font-size: 0.85em; color: #666;">User Agent: {{ used_user_agent }}</span>
|
||||
{% endif %}
|
||||
</p>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -258,6 +258,11 @@
|
|||
<input type="checkbox" name="accept_language"
|
||||
id="config-accept-language" {{ 'checked' if config.accept_language else '' }}>
|
||||
</div>
|
||||
<div class="config-div config-div-show-user-agent">
|
||||
<label for="config-show-user-agent">Show User Agent in Footer: </label>
|
||||
<input type="checkbox" name="show_user_agent"
|
||||
id="config-show-user-agent" {{ 'checked' if config.show_user_agent else '' }}>
|
||||
</div>
|
||||
<div class="config-div config-div-root-url">
|
||||
<label for="config-url">{{ translation['config-url'] }}: </label>
|
||||
<input type="text" name="url" id="config-url" value="{{ config.url }}">
|
||||
|
|
|
|||
|
|
@ -36,7 +36,7 @@ def fetch_favicon(url: str) -> bytes:
|
|||
bytes - the favicon bytes, or a placeholder image if one
|
||||
was not returned
|
||||
"""
|
||||
response = get(f'{ddg_favicon_site}/{urlparse(url).netloc}.ico')
|
||||
response = httpx.get(f'{ddg_favicon_site}/{urlparse(url).netloc}.ico')
|
||||
|
||||
if response.status_code == 200 and len(response.content) > 0:
|
||||
tmp_mem = io.BytesIO()
|
||||
|
|
|
|||
359
app/utils/ua_generator.py
Normal file
|
|
@ -0,0 +1,359 @@
"""
User Agent Generator for Opera-based UA strings.

This module generates realistic Opera User Agent strings based on patterns
found in working UA strings that successfully bypass Google's restrictions.
"""

import json
import os
import random
from datetime import datetime, timedelta
from typing import List, Dict


# Default fallback UA if generation fails
DEFAULT_FALLBACK_UA = "Opera/9.30 (Nintendo Wii; U; ; 3642; en)"

# Opera UA Pattern Templates
OPERA_PATTERNS = [
    # Opera Mini (J2ME/MIDP)
    "Opera/9.80 (J2ME/MIDP; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",

    # Opera Mobile (Android)
    "Opera/9.80 (Android; Linux; Opera Mobi/{build}; U; {lang}) Presto/{presto} Version/{final}",

    # Opera Mini (iPhone)
    "Opera/9.80 (iPhone; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",

    # Opera Mini (iPad)
    "Opera/9.80 (iPad; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",

    # Opera on Nintendo Wii
    "Opera/9.30 (Nintendo Wii; U; ; {code}; {lang})",

    # Opera Mobile (S60/SymbOS)
    "Opera/9.80 (S60; SymbOS; Opera Mobi/{build}; U; {lang}) Presto/{presto} Version/{final}",

    # Opera Mini (Series 60)
    "Opera/9.80 (Series 60; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",

    # Opera Mini (BlackBerry)
    "Opera/9.80 (BlackBerry; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",

    # Opera Mini (Windows Mobile)
    "Opera/9.80 (Windows Mobile; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
]

# Randomization pools based on working UAs
OPERA_MINI_VERSIONS = [
    "4.0", "4.1.11321", "4.1.12965", "4.1.13573", "4.1.13907", "4.1.14287",
    "4.1.15082", "4.2.13057", "4.2.13221", "4.2.13265", "4.2.13337",
    "4.2.13400", "4.2.13918", "4.2.13943", "4.2.14320", "4.2.14409",
    "4.2.14753", "4.2.14881", "4.2.14885", "4.2.14912", "4.2.15066",
    "4.2.15410", "4.2.16007", "4.2.16320", "4.2.18887", "4.2.19634",
    "4.2.21465", "4.2.22228", "4.2.23453", "4.2.24721", "4.3.13337",
    "4.3.24214", "4.4.26736", "4.4.29476", "4.5.33867", "4.5.40312",
    "5.0.15650", "5.0.16823", "5.0.17381", "5.0.17443", "5.0.18635",
    "5.0.18741", "5.0.19683", "5.0.19693", "5.0.20873", "5.0.22349",
    "5.1.21051", "5.1.21126", "5.1.21214", "5.1.21415", "5.1.21594",
    "5.1.21595", "5.1.22296", "5.1.22303", "5.1.22396", "5.1.22460",
    "5.1.22783", "5.1.22784", "6.0.24095", "6.0.24212", "6.0.24455",
    "6.1.25375", "6.1.25378", "6.1.25759", "6.24093", "6.24096",
    "6.24209", "6.24288", "6.5.26955", "6.5.29702", "7.0.29952",
    "7.1.32052", "7.1.32444", "7.1.32694", "7.29530", "7.5.33361",
    "7.6.35766", "9.80", "36.2.2254"
]

OPERA_MOBI_BUILDS = [
    "27", "49", "447", "498", "1181", "1209", "3730",
    "ADR-1011151731", "ADR-1012211514", "ADR-1012221546", "ADR-1012272315",
    "SYB-1103211396", "SYB-1104061449", "SYB-1107071606",
    "ADR-1111101157"
]

BUILD_NUMBERS = [
    "18.678", "18.684", "18.738", "18.794", "19.892", "19.916",
    "20.2477", "20.2479", "20.2485", "20.2489", "21.529", "22.387",
    "22.394", "22.401", "22.414", "22.453", "22.478", "23.317",
    "23.333", "23.334", "23.377", "23.390", "24.741", "24.743",
    "24.746", "24.783", "24.838", "24.871", "24.899", "25.657",
    "25.677", "25.729", "25.872", "26.1305", "27.1366", "27.1407",
    "27.1573", "28.2075", "28.2555", "28.2647", "28.2766", "29.3594",
    "30.3316", "31.1350", "35.2883", "35.5706", "37.6584", "119.132",
    "170.51", "170.54", "764", "870", "886", "490", "503"
]

PRESTO_VERSIONS = [
    "2.2.0", "2.4.15", "2.4.154.15", "2.4.18", "2.5.25", "2.5.28",
    "2.6.35", "2.7.60", "2.7.81", "2.8.119", "2.8.149", "2.8.191",
    "2.9.201", "2.12.423"
]

FINAL_VERSIONS = [
    "10.00", "10.1", "10.5", "10.54", "10.5454", "11.00", "11.10",
    "12.02", "12.16", "13.00"
]

LANGUAGES = [
    # English variants
    "en", "en-US", "en-GB", "en-CA", "en-AU", "en-NZ", "en-ZA", "en-IN", "en-SG",
    # Western European
    "de", "de-DE", "de-AT", "de-CH",
    "fr", "fr-FR", "fr-CA", "fr-BE", "fr-CH", "fr-LU",
    "es", "es-ES", "es-MX", "es-AR", "es-CO", "es-CL", "es-PE", "es-VE", "es-LA",
    "it", "it-IT", "it-CH",
    "pt", "pt-PT", "pt-BR",
    "nl", "nl-NL", "nl-BE",
    # Nordic languages
    "da", "da-DK",
    "sv", "sv-SE",
    "no", "no-NO", "nb", "nn",
    "fi", "fi-FI",
    "is", "is-IS",
    # Eastern European
    "pl", "pl-PL",
    "cs", "cs-CZ",
    "sk", "sk-SK",
    "hu", "hu-HU",
    "ro", "ro-RO",
    "bg", "bg-BG",
    "hr", "hr-HR",
    "sr", "sr-RS",
    "sl", "sl-SI",
    "uk", "uk-UA",
    "ru", "ru-RU",
    # Asian languages
    "zh", "zh-CN", "zh-TW", "zh-HK",
    "ja", "ja-JP",
    "ko", "ko-KR",
    "th", "th-TH",
    "vi", "vi-VN",
    "id", "id-ID",
    "ms", "ms-MY",
    "fil", "tl",
    # Middle Eastern
    "tr", "tr-TR",
    "ar", "ar-SA", "ar-AE", "ar-EG",
    "he", "he-IL",
    "fa", "fa-IR",
    # Other
    "hi", "hi-IN",
    "bn", "bn-IN",
    "ta", "ta-IN",
    "te", "te-IN",
    "mr", "mr-IN",
    "el", "el-GR",
    "ca", "ca-ES",
    "eu", "eu-ES"
]

WII_CODES = [
    "1038-58", "1309-9", "1621", "2047-7", "2071", "2077-4", "3642"
]
def generate_opera_ua() -> str:
    """
    Generate a single random Opera User Agent string.

    Returns:
        str: A randomly generated Opera UA string
    """
    pattern = random.choice(OPERA_PATTERNS)

    # Determine which parameters to use based on the pattern
    params = {
        'lang': random.choice(LANGUAGES)
    }

    # Nintendo Wii pattern
    if "Nintendo Wii" in pattern:
        params['code'] = random.choice(WII_CODES)
    else:
        # Other patterns
        if '{version}' in pattern:
            params['version'] = random.choice(OPERA_MINI_VERSIONS)

        if '{build}' in pattern:
            # Use MOBI build for "Opera Mobi", regular build for "Opera Mini"
            if "Opera Mobi" in pattern:
                params['build'] = random.choice(OPERA_MOBI_BUILDS)
            else:
                params['build'] = random.choice(BUILD_NUMBERS)

        if '{presto}' in pattern:
            params['presto'] = random.choice(PRESTO_VERSIONS)

        if '{final}' in pattern:
            params['final'] = random.choice(FINAL_VERSIONS)

    return pattern.format(**params)


def generate_ua_pool(count: int = 10) -> List[str]:
    """
    Generate a pool of unique Opera User Agent strings.

    Args:
        count: Number of UA strings to generate (default: 10)

    Returns:
        List[str]: List of unique UA strings
    """
    ua_pool = set()

    # Keep generating until we have enough unique UAs
    # Add safety limit to prevent infinite loop
    max_attempts = count * 100
    attempts = 0

    try:
        while len(ua_pool) < count and attempts < max_attempts:
            ua = generate_opera_ua()
            ua_pool.add(ua)
            attempts += 1
    except Exception:
        # If generation fails entirely, return at least the default fallback
        if not ua_pool:
            return [DEFAULT_FALLBACK_UA]

    # If we couldn't generate enough, fill remaining with default
    result = list(ua_pool)
    while len(result) < count:
        result.append(DEFAULT_FALLBACK_UA)

    return result


def save_ua_pool(uas: List[str], cache_path: str) -> None:
    """
    Save UA pool to cache file.

    Args:
        uas: List of UA strings to save
        cache_path: Path to cache file
    """
    cache_data = {
        'generated_at': datetime.now().isoformat(),
        'user_agents': uas
    }

    # Ensure directory exists
    cache_dir = os.path.dirname(cache_path)
    if cache_dir and not os.path.exists(cache_dir):
        os.makedirs(cache_dir, exist_ok=True)

    with open(cache_path, 'w', encoding='utf-8') as f:
        json.dump(cache_data, f, indent=2)


def load_custom_ua_list(file_path: str) -> List[str]:
    """
    Load custom UA list from a text file.

    Args:
        file_path: Path to text file containing UA strings (one per line)

    Returns:
        List[str]: List of UA strings, or empty list if file is invalid
    """
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            uas = [line.strip() for line in f if line.strip()]

        # Validate that we have at least one UA
        if not uas:
            return []

        return uas
    except (FileNotFoundError, PermissionError, UnicodeDecodeError):
        return []


def load_ua_pool(cache_path: str, count: int = 10) -> List[str]:
    """
    Load UA pool from custom list file, cache, or generate new one.

    Priority order:
    1. Custom UA list file (if WHOOGLE_UA_LIST_FILE is set)
    2. Cached auto-generated UAs
    3. Newly generated UAs

    Args:
        cache_path: Path to cache file
        count: Number of UAs to generate if cache is invalid (default: 10)

    Returns:
        List[str]: List of UA strings
    """
    # Check for custom UA list file first (highest priority)
    custom_ua_file = os.environ.get('WHOOGLE_UA_LIST_FILE', '').strip()
    if custom_ua_file:
        custom_uas = load_custom_ua_list(custom_ua_file)
        if custom_uas:
            # Custom list loaded successfully
            return custom_uas
        else:
            # Custom file specified but invalid, log warning and fall back
            print(f"Warning: Custom UA list file '{custom_ua_file}' not found or invalid, falling back to auto-generated UAs")

    # Check if we should use cache
    use_cache = os.environ.get('WHOOGLE_UA_CACHE_PERSISTENT', '1') == '1'
    refresh_days = int(os.environ.get('WHOOGLE_UA_CACHE_REFRESH_DAYS', '0'))
|
||||
|
||||
# If cache disabled, always generate new
|
||||
if not use_cache:
|
||||
uas = generate_ua_pool(count)
|
||||
save_ua_pool(uas, cache_path)
|
||||
return uas
|
||||
|
||||
# Try to load from cache
|
||||
if os.path.exists(cache_path):
|
||||
try:
|
||||
with open(cache_path, 'r', encoding='utf-8') as f:
|
||||
cache_data = json.load(f)
|
||||
|
||||
# Check if cache is expired (if refresh_days > 0)
|
||||
if refresh_days > 0:
|
||||
generated_at = datetime.fromisoformat(cache_data['generated_at'])
|
||||
age_days = (datetime.now() - generated_at).days
|
||||
|
||||
if age_days >= refresh_days:
|
||||
# Cache expired, generate new
|
||||
uas = generate_ua_pool(count)
|
||||
save_ua_pool(uas, cache_path)
|
||||
return uas
|
||||
|
||||
# Cache is valid, return it
|
||||
return cache_data['user_agents']
|
||||
except (json.JSONDecodeError, KeyError, ValueError):
|
||||
# Cache file is corrupted, generate new
|
||||
pass
|
||||
|
||||
# No valid cache, generate new
|
||||
uas = generate_ua_pool(count)
|
||||
save_ua_pool(uas, cache_path)
|
||||
return uas
|
||||
|
||||
|
||||
def get_random_ua(ua_pool: List[str]) -> str:
|
||||
"""
|
||||
Get a random UA from the pool.
|
||||
|
||||
Args:
|
||||
ua_pool: List of UA strings
|
||||
|
||||
Returns:
|
||||
str: Random UA string from the pool
|
||||
"""
|
||||
if not ua_pool:
|
||||
# Fallback to generating one if pool is empty
|
||||
try:
|
||||
return generate_opera_ua()
|
||||
except Exception:
|
||||
# If generation fails, use default fallback
|
||||
return DEFAULT_FALLBACK_UA
|
||||
|
||||
return random.choice(ua_pool)
|
||||
|
||||
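The cache-expiry rule used by `load_ua_pool` above can be sketched in isolation. This is a minimal illustration rather than the module's own code: the helper name `cache_is_expired` and its `now` parameter are hypothetical, added so the check can be exercised without touching the filesystem or the system clock. It mirrors the whole-day comparison (`timedelta.days`) used above, so a cache that is 23 hours old counts as zero days old.

```python
from datetime import datetime, timedelta
from typing import Optional


def cache_is_expired(generated_at: str, refresh_days: int,
                     now: Optional[datetime] = None) -> bool:
    """Return True if a cache stamped `generated_at` should be regenerated.

    Hypothetical helper: mirrors the age check in load_ua_pool, where
    refresh_days <= 0 means the cache never expires.
    """
    if refresh_days <= 0:
        return False
    now = now or datetime.now()
    # timedelta.days truncates to whole days, as in the original check
    age_days = (now - datetime.fromisoformat(generated_at)).days
    return age_days >= refresh_days


reference = datetime(2025, 1, 10, 12, 0, 0)
stamp = (reference - timedelta(days=3)).isoformat()
print(cache_is_expired(stamp, 0, reference))  # refresh disabled
print(cache_is_expired(stamp, 7, reference))  # 3 days old, 7-day window
print(cache_is_expired(stamp, 3, reference))  # 3 days old, 3-day window
```

Passing `now` explicitly keeps the sketch deterministic; the code above calls `datetime.now()` directly.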
363
misc/check_google_user_agents.py
Executable file
|
|
@ -0,0 +1,363 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
Test User Agent strings against Google to find which ones return actual search results
|
||||
instead of JavaScript pages or upgrade browser messages.
|
||||
|
||||
Usage:
|
||||
python misc/check_google_user_agents.py <user_agent_file> [--output <output_file>] [--query <search_query>]
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import random
|
||||
import sys
|
||||
import time
|
||||
from typing import List, Tuple
|
||||
import requests
|
||||
|
||||
# Common search queries to cycle through for more realistic testing
|
||||
DEFAULT_SEARCH_QUERIES = [
|
||||
"python programming",
|
||||
"weather today",
|
||||
"news",
|
||||
"how to cook pasta",
|
||||
"best movies 2025",
|
||||
"restaurants near me",
|
||||
"translate hello",
|
||||
"calculator",
|
||||
"time",
|
||||
"maps",
|
||||
"images",
|
||||
"videos",
|
||||
"shopping",
|
||||
"travel",
|
||||
"sports scores",
|
||||
"stock market",
|
||||
"recipes",
|
||||
"music",
|
||||
"books",
|
||||
"technology",
|
||||
"AI",
|
||||
"AI programming",
|
||||
"Why does google hate users?"
|
||||
]
|
||||
|
||||
# Markers that indicate blocked/JS pages
|
||||
BLOCK_MARKERS = [
|
||||
"unusual traffic",
|
||||
"sorry but your computer",
|
||||
"solve the captcha",
|
||||
"request looks automated",
|
||||
"g-recaptcha",
|
||||
"upgrade your browser",
|
||||
"browser is not supported",
|
||||
"please upgrade",
|
||||
"isn't supported",
|
||||
"isn\"t supported", # With escaped quote
|
||||
"upgrade to a recent version",
|
||||
"update your browser",
|
||||
"your browser isn't supported",
|
||||
]
|
||||
|
||||
# Markers that indicate actual search results
|
||||
SUCCESS_MARKERS = [
|
||||
'<div class="g"', # Google search result container
|
||||
'<div id="search"', # Search results container
|
||||
'<div class="rc"', # Result container
|
||||
'class="yuRUbf"', # Result link container
|
||||
'class="LC20lb"', # Result title
|
||||
'- Google Search</title>', # Page title indicator
|
||||
'id="rso"', # Results container
|
||||
'class="g"', # Result class (without div tag)
|
||||
]
|
||||
|
||||
|
||||
def read_user_agents(file_path: str) -> List[str]:
|
||||
"""Read user agent strings from a file, one per line."""
|
||||
try:
|
||||
with open(file_path, 'r', encoding='utf-8') as f:
|
||||
user_agents = [line.strip() for line in f if line.strip()]
|
||||
return user_agents
|
||||
except FileNotFoundError:
|
||||
print(f"Error: File '{file_path}' not found.", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
except Exception as e:
|
||||
print(f"Error reading file: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def test_user_agent(user_agent: str, query: str = "test", timeout: float = 10.0) -> Tuple[bool, str]:
|
||||
"""
|
||||
Test a user agent against Google search.
|
||||
|
||||
Returns:
|
||||
Tuple of (is_working: bool, reason: str)
|
||||
"""
|
||||
url = "https://www.google.com/search"
|
||||
params = {"q": query, "gbv": "1", "num": "10"}
|
||||
|
||||
headers = {
|
||||
"User-Agent": user_agent,
|
||||
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
|
||||
"Accept-Language": "en-US,en;q=0.9",
|
||||
"Accept-Encoding": "gzip, deflate, br",
|
||||
"Connection": "keep-alive",
|
||||
"Upgrade-Insecure-Requests": "1",
|
||||
}
|
||||
|
||||
try:
|
||||
response = requests.get(url, params=params, headers=headers, timeout=timeout)
|
||||
|
||||
# Check HTTP status
|
||||
if response.status_code == 429:
|
||||
# Rate limited - raise this so we can handle it specially
|
||||
raise Exception("Rate limited (429)")
|
||||
if response.status_code >= 500:
|
||||
return False, f"Server error ({response.status_code})"
|
||||
if response.status_code == 403:
|
||||
return False, f"Blocked ({response.status_code})"
|
||||
if response.status_code >= 400:
|
||||
return False, f"HTTP {response.status_code}"
|
||||
|
||||
body_lower = response.text.lower()
|
||||
|
||||
# Check for block markers
|
||||
for marker in BLOCK_MARKERS:
|
||||
if marker.lower() in body_lower:
|
||||
return False, f"Blocked: {marker}"
|
||||
|
||||
# Check for redirect indicators first - these indicate non-working responses
|
||||
has_redirect = ("window.location" in body_lower or "location.href" in body_lower) and "google.com" not in body_lower
|
||||
if has_redirect:
|
||||
return False, "JavaScript redirect detected"
|
||||
|
||||
# Check for noscript redirect (another indicator of JS-only page)
|
||||
if 'noscript' in body_lower and 'http-equiv="refresh"' in body_lower:
|
||||
return False, "NoScript redirect page"
|
||||
|
||||
# Check for success markers (actual search results)
|
||||
# We need at least one strong indicator of search results
|
||||
has_results = any(marker in response.text for marker in SUCCESS_MARKERS)
|
||||
|
||||
if has_results:
|
||||
return True, "OK - Has search results"
|
||||
else:
|
||||
# Check for very short responses (likely error pages)
|
||||
if len(response.text) < 1000:
|
||||
return False, "Response too short (likely error page)"
|
||||
# If we don't have success markers, it's not a working response
|
||||
# Even if it's substantial and doesn't have block markers, it might be a JS-only page
|
||||
return False, "No search results found"
|
||||
|
||||
except requests.Timeout:
|
||||
return False, "Request timeout"
|
||||
except requests.HTTPError as e:
|
||||
if e.response is not None and e.response.status_code == 429:
|
||||
# Rate limited - raise this so we can handle it specially
|
||||
raise Exception(f"Rate limited (429) - {str(e)}")
|
||||
return False, f"HTTP error: {str(e)}"
|
||||
except requests.RequestException as e:
|
||||
# Check if it's a 429 in the response
|
||||
if hasattr(e, 'response') and e.response is not None and e.response.status_code == 429:
|
||||
raise Exception(f"Rate limited (429) - {str(e)}")
|
||||
return False, f"Request error: {str(e)}"
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Test User Agent strings against Google to find working ones.",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
python misc/check_google_user_agents.py UAs.txt
python misc/check_google_user_agents.py UAs.txt --output working_uas.txt
python misc/check_google_user_agents.py UAs.txt --query "python programming"
|
||||
"""
|
||||
)
|
||||
parser.add_argument(
|
||||
"user_agent_file",
|
||||
help="Path to file containing user agent strings (one per line)"
|
||||
)
|
||||
parser.add_argument(
|
||||
"--output", "-o",
|
||||
help="Output file to write working user agents (default: stdout)"
|
||||
)
|
||||
parser.add_argument(
|
||||
"--query", "-q",
|
||||
default=None,
|
||||
help="Search query to use for testing (default: cycles through random queries)"
|
||||
)
|
||||
parser.add_argument(
|
||||
"--random-queries", "-r",
|
||||
action="store_true",
|
||||
help="Use random queries from a predefined list (default: True if --query not specified)"
|
||||
)
|
||||
parser.add_argument(
|
||||
"--timeout", "-t",
|
||||
type=float,
|
||||
default=10.0,
|
||||
help="Request timeout in seconds (default: 10.0)"
|
||||
)
|
||||
parser.add_argument(
|
||||
"--delay", "-d",
|
||||
type=float,
|
||||
default=0.5,
|
||||
help="Delay between requests in seconds (default: 0.5)"
|
||||
)
|
||||
parser.add_argument(
|
||||
"--verbose", "-v",
|
||||
action="store_true",
|
||||
help="Show detailed results for each user agent"
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Determine query strategy
|
||||
use_random_queries = args.random_queries or (args.query is None)
|
||||
if use_random_queries:
|
||||
search_queries = DEFAULT_SEARCH_QUERIES.copy()
|
||||
random.shuffle(search_queries) # Shuffle for variety
|
||||
current_query_idx = 0
|
||||
query_display = f"cycling through {len(search_queries)} random queries"
|
||||
else:
|
||||
search_queries = [args.query]
|
||||
query_display = f"'{args.query}'"
|
||||
|
||||
# Read user agents
|
||||
user_agents = read_user_agents(args.user_agent_file)
|
||||
if not user_agents:
|
||||
print("No user agents found in file.", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
print(f"Testing {len(user_agents)} user agents against Google...", file=sys.stderr)
|
||||
print(f"Query: {query_display}", file=sys.stderr)
|
||||
if args.output:
|
||||
print(f"Output file: {args.output} (appending results incrementally)", file=sys.stderr)
|
||||
print(file=sys.stderr)
|
||||
|
||||
# Load existing working user agents from output file to avoid duplicates
|
||||
existing_working = set()
|
||||
if args.output:
|
||||
try:
|
||||
with open(args.output, 'r', encoding='utf-8') as f:
|
||||
existing_working = {line.strip() for line in f if line.strip()}
|
||||
if existing_working:
|
||||
print(f"Found {len(existing_working)} existing user agents in output file", file=sys.stderr)
|
||||
except FileNotFoundError:
|
||||
# File doesn't exist yet, that's fine
|
||||
pass
|
||||
except Exception as e:
|
||||
print(f"Warning: Could not read existing output file: {e}", file=sys.stderr)
|
||||
|
||||
# Open output file for incremental writing if specified (append mode)
|
||||
output_file = None
|
||||
if args.output:
|
||||
try:
|
||||
output_file = open(args.output, 'a', encoding='utf-8')
|
||||
except Exception as e:
|
||||
print(f"Error opening output file: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
working_agents = []
|
||||
failed_count = 0
|
||||
skipped_count = 0
|
||||
last_successful_idx = 0
|
||||
|
||||
try:
|
||||
for idx, ua in enumerate(user_agents, 1):
|
||||
# Skip testing if this UA is already in the working file
|
||||
if args.output and ua in existing_working:
|
||||
skipped_count += 1
|
||||
if args.verbose:
|
||||
print(f"[{idx}/{len(user_agents)}] ⊘ SKIPPED - Already in working file", file=sys.stderr)
|
||||
last_successful_idx = idx
|
||||
continue
|
||||
|
||||
try:
|
||||
# Get the next query (cycle through if using random queries)
|
||||
if use_random_queries:
|
||||
query = search_queries[current_query_idx % len(search_queries)]
|
||||
current_query_idx += 1
|
||||
else:
|
||||
query = args.query
|
||||
|
||||
is_working, reason = test_user_agent(ua, query, args.timeout)
|
||||
|
||||
if is_working:
|
||||
working_agents.append(ua)
|
||||
status = "✓"
|
||||
# Write immediately to output file if specified (skip if duplicate)
|
||||
if output_file:
|
||||
if ua not in existing_working:
|
||||
output_file.write(ua + '\n')
|
||||
output_file.flush() # Ensure it's written to disk
|
||||
existing_working.add(ua) # Track it to avoid duplicates
|
||||
else:
|
||||
if args.verbose:
|
||||
print(f"[{idx}/{len(user_agents)}] {status} WORKING (duplicate, skipped) - {reason}", file=sys.stderr)
|
||||
# Also print to stdout if no output file
|
||||
if not args.output:
|
||||
print(ua)
|
||||
|
||||
if args.verbose:
|
||||
print(f"[{idx}/{len(user_agents)}] {status} WORKING - {reason}", file=sys.stderr)
|
||||
else:
|
||||
failed_count += 1
|
||||
status = "✗"
|
||||
if args.verbose:
|
||||
print(f"[{idx}/{len(user_agents)}] {status} FAILED - {reason}", file=sys.stderr)
|
||||
|
||||
last_successful_idx = idx
|
||||
|
||||
# Progress indicator for non-verbose mode
|
||||
if not args.verbose and idx % 10 == 0:
|
||||
print(f"Progress: {idx}/{len(user_agents)} tested ({len(working_agents)} working, {failed_count} failed)", file=sys.stderr)
|
||||
|
||||
# Delay between requests to avoid rate limiting
|
||||
if idx < len(user_agents):
|
||||
time.sleep(args.delay)
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print(file=sys.stderr)
|
||||
print(f"\nInterrupted by user at index {idx}/{len(user_agents)}", file=sys.stderr)
|
||||
print(f"Last successful test: {last_successful_idx}/{len(user_agents)}", file=sys.stderr)
|
||||
break
|
||||
except Exception as e:
|
||||
# Handle unexpected errors (like network issues or rate limits)
|
||||
error_msg = str(e)
|
||||
if "429" in error_msg or "Rate limited" in error_msg:
|
||||
print(file=sys.stderr)
|
||||
print(f"\n⚠️ RATE LIMIT DETECTED at index {idx}/{len(user_agents)}", file=sys.stderr)
|
||||
print(f"Last successful test: {last_successful_idx}/{len(user_agents)}", file=sys.stderr)
|
||||
print(f"Working user agents found so far: {len(working_agents)}", file=sys.stderr)
|
||||
if args.output:
|
||||
print(f"Results saved to: {args.output}", file=sys.stderr)
|
||||
print(f"\nTo resume later, you can skip the first {last_successful_idx} user agents.", file=sys.stderr)
|
||||
break  # Stop testing; the summary after the loop reports progress so far
|
||||
else:
|
||||
print(f"[{idx}/{len(user_agents)}] ERROR - {error_msg}", file=sys.stderr)
|
||||
failed_count += 1
|
||||
last_successful_idx = idx
|
||||
if idx < len(user_agents):
|
||||
time.sleep(args.delay)
|
||||
continue
|
||||
|
||||
finally:
|
||||
# Close output file if opened
|
||||
if output_file:
|
||||
output_file.close()
|
||||
|
||||
# Summary
|
||||
print(file=sys.stderr)
|
||||
tested_count = last_successful_idx - skipped_count
|
||||
print(f"Summary: {len(working_agents)} working, {failed_count} failed, {skipped_count} skipped out of {last_successful_idx} processed (of {len(user_agents)} total)", file=sys.stderr)
|
||||
if last_successful_idx < len(user_agents):
|
||||
print(f"Note: Processing stopped at index {last_successful_idx}. {len(user_agents) - last_successful_idx} user agents not processed.", file=sys.stderr)
|
||||
if args.output:
|
||||
print(f"Results saved to: {args.output}", file=sys.stderr)
|
||||
|
||||
return 0 if working_agents else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
|
||||
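The body-classification step in `test_user_agent` above can be tried offline, without sending requests to Google. The following is a minimal sketch under stated assumptions: `classify_body` is a hypothetical helper, the marker lists are shortened copies of the ones defined above, and the redirect and response-length checks are omitted for brevity.

```python
from typing import Tuple

# Shortened, illustrative marker lists (the script above defines longer ones)
BLOCK_MARKERS = ["unusual traffic", "g-recaptcha", "upgrade your browser"]
SUCCESS_MARKERS = ['<div id="search"', 'id="rso"', 'class="g"']


def classify_body(body: str) -> Tuple[bool, str]:
    """Classify an HTML body as working or not (hypothetical helper)."""
    body_lower = body.lower()
    # Block markers take precedence over any result markup on the page
    for marker in BLOCK_MARKERS:
        if marker in body_lower:
            return False, f"Blocked: {marker}"
    # Success markers are case-sensitive and matched against the raw body
    if any(marker in body for marker in SUCCESS_MARKERS):
        return True, "OK - Has search results"
    return False, "No search results found"


print(classify_body('<div id="search"><div class="g">result</div></div>'))
print(classify_body('<p>Our systems have detected unusual traffic</p>'))
print(classify_body('<html><body>nothing here</body></html>'))
```

Checking block markers before success markers means a CAPTCHA page that happens to contain result-like markup is still reported as blocked.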
210
misc/generate_uas.py
Executable file
|
|
@ -0,0 +1,210 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
Standalone Opera User Agent String Generator
|
||||
|
||||
This tool generates Opera-based User Agent strings that can be used with Whoogle.
|
||||
It can be run independently to generate and display UA strings on demand.
|
||||
|
||||
Usage:
|
||||
python misc/generate_uas.py [count]
|
||||
|
||||
Arguments:
|
||||
count: Number of UA strings to generate (default: 10)
|
||||
|
||||
Examples:
|
||||
python misc/generate_uas.py # Generate 10 UAs
|
||||
python misc/generate_uas.py 20 # Generate 20 UAs
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
|
||||
# Default fallback UA if generation fails
|
||||
DEFAULT_FALLBACK_UA = "Opera/9.30 (Nintendo Wii; U; ; 3642; en)"
|
||||
|
||||
# Try to import from the app module if available
|
||||
try:
|
||||
# Add parent directory to path to allow imports
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
|
||||
from app.utils.ua_generator import generate_ua_pool
|
||||
USE_APP_MODULE = True
|
||||
except ImportError:
|
||||
USE_APP_MODULE = False
|
||||
# Self-contained version if app module is not available
|
||||
import random
|
||||
|
||||
# Opera UA Pattern Templates
|
||||
OPERA_PATTERNS = [
|
||||
"Opera/9.80 (J2ME/MIDP; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
"Opera/9.80 (Android; Linux; Opera Mobi/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
"Opera/9.80 (iPhone; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
"Opera/9.80 (iPad; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
"Opera/9.30 (Nintendo Wii; U; ; {code}; {lang})",
|
||||
"Opera/9.80 (S60; SymbOS; Opera Mobi/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
"Opera/9.80 (Series 60; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
"Opera/9.80 (BlackBerry; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
"Opera/9.80 (Windows Mobile; Opera Mini/{version}/{build}; U; {lang}) Presto/{presto} Version/{final}",
|
||||
]
|
||||
|
||||
OPERA_MINI_VERSIONS = [
|
||||
"4.0", "4.1.11321", "4.2.13337", "4.2.14912", "4.2.15410", "4.3.24214",
|
||||
"5.0.18741", "5.1.22296", "5.1.22783", "6.0.24095", "6.24093", "7.1.32444",
|
||||
"7.6.35766", "36.2.2254"
|
||||
]
|
||||
|
||||
OPERA_MOBI_BUILDS = [
|
||||
"27", "49", "447", "1209", "3730", "ADR-1012221546", "SYB-1107071606"
|
||||
]
|
||||
|
||||
BUILD_NUMBERS = [
|
||||
"22.387", "22.478", "23.334", "23.377", "24.746", "24.783", "25.657",
|
||||
"27.1407", "28.2647", "35.5706", "119.132", "870", "886"
|
||||
]
|
||||
|
||||
PRESTO_VERSIONS = [
|
||||
"2.4.15", "2.4.18", "2.5.25", "2.8.119", "2.12.423"
|
||||
]
|
||||
|
||||
FINAL_VERSIONS = [
|
||||
"10.00", "10.1", "10.54", "11.10", "12.16", "13.00"
|
||||
]
|
||||
|
||||
LANGUAGES = [
|
||||
# English variants
|
||||
"en", "en-US", "en-GB", "en-CA", "en-AU", "en-NZ", "en-ZA", "en-IN", "en-SG",
|
||||
# Western European
|
||||
"de", "de-DE", "de-AT", "de-CH",
|
||||
"fr", "fr-FR", "fr-CA", "fr-BE", "fr-CH", "fr-LU",
|
||||
"es", "es-ES", "es-MX", "es-AR", "es-CO", "es-CL", "es-PE", "es-VE", "es-LA",
|
||||
"it", "it-IT", "it-CH",
|
||||
"pt", "pt-PT", "pt-BR",
|
||||
"nl", "nl-NL", "nl-BE",
|
||||
# Nordic languages
|
||||
"da", "da-DK",
|
||||
"sv", "sv-SE",
|
||||
"no", "no-NO", "nb", "nn",
|
||||
"fi", "fi-FI",
|
||||
"is", "is-IS",
|
||||
# Eastern European
|
||||
"pl", "pl-PL",
|
||||
"cs", "cs-CZ",
|
||||
"sk", "sk-SK",
|
||||
"hu", "hu-HU",
|
||||
"ro", "ro-RO",
|
||||
"bg", "bg-BG",
|
||||
"hr", "hr-HR",
|
||||
"sr", "sr-RS",
|
||||
"sl", "sl-SI",
|
||||
"uk", "uk-UA",
|
||||
"ru", "ru-RU",
|
||||
# Asian languages
|
||||
"zh", "zh-CN", "zh-TW", "zh-HK",
|
||||
"ja", "ja-JP",
|
||||
"ko", "ko-KR",
|
||||
"th", "th-TH",
|
||||
"vi", "vi-VN",
|
||||
"id", "id-ID",
|
||||
"ms", "ms-MY",
|
||||
"fil", "tl",
|
||||
# Middle Eastern
|
||||
"tr", "tr-TR",
|
||||
"ar", "ar-SA", "ar-AE", "ar-EG",
|
||||
"he", "he-IL",
|
||||
"fa", "fa-IR",
|
||||
# Other
|
||||
"hi", "hi-IN",
|
||||
"bn", "bn-IN",
|
||||
"ta", "ta-IN",
|
||||
"te", "te-IN",
|
||||
"mr", "mr-IN",
|
||||
"el", "el-GR",
|
||||
"ca", "ca-ES",
|
||||
"eu", "eu-ES"
|
||||
]
|
||||
|
||||
WII_CODES = [
|
||||
"1038-58", "1621", "2047-7", "2077-4", "3642"
|
||||
]
|
||||
|
||||
def generate_opera_ua():
|
||||
"""Generate a single random Opera User Agent string."""
|
||||
pattern = random.choice(OPERA_PATTERNS)
|
||||
params = {'lang': random.choice(LANGUAGES)}
|
||||
|
||||
if "Nintendo Wii" in pattern:
|
||||
params['code'] = random.choice(WII_CODES)
|
||||
else:
|
||||
if '{version}' in pattern:
|
||||
params['version'] = random.choice(OPERA_MINI_VERSIONS)
|
||||
if '{build}' in pattern:
|
||||
if "Opera Mobi" in pattern:
|
||||
params['build'] = random.choice(OPERA_MOBI_BUILDS)
|
||||
else:
|
||||
params['build'] = random.choice(BUILD_NUMBERS)
|
||||
if '{presto}' in pattern:
|
||||
params['presto'] = random.choice(PRESTO_VERSIONS)
|
||||
if '{final}' in pattern:
|
||||
params['final'] = random.choice(FINAL_VERSIONS)
|
||||
|
||||
return pattern.format(**params)
|
||||
|
||||
def generate_ua_pool(count=10):
|
||||
"""Generate a pool of unique Opera User Agent strings."""
|
||||
ua_pool = set()
|
||||
max_attempts = count * 100
|
||||
attempts = 0
|
||||
|
||||
try:
|
||||
while len(ua_pool) < count and attempts < max_attempts:
|
||||
ua = generate_opera_ua()
|
||||
ua_pool.add(ua)
|
||||
attempts += 1
|
||||
except Exception:
|
||||
# If generation fails entirely, return at least the default fallback
|
||||
if not ua_pool:
|
||||
return [DEFAULT_FALLBACK_UA]
|
||||
|
||||
# If we couldn't generate enough, fill remaining with default
|
||||
result = list(ua_pool)
|
||||
while len(result) < count:
|
||||
result.append(DEFAULT_FALLBACK_UA)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def main():
|
||||
"""Main function to generate and display UA strings."""
|
||||
# Parse command line argument
|
||||
count = 10 # Default
|
||||
if len(sys.argv) > 1:
|
||||
try:
|
||||
count = int(sys.argv[1])
|
||||
if count < 1:
|
||||
print("Error: Count must be a positive integer", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
except ValueError:
|
||||
print(f"Error: Invalid count '{sys.argv[1]}'. Must be an integer.", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
# Show which mode we're using (to stderr so it doesn't interfere with output)
|
||||
if USE_APP_MODULE:
|
||||
print(f"# Using app.utils.ua_generator module", file=sys.stderr)
|
||||
else:
|
||||
print(f"# Using standalone generator (app module not available)", file=sys.stderr)
|
||||
|
||||
print(f"# Generating {count} Opera User Agent strings...\n", file=sys.stderr)
|
||||
|
||||
# Generate UAs
|
||||
uas = generate_ua_pool(count)
|
||||
|
||||
# Display them (one per line, no numbering)
|
||||
for ua in uas:
|
||||
print(ua)
|
||||
|
||||
# Summary to stderr so it doesn't interfere with piping
|
||||
print(f"\n# Generated {len(uas)} User Agent strings", file=sys.stderr)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
|
||||
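The template-filling approach used by `generate_opera_ua` above comes down to `str.format` over randomly chosen values. The sketch below is illustrative only: `fill_pattern` is a hypothetical helper, the value lists are shortened copies of those above, and the seeded `random.Random` instance is an assumption added for repeatability (the generator above uses the module-level `random` functions).

```python
import random

# Template and value lists copied from the generator above (shortened)
PATTERN = ("Opera/9.80 (Android; Linux; Opera Mobi/{build}; U; {lang}) "
           "Presto/{presto} Version/{final}")
OPERA_MOBI_BUILDS = ["27", "49", "447"]
PRESTO_VERSIONS = ["2.4.15", "2.8.119", "2.12.423"]
FINAL_VERSIONS = ["10.00", "11.10", "12.16"]
LANGUAGES = ["en", "en-US", "de-DE"]


def fill_pattern(rng: random.Random) -> str:
    """Fill every placeholder in PATTERN with a randomly chosen value."""
    return PATTERN.format(
        build=rng.choice(OPERA_MOBI_BUILDS),
        lang=rng.choice(LANGUAGES),
        presto=rng.choice(PRESTO_VERSIONS),
        final=rng.choice(FINAL_VERSIONS),
    )


rng = random.Random(0)  # seeded so repeated runs give the same string
ua = fill_pattern(rng)
print(ua)

# Fixed values show the exact shape of the result
fixed = PATTERN.format(build="447", lang="en-US",
                       presto="2.8.119", final="11.10")
print(fixed)
```

With the fixed values, the result is `Opera/9.80 (Android; Linux; Opera Mobi/447; U; en-US) Presto/2.8.119 Version/11.10`.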
|
|
@ -1,5 +1,8 @@
|
|||
from app import app
|
||||
from app.request import Request
|
||||
from app.utils.session import generate_key
|
||||
from test.mock_google import build_mock_response
|
||||
import httpx
|
||||
import pytest
|
||||
import random
|
||||
|
||||
|
|
@ -13,6 +16,38 @@ demo_config = {
|
|||
}
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def mock_google(monkeypatch):
|
||||
original_send = Request.send
|
||||
|
||||
def fake_send(self, base_url='', query='', attempt=0,
|
||||
force_mobile=False, user_agent=''):
|
||||
use_mock = not base_url or 'google.com/search' in base_url
|
||||
if not use_mock:
|
||||
return original_send(self, base_url, query, attempt,
|
||||
force_mobile, user_agent)
|
||||
|
||||
html = build_mock_response(query, getattr(self, 'language', ''), getattr(self, 'country', ''))
|
||||
request_url = (base_url or self.search_url) + query
|
||||
request = httpx.Request('GET', request_url)
|
||||
return httpx.Response(200, request=request, text=html)
|
||||
|
||||
def fake_autocomplete(self, q):
|
||||
normalized = q.replace('+', ' ').lower()
|
||||
suggestions = []
|
||||
if 'green eggs and' in normalized:
|
||||
suggestions.append('green eggs and ham')
|
||||
if 'the cat in the' in normalized:
|
||||
suggestions.append('the cat in the hat')
|
||||
if normalized.startswith('who'):
|
||||
suggestions.extend(['whoogle', 'whoogle search'])
|
||||
return suggestions
|
||||
|
||||
monkeypatch.setattr(Request, 'send', fake_send)
|
||||
monkeypatch.setattr(Request, 'autocomplete', fake_autocomplete)
|
||||
yield
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def client():
|
||||
with app.test_client() as client:
|
||||
|
|
|
|||
136
test/mock_google.py
Normal file
|
|
@ -0,0 +1,136 @@
|
|||
from urllib.parse import parse_qs, unquote, quote
|
||||
|
||||
from app.models.config import Config
|
||||
|
||||
DEFAULT_RESULTS = [
|
||||
('Example Domain', 'https://example.com/{slug}', 'Example information about {term}.'),
|
||||
('Whoogle Search', 'https://github.com/benbusby/whoogle-search', 'Private self-hosted Google proxy'),
|
||||
('Wikipedia', 'https://en.wikipedia.org/wiki/{title}', '{title} – encyclopedia entry.'),
|
||||
]
|
||||
|
||||
|
||||
def _result_block(title, href, snippet):
|
||||
encoded_href = quote(href, safe=':/')
|
||||
return (
|
||||
f'<div class="ZINbbc xpd O9g5cc uUPGi">'
|
||||
f'<div class="kCrYT">'
|
||||
f'<a href="/url?q={encoded_href}&sa=U&ved=2ahUKE">'
|
||||
f'<h3 class="BNeawe vvjwJb AP7Wnd">{title}</h3>'
|
||||
f'<span class="CVA68e">{title}</span>'
|
||||
f'</a>'
|
||||
f'<div class="VwiC3b">{snippet}</div>'
|
||||
f'</div>'
|
||||
f'</div>'
|
||||
)
|
||||
|
||||
|
||||
def _main_results(query, params, language='', country=''):
|
||||
term = query.lower()
|
||||
slug = query.replace(' ', '-')
|
||||
results = []
|
||||
|
||||
pref_lang = ''
|
||||
pref_country = ''
|
||||
if 'preferences' in params:
|
||||
try:
|
||||
pref_data = Config(**{})._decode_preferences(params['preferences'][0])
|
||||
pref_lang = str(pref_data.get('lang_interface', '') or '').lower()
|
||||
pref_country = str(pref_data.get('country', '') or '').lower()
|
||||
except Exception:
|
||||
pref_lang = pref_country = ''
|
||||
else:
|
||||
pref_lang = pref_country = ''
|
||||
|
||||
if 'wikipedia' in term:
|
||||
hl = str(params.get('hl', [''])[0] or '').lower()
|
||||
gl = str(params.get('gl', [''])[0] or '').lower()
|
||||
lr = str(params.get('lr', [''])[0] or '').lower()
|
||||
language_code = str(language or '').lower()
|
||||
country_code = str(country or '').lower()
|
||||
is_japanese = (
|
||||
hl.startswith('ja') or
|
||||
gl.startswith('jp') or
|
||||
lr.endswith('lang_ja') or
|
||||
language_code.endswith('lang_ja') or
|
||||
country_code.startswith('jp') or
|
||||
pref_lang.endswith('lang_ja') or
|
||||
pref_country.startswith('jp')
|
||||
)
|
||||
if is_japanese:
|
||||
results.append((
|
||||
'ウィキペディア',
|
||||
'https://ja.wikipedia.org/wiki/ウィキペディア',
|
||||
'日本語版ウィキペディアの記事です。'
|
||||
))
|
||||
else:
|
||||
results.append((
|
||||
'Wikipedia',
|
||||
'https://www.wikipedia.org/wiki/Wikipedia',
|
||||
'Wikipedia is a free online encyclopedia.'
|
||||
))
|
||||
|
||||
if 'pinterest' in term:
|
||||
results.append((
|
||||
'Pinterest',
|
||||
'https://www.pinterest.com/ideas/',
|
||||
'Discover recipes, home ideas, style inspiration and other ideas.'
|
||||
))
|
||||
|
||||
if 'whoogle' in term:
|
||||
results.append((
|
||||
'Whoogle Search GitHub',
|
||||
'https://github.com/benbusby/whoogle-search',
|
||||
'Source code for Whoogle Search.'
|
||||
))
|
||||
|
||||
if 'github' in term:
|
||||
results.append((
|
||||
'GitHub',
|
||||
f'https://github.com/search?q={slug}',
|
||||
'GitHub is a development platform to host and review code.'
|
||||
))
|
||||
|
||||
for title, url, snippet in DEFAULT_RESULTS:
|
||||
formatted_url = url.format(slug=slug, term=term, title=title.replace(' ', '_'))
|
||||
formatted_snippet = snippet.format(term=query, title=title)
|
||||
results.append((title, formatted_url, formatted_snippet))
|
||||
|
||||
unique = []
|
||||
seen = set()
|
||||
for entry in results:
|
||||
if entry[1] in seen:
|
||||
continue
|
||||
seen.add(entry[1])
|
||||
unique.append(entry)
|
||||
|
||||
return ''.join(_result_block(*entry) for entry in unique)
|
||||
|
||||
|
||||
def build_mock_response(raw_query, language='', country=''):
|
||||
if '&' in raw_query:
|
||||
q_part, extra = raw_query.split('&', 1)
|
||||
else:
|
||||
q_part, extra = raw_query, ''
|
||||
|
||||
query = unquote(q_part)
|
||||
params = parse_qs(extra)
|
||||
|
||||
results_html = _main_results(query, params, language, country)
|
||||
safe_query = query.replace('"', '')
|
||||
pagination = (
|
||||
f'<a href="/search?q={q_part}&start=10">Next</a>'
|
||||
f'<a href="/search?q={q_part}&start=20">More</a>'
|
||||
)
|
||||
|
||||
return (
|
||||
'<html>'
|
||||
'<head><title>Mock Google Results</title></head>'
|
||||
'<body>'
|
||||
f'<div id="main">{results_html}</div>'
|
||||
f'<form action="/search" method="GET">'
|
||||
f'<input name="q" value="{safe_query}">'
|
||||
'</form>'
|
||||
f'<footer class="TuS8Ad">{pagination}</footer>'
|
||||
'</body>'
|
||||
'</html>'
|
||||
)
|
||||
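The query handling at the top of `build_mock_response` above can be exercised on its own. This is a small sketch, assuming the raw query has the form `term&key=value&...`; the helper name `split_raw_query` is hypothetical and is not part of the mock module.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, unquote


def split_raw_query(raw_query: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split 'term&k=v&...' into the decoded term and its extra parameters."""
    if '&' in raw_query:
        q_part, extra = raw_query.split('&', 1)
    else:
        q_part, extra = raw_query, ''
    # unquote decodes %XX escapes; parse_qs maps each key to a list of values
    return unquote(q_part), parse_qs(extra)


query, params = split_raw_query('wikipedia%20test&hl=ja&gl=jp')
print(query)
print(params)
```

Because `parse_qs` returns a list per key, callers read single values as `params.get('hl', [''])[0]`, which matches the lookups in `_main_results`.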