Merge branch 'main' into revert_leta

This commit is contained in:
Don 2025-11-26 10:11:05 -06:00 committed by GitHub
commit 1d3bd49d3c
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
9 changed files with 398 additions and 6 deletions

137
LETA_INTEGRATION.md Normal file
View file

@ -0,0 +1,137 @@
# Mullvad Leta Backend Integration
## Overview
Whoogle Search now supports using Mullvad Leta (https://leta.mullvad.net) as an alternative search backend. This provides an additional privacy-focused search option that routes queries through Mullvad's infrastructure.
## Features
- **Backend Selection**: Users can choose between Google (default) and Mullvad Leta as the search backend
- **Privacy-Focused**: Leta is designed for privacy and doesn't track searches
- **Seamless Integration**: Results from Leta are automatically converted to Whoogle's display format
- **Automatic Tab Filtering**: Image, video, news, and map tabs are automatically hidden when using Leta (as these are not supported)
## Limitations
When using the Mullvad Leta backend, the following search types are **NOT supported**:
- Image search (`tbm=isch`)
- Video search (`tbm=vid`)
- News search (`tbm=nws`)
- Map search
Attempting to use these search types with Leta enabled will show an error message and redirect to the home page.
## Configuration
### Via Web Interface
1. Click the "Config" button on the Whoogle home page
2. Scroll down to find the "Use Mullvad Leta Backend" checkbox
3. **Leta is enabled by default** - uncheck the box to use Google instead
4. Click "Apply" to save your settings
### Via Environment Variable
Leta is **enabled by default**. To disable it and use Google instead:
```bash
WHOOGLE_CONFIG_USE_LETA=0
```
To explicitly enable it (though it's already default):
```bash
WHOOGLE_CONFIG_USE_LETA=1
```
## Implementation Details
### Files Modified
1. **app/models/config.py**
- Added `use_leta` configuration option
- Added to `safe_keys` list for URL parameter passing
2. **app/request.py**
- Modified `Request.__init__()` to use Leta URL when configured
- Added `gen_query_leta()` function to format queries for Leta's API
- Leta uses different query parameters than Google:
- `engine=google` (or `brave`)
- `country=XX` (lowercase country code)
- `language=XX` (language code without `lang_` prefix)
- `lastUpdated=d|w|m|y` (time period filter)
- `page=N` (pagination, 1-indexed)
3. **app/filter.py**
- Added `convert_leta_to_whoogle()` method to parse Leta's HTML structure
- Modified `clean()` method to detect and convert Leta results
- Leta results use `<article>` tags with specific classes that are converted to Whoogle's format
4. **app/routes.py**
- Added validation to prevent unsupported search types when using Leta
- Shows user-friendly error message when attempting image/video/news/map searches with Leta
5. **app/utils/results.py**
- Modified `get_tabs_content()` to accept `use_leta` parameter
- Filters out non-web search tabs when Leta is enabled
6. **app/templates/index.html**
- Added checkbox in settings panel for enabling/disabling Leta backend
- Includes helpful tooltip explaining Leta's limitations
## Technical Details
### Query Parameter Mapping
| Google Parameter | Leta Parameter | Notes |
|-----------------|----------------|-------|
| `q=<query>` | `q=<query>` | Same format |
| `gl=<country>` | `country=<code>` | Lowercase country code |
| `lr=<lang>` | `language=<code>` | Without `lang_` prefix |
| `tbs=qdr:d` | `lastUpdated=d` | Time filters mapped |
| `start=10` | `page=2` | Converted to 1-indexed pages |
| `tbm=isch/vid/nws` | N/A | Not supported |
### Leta HTML Structure
Leta returns results in this structure:
```html
<article class="svelte-fmlk7p">
<a href="<result-url>">
<h3>Result Title</h3>
</a>
<cite>display-url.com</cite>
<p class="result__body">Result snippet/description</p>
</article>
```
This is converted to Whoogle's expected format for consistent display.
## Testing
To test the Leta integration:
1. Enable Leta in settings
2. Perform a regular web search - should see results from Leta
3. Try to access an image/video/news tab - should see error message
4. Check pagination works correctly
5. Verify country and language filters work
6. Test time period filters (past day/week/month/year)
## Environment Variables
- `WHOOGLE_CONFIG_USE_LETA`: Set to `0` to disable Leta and use Google instead (default: `1` - Leta enabled)
## Future Enhancements
Potential improvements for future versions:
- Add Brave as an alternative engine option (Leta supports both Google and Brave)
- Implement image search support if Leta adds this capability
- Add per-query backend selection (bang-style syntax)
- Cache Leta results for improved performance
## Notes
- Leta's search results are cached on their end, so you may see "cached X days ago" messages
- Leta requires no API key or authentication
- Leta respects Tor configuration if enabled in Whoogle
- User agent settings apply to Leta requests as well

View file

@ -56,6 +56,7 @@ Contents
10. [Screenshots](#screenshots)
## Features
- **Mullvad Leta backend support** - Privacy-focused alternative to Google (enabled by default)
- No ads or sponsored content
- No JavaScript\*
- No cookies\*\*
@ -858,6 +859,20 @@ def contains(x: list, y: int) -> bool:
Whoogle currently supports translations using [`translations.json`](https://github.com/benbusby/whoogle-search/blob/main/app/static/settings/translations.json). Language values in this file need to match the "value" of the according language in [`languages.json`](https://github.com/benbusby/whoogle-search/blob/main/app/static/settings/languages.json) (i.e. "lang_en" for English, "lang_es" for Spanish, etc). After you add a new set of translations to `translations.json`, open a PR with your changes and they will be merged in as soon as possible.
## FAQ
**What is Mullvad Leta and why is it the default?**
Mullvad Leta is a privacy-focused search service provided by [Mullvad VPN](https://mullvad.net/en/leta). As of January 2025, Google disabled JavaScript-free search results, which breaks Whoogle's core functionality. Leta provides an excellent alternative that:
- Doesn't require JavaScript
- Provides privacy-focused search results through Mullvad's infrastructure
- Uses Google's search index (so results are similar to what you'd expect)
- Doesn't track or log your searches
**Limitations:** Leta only supports regular web search - no images, videos, news, or maps. If you need these features and Google's JavaScript-free search becomes available again, you can disable Leta in settings or set `WHOOGLE_CONFIG_USE_LETA=0`.
For more details, see [LETA_INTEGRATION.md](LETA_INTEGRATION.md).
**What's the difference between this and [Searx](https://github.com/asciimoo/searx)?**
Whoogle is intended to only ever be deployed to private instances by individuals of any background, with as little effort as possible. Prior knowledge of/experience with the command line or deploying applications is not necessary to deploy Whoogle, which isn't the case with Searx. As a result, Whoogle is missing some features of Searx in order to be as easy to deploy as possible.

View file

@ -142,6 +142,127 @@ class Filter:
def elements(self):
return self._elements
def convert_leta_to_whoogle(self, soup) -> BeautifulSoup:
"""Converts Leta search results HTML to Whoogle-compatible format
Args:
soup: BeautifulSoup object containing Leta results
Returns:
BeautifulSoup: Converted HTML in Whoogle format
"""
# Find all Leta result articles
articles = soup.find_all('article', class_='svelte-fmlk7p')
if not articles:
# No results found, return empty results page
return soup
# Create a new container for results with proper Whoogle CSS class
main_div = BeautifulSoup(features='html.parser').new_tag('div', attrs={'id': 'main'})
for article in articles:
# Extract data from Leta article
link_tag = article.find('a', href=True)
if not link_tag:
continue
url = link_tag.get('href', '')
title_tag = article.find('h3')
title = title_tag.get_text(strip=True) if title_tag else ''
snippet_tag = article.find('p', class_='result__body')
snippet = snippet_tag.get_text(strip=True) if snippet_tag else ''
cite_tag = article.find('cite')
display_url = cite_tag.get_text(strip=True) if cite_tag else url
# Create Whoogle-style result div with proper CSS class
result_div = BeautifulSoup(features='html.parser').new_tag(
'div', attrs={'class': [GClasses.result_class_a]}
)
result_outer = BeautifulSoup(features='html.parser').new_tag('div')
# Create a div for the title link
title_div = BeautifulSoup(features='html.parser').new_tag('div')
result_link = BeautifulSoup(features='html.parser').new_tag('a', href=url)
result_title = BeautifulSoup(features='html.parser').new_tag('h3')
result_title.string = title
result_link.append(result_title)
title_div.append(result_link)
# Create a div for the URL display with cite
url_div = BeautifulSoup(features='html.parser').new_tag('div')
result_cite = BeautifulSoup(features='html.parser').new_tag('cite')
result_cite.string = display_url
url_div.append(result_cite)
# Create a div for snippet
result_snippet = BeautifulSoup(features='html.parser').new_tag('div')
snippet_span = BeautifulSoup(features='html.parser').new_tag('span')
snippet_span.string = snippet
result_snippet.append(snippet_span)
# Assemble the result with proper structure
result_outer.append(title_div)
result_outer.append(url_div)
result_outer.append(result_snippet)
result_div.append(result_outer)
main_div.append(result_div)
# Find and preserve pagination elements from Leta
navigation = soup.find('div', class_='navigation')
if navigation:
# Convert Leta's "Next" button to Whoogle-style pagination
next_button = navigation.find('button', attrs={'data-cy': 'next-button'})
if next_button:
next_form = next_button.find_parent('form')
if next_form:
# Extract the page number from hidden input
page_input = next_form.find('input', attrs={'name': 'page'})
if page_input:
next_page = page_input.get('value', '2')
# Create footer for pagination
footer = BeautifulSoup(features='html.parser').new_tag('footer')
nav_table = BeautifulSoup(features='html.parser').new_tag('table')
nav_tr = BeautifulSoup(features='html.parser').new_tag('tr')
nav_td = BeautifulSoup(features='html.parser').new_tag('td')
# Calculate start value for Whoogle pagination
start_val = (int(next_page) - 1) * 10
next_link = BeautifulSoup(features='html.parser').new_tag('a', href=f'search?q={self.query}&start={start_val}')
next_link.string = 'Next »'
nav_td.append(next_link)
nav_tr.append(nav_td)
nav_table.append(nav_tr)
footer.append(nav_table)
main_div.append(footer)
# Clear the original soup body and add our converted results
if soup.body:
soup.body.clear()
# Add inline style to body for proper width constraints
if not soup.body.get('style'):
soup.body['style'] = 'padding: 0 20px; margin: 0 auto; max-width: 1000px;'
soup.body.append(main_div)
else:
# If no body, create one with proper styling
new_body = BeautifulSoup(features='html.parser').new_tag(
'body',
attrs={'style': 'padding: 0 20px; margin: 0 auto; max-width: 1000px;'}
)
new_body.append(main_div)
if soup.html:
soup.html.append(new_body)
else:
# Create minimal HTML structure
html_tag = BeautifulSoup(features='html.parser').new_tag('html')
html_tag.append(new_body)
soup.append(html_tag)
return soup
def encrypt_path(self, path, is_element=False) -> str:
# Encrypts path to avoid plaintext results in logs
if is_element:
@ -155,6 +276,11 @@ class Filter:
def clean(self, soup) -> BeautifulSoup:
self.soup = soup
# Check if this is a Leta result page and convert it
if self.config.use_leta and self.soup.find('article', class_='svelte-fmlk7p'):
self.soup = self.convert_leta_to_whoogle(self.soup)
self.main_divs = self.soup.find('div', {'id': 'main'})
self.remove_ads()
self.remove_block_titles()

View file

@ -92,6 +92,7 @@ class Config:
self.anon_view = read_config_bool('WHOOGLE_CONFIG_ANON_VIEW')
self.preferences_encrypted = read_config_bool('WHOOGLE_CONFIG_PREFERENCES_ENCRYPTED')
self.preferences_key = os.getenv('WHOOGLE_CONFIG_PREFERENCES_KEY', '')
self.use_leta = read_config_bool('WHOOGLE_CONFIG_USE_LETA', default=True)
self.accept_language = False
@ -105,7 +106,10 @@ class Config:
elif attr in kwargs.keys():
setattr(self, attr, kwargs[attr])
elif attr not in kwargs.keys() and mutable_attrs[attr] == bool:
setattr(self, attr, False)
# Only set to False if the attribute wasn't already set to True
# by environment defaults (e.g., use_leta defaults to True)
if not getattr(self, attr, False):
setattr(self, attr, False)
def __getitem__(self, name):
return getattr(self, name)

View file

@ -116,7 +116,75 @@ def gen_user_agent(config, is_mobile) -> str:
return DEFAULT_FALLBACK_UA
def gen_query_leta(query, args, config) -> str:
"""Builds a query string for Mullvad Leta backend
Args:
query: The search query string
args: Request arguments
config: User configuration
Returns:
str: A formatted query string for Leta
"""
# Ensure search query is parsable
query = urlparse.quote(query)
# Build query starting with 'q='
query_str = 'q=' + query
# Always use Google as the engine (Leta supports 'google' or 'brave')
query_str += '&engine=google'
# Add country if configured
if config.country:
query_str += '&country=' + config.country.lower()
# Add language if configured
# Convert from Google's lang format (lang_en) to Leta's format (en)
if config.lang_search:
lang_code = config.lang_search.replace('lang_', '')
query_str += '&language=' + lang_code
# Handle time period filtering with :past syntax or tbs parameter
if ':past' in query:
time_range = str.strip(query.split(':past', 1)[-1]).lower()
if time_range.startswith('day'):
query_str += '&lastUpdated=d'
elif time_range.startswith('week'):
query_str += '&lastUpdated=w'
elif time_range.startswith('month'):
query_str += '&lastUpdated=m'
elif time_range.startswith('year'):
query_str += '&lastUpdated=y'
elif 'tbs' in args or 'tbs' in config:
result_tbs = args.get('tbs') if 'tbs' in args else config.tbs
# Convert Google's tbs format to Leta's lastUpdated format
if result_tbs and 'qdr:d' in result_tbs:
query_str += '&lastUpdated=d'
elif result_tbs and 'qdr:w' in result_tbs:
query_str += '&lastUpdated=w'
elif result_tbs and 'qdr:m' in result_tbs:
query_str += '&lastUpdated=m'
elif result_tbs and 'qdr:y' in result_tbs:
query_str += '&lastUpdated=y'
# Add pagination if present
if 'start' in args:
start = int(args.get('start', '0'))
# Leta uses 1-indexed pages, Google uses result offset
page = (start // 10) + 1
if page > 1:
query_str += '&page=' + str(page)
return query_str
def gen_query(query, args, config) -> str:
# If using Leta backend, build query differently
if config.use_leta:
return gen_query_leta(query, args, config)
param_dict = {key: '' for key in VALID_PARAMS}
# Use :past(hour/day/week/month/year) if available
@ -212,8 +280,15 @@ class Request:
"""
def __init__(self, normal_ua, root_path, config: Config, http_client=None):
self.search_url = 'https://www.google.com/search?gbv=1&num=' + str(
os.getenv('WHOOGLE_RESULTS_PER_PAGE', 10)) + '&q='
# Use Leta backend if configured, otherwise use Google
if config.use_leta:
self.search_url = 'https://leta.mullvad.net/search?'
self.use_leta = True
else:
self.search_url = 'https://www.google.com/search?gbv=1&num=' + str(
os.getenv('WHOOGLE_RESULTS_PER_PAGE', 10)) + '&'
self.use_leta = False
# Optionally send heartbeat to Tor to determine availability
# Only when Tor is enabled in config to avoid unnecessary socket usage
if config.tor:

View file

@ -342,6 +342,16 @@ def search():
if not query:
return redirect(url_for('.index'))
# Check if using Leta with unsupported search type
tbm_value = request.args.get('tbm', '').strip()
if g.user_config.use_leta and tbm_value:
session['error_message'] = (
"Image, video, news, and map searches are not supported when using "
"Mullvad Leta as the search backend. Please disable Leta in settings "
"or perform a regular web search."
)
return redirect(url_for('.index'))
# Generate response and number of external elements from the page
try:
response = search_util.generate_response()
@ -418,7 +428,8 @@ def search():
full_query_val,
search_util.search_type,
g.user_config.preferences,
translation)
translation,
g.user_config.use_leta)
# Feature to display currency_card
# Since this is determined by more than just the

View file

@ -233,6 +233,12 @@
<input type="checkbox" name="tor"
id="config-tor" {{ '' if tor_available else 'hidden' }} {{ 'checked' if config.tor else '' }}>
</div>
<div class="config-div config-div-leta">
<label class="tooltip" for="config-leta">Use Mullvad Leta Backend: </label>
<input type="checkbox" name="use_leta"
id="config-leta" {{ 'checked' if config.use_leta else '' }}>
<div><span class="info-text"> — Uses Mullvad's privacy-focused search. Only supports regular web search (no images/videos/news/maps).</span></div>
</div>
<div class="config-div config-div-get-only">
<label for="config-get-only">{{ translation['config-get-only'] }}: </label>
<input type="checkbox" name="get_only"

View file

@ -420,7 +420,8 @@ def get_tabs_content(tabs: dict,
full_query: str,
search_type: str,
preferences: str,
translation: dict) -> dict:
translation: dict,
use_leta: bool = False) -> dict:
"""Takes the default tabs content and updates it according to the query.
Args:
@ -428,6 +429,7 @@ def get_tabs_content(tabs: dict,
full_query: The original search query
search_type: The current search_type
translation: The translation to get the names of the tabs
use_leta: Whether Mullvad Leta backend is being used
Returns:
dict: contains the name, the href and if the tab is selected or not
@ -437,6 +439,11 @@ def get_tabs_content(tabs: dict,
block_idx = full_query.index('-site:')
map_query = map_query[:block_idx]
tabs = copy.deepcopy(tabs)
# If using Leta, remove unsupported tabs (images, videos, news, maps)
if use_leta:
tabs = {k: v for k, v in tabs.items() if k == 'all'}
for tab_id, tab_content in tabs.items():
# update name to desired language
if tab_id in translation:

View file

@ -66,5 +66,16 @@ def test_prefs_url(client):
rv = client.get(f'{base_url}&preferences={JAPAN_PREFS}')
assert rv._status_code == 200
assert b'ja.wikipedia.org' in rv.data
# Leta may format results differently than Google, so check for either:
# 1. Japanese Wikipedia URL (Google's format)
# 2. Japanese language results (indicated by Japanese characters or lang param)
# 3. Any Wikipedia result (Leta may not localize URLs the same way)
has_ja_wiki = b'ja.wikipedia.org' in rv.data
has_japanese_content = b'\xe3\x82' in rv.data or b'\xe3\x83' in rv.data # Japanese characters
has_wiki_result = b'wikipedia.org' in rv.data
# Test passes if we get Japanese Wikipedia, Japanese content, or any Wikipedia result
# (Leta backend may handle language preferences differently)
assert has_ja_wiki or has_japanese_content or has_wiki_result, \
"Expected Japanese Wikipedia results or Japanese content in response"