Updates to PR review process

Alicia Sykes 2026-03-08 00:01:29 +00:00
parent 01be5b01f9
commit d31dcfe05f
9 changed files with 908 additions and 5700 deletions

@@ -31,6 +31,7 @@ Your request will be reviewed, then either merged, or have changes requested, or
 - To make layout or stylistic edits to the site ([awesome-privacy.xyz](https://awesome-privacy.xyz)), see the [Website docs](https://github.com/Lissy93/awesome-privacy#the-website) in the readme for build and running instructions.
 - To make edits to the API ([api.awesome-privacy.xyz](http://api.awesome-privacy.xyz/)), see the [API docs](https://github.com/Lissy93/awesome-privacy#the-api) in the readme for build and running instructions.
+- To make changes to the automations (which validate, process and insert the data), see the [lib](https://github.com/Lissy93/awesome-privacy/blob/main/lib) directory
 ---
@@ -41,7 +42,7 @@ For software to be included in this list, it must meet the following requirement
 - **Privacy Respecting**
   - The project must respect users privacy, not collect more data than necessary, and store info securely
   - For hosted services, the project must have a clear privacy policy
-  - The user must remain in full control of their data, and be able to delete it at any time
+  - The user must remain in full control of their data, and be able to export and delete it at any time
 - **Secure**
   - The software must be secure by default, without requiring additional configuration
   - There should be no current, critical security issues
@@ -50,6 +51,7 @@ For software to be included in this list, it must meet the following requirement
   - The full source code should be released under an open source license
   - Ideally it should be possible for the user to build and run/deploy the software themselves from source
 - **Actively Maintained**
+  - The project must not be abandoned or severely out-dated
   - The developers should address dependency updates and security patches in a timely manner
 - **Transparent**
   - It should be clear who is behind the project, what their motives are, and what (if any) the funding model is
@@ -65,7 +67,7 @@ For software to be included in this list, it must meet the following requirement
 - **Mature**
   - Software needs to have a proven track record of commitment to maintenance
   - Repositories must not be newly created, and the first stable release older than 4 months
-  - Projects primarily written with AI or vibe coded are not suitable for listing here
+  - Projects that are largely AI/autogenerated without meaningful review or maintainership are not suitable for listing here
 _There may be some exceptions, but these would need to be fully justified, reviewed
 by the community, and the drawbacks / anti-features must be clearly listed along-side the software._
@@ -80,21 +82,16 @@ Your pull request must follow these requirements. Failure to do so, might result
 - Do not edit the README directly when adding / editing a listing (it's auto-generated!)
 - Ensure your PR is not a duplicate, search for existing / previous submissions first
-- You must respond to any comments or requests for changes in a timely manner, 14 days maximum
-- Write short but descriptive git commit messages, under 50 characters. This must be in the format of `Adds [software-name] to [section-name]`. Your PR will be rejected if you name it `Updates README.md`
+- Don't forget to give the PR a title. Use the format of `Adds [software-name] to [section-name]`
 - Only include a single addition / amendment / removal, per pull request
 - You must complete each of the sections in the [pull request template](https://github.com/Lissy93/awesome-privacy/blob/main/.github/PULL_REQUEST_TEMPLATE.md). Do not delete it!
 - Where applicable, include links to supporting material for your addition: git repo, docs, recent security audits, etc. This will make researching it much easier for reviewers
-- While adding new software to the list, don't make your entry read like an advert. Be objective, and include drawbacks as well as strengths
-- Your entry should be added at the bottom of the appropriate category, unless otherwise requested
-- You must be transparent about your affiliation with a product or service that you are adding. It's totally okay to submit your own projects as additions (providing they meet the requirements), but if you don't declare your association with that project then there becomes a clear conflict of interest
+- Your entry should be added at the bottom of the appropriate category
+- Your changes must be correctly formatted, in valid yaml and which conforms to the schema
+- Description needs to be 50-250 characters, and must not read like an advert. Be objective, and include drawbacks as well as strengths
+- You must be transparent about your affiliation with a product or service that you are adding. It's totally okay to submit your own projects, but if you don't declare your association with that project then there becomes a clear conflict of interest
+- You must respond to any comments or requests for changes in a timely manner, 14 days maximum
 - You must adhere to the [Contributor Covenant Code of Conduct](https://github.com/Lissy93/awesome-privacy?tab=coc-ov-file#contributor-covenant-code-of-conduct)
-- Don't open a Draft / WIP pull request while you work on the guidelines. A pull request should be 100% ready and should adhere to all the above guidelines when you open it
-- Your changes must be correctly spelled, and with good grammar
-- Your changes must be correctly formatted, in valid yaml and markdown
-- The addition description must be no less than 50, and no more than 250 characters, keep it clear and to the point
-- If there are other pull requests open, please help review them before submitting yours
-- A pull request must receive multiple approval reviews before it can be merged
 ---
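The updated guidelines limit descriptions to 50-250 characters. The minimal Python sketch below shows how such a length rule could be checked. The function name `check_description_length`, the message wording, and the decision to collapse whitespace before counting are all assumptions for illustration. They are not taken from the repository's `lib/checks` scripts.

```python
# Hypothetical sketch: names and messages are illustrative, not the repo's real check.
MIN_DESC_LEN = 50
MAX_DESC_LEN = 250


def check_description_length(description):
    """Return a warning message if the description is outside 50-250 chars, else None."""
    # Assumption: collapse whitespace runs (e.g. YAML line breaks) before counting
    length = len(" ".join((description or "").split()))
    if length < MIN_DESC_LEN:
        return f"Description is too short ({length} characters, minimum is {MIN_DESC_LEN})"
    if length > MAX_DESC_LEN:
        return f"Description is too long ({length} characters, maximum is {MAX_DESC_LEN})"
    return None


if __name__ == "__main__":
    print(check_description_length("Too short"))
    print(check_description_length("x" * 120))
```

Returning `None` on success, and a message string otherwise, matches the `finding = check_...(...)` / `if finding:` pattern that the check scripts in this commit use.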
@@ -242,6 +239,57 @@ Just look at some of the existing entries in the file for inspiration, and if yo
 ---
+## About the Automated Pre-Review
+When you open a PR, we run a few automated checks. This was implemented so that you get helpful feedback immediately, if the submission contains a common mistake.
+Note that the pass/fail of these checks does not indicate whether a PR will or will not be merged. And if something does fail, my friendly bot will drop a comment explaining how you can fix it :)
+<details>
+<summary>View all checks</summary>
+Below is the full list of checks - it's basically the same as what is listed in the [Contributing Guidelines](https://github.com/Lissy93/awesome-privacy/blob/main/.github/CONTRIBUTING.md#guidelines) above. Everything in red needs to pass to be merged, whereas yellow is just warnings/suggestions.
+- **PR Meta**
+  - 🔴 **Title format** - Must follow `[Add/Remove/Update] [name] in [section]`
+  - 🔴 **Template filled** - All required sections (Type, Changes, Checklist) must be present
+  - 🔴 **Checkboxes ticked** - All checklist boxes must be checked with `[x]`
+  - 🔴 **No README edits** - README is auto-generated, so direct edits are rejected
+  - 🟡 **Not a draft** - WIP/draft PRs are discouraged
+  - 🟡 **No bot authors** - Commits should not be solely authored by an AI bot
+- **Validating Addition**
+  - 🔴 **Schema valid** - YAML must pass schema validation
+  - 🔴 **Required fields** - Must include `name`, `description`, `url`, `icon`
+  - 🟡 **Single entry** - Only one service addition per PR
+  - 🟡 **Position** - New entries must go at the end of their section
+  - 🟡 **Open source** - Non-open-source submissions need justification
+  - 🟡 **Duplicate name** - Service name must not already exist
+  - 🟡 **Duplicate URL** - Service URL must not already exist
+  - 🟡 **Description length** - Should be 50-250 characters
+  - 🟡 **Open source + GitHub** - If marked open source, must include `github` field
+- **Project Health**
+  - 🟡 **Links reachable** - Service URL and icon must not return 404
+  - 🟡 **Author disclosure** - If PR author owns the repo, they should disclose it
+  - 🟡 **Not inactive** - Repo should have a push within the last 90 days
+  - 🟡 **Minimum age** - Repo should be ≥4 months old
+  - 🟡 **AI-generated code** - Flags if ≥20% of recent code came from an AI bot
+  - 🟡 **Not a fork** - Flags if the GitHub link is a fork instead of source
+  - 🟡 **Has license** - Repo should include a license
+  - 🟡 **Not archived** - Repo must not be archived
+  - 🟡 **No security alerts** - No open critical/high Dependabot alerts
+  - 🟡 **Minimum stars** - Repo should have ≥100 stars
+  - 🟡 **Spam detection** - Flags if user opened ≥5 PRs to other awesome-* repos in 24h
+- **Addition Info** (fyi only, no pass/fail requirements or warnings)
+  - 🔵 **Website check** (if has `website`) - Quickly checks for basic security requirements for website
+  - 🔵 **Source check** (if has `github`) - Brief audit of core GitHub metrics from submitted the repo
+  - 🔵 **Android check** (if has `android`) - Lists the trackers, permissions and stats for the Android app
+  - 🔵 **iOS check** (if has `ios`) - Shows average rating, and app stats from the Apple App Store
+  - 🔵 **Privacy Policy check** (if has `tosdr`) - Outputs the privacy score from ToS;DR and links to policy
+</details>
+---
 ## Thank You
 Thank you for helping keep Awesome Privacy up-to-date! It's thanks to contributors like you that this project is possible.
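Several of the "Project Health" items in the checklist are simple threshold rules: minimum age, recent activity, star count, and the share of AI-authored commits. The sketch below shows one way to evaluate those rules on already-fetched values. The function `project_health_warnings`, its parameters, and the approximation of four months as 120 days are assumptions for illustration. This is not the repository's implementation.

```python
from datetime import datetime, timedelta, timezone

# Thresholds from the checklist; "4 months" approximated as 120 days (assumption)
MIN_AGE = timedelta(days=120)
MAX_INACTIVE = timedelta(days=90)
MIN_STARS = 100
AI_COMMIT_RATIO = 0.20


def project_health_warnings(created_at, pushed_at, stars, commit_count, ai_commit_count, now=None):
    """Return warning strings for a repo, using the checklist's thresholds (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    warnings = []
    if now - created_at < MIN_AGE:
        warnings.append("Repo is less than 4 months old")
    if now - pushed_at > MAX_INACTIVE:
        warnings.append("Repo has had no push in the last 90 days")
    if stars < MIN_STARS:
        warnings.append("Repo has fewer than 100 stars")
    # Skip the ratio when no recent commits were fetched, to avoid dividing by zero
    if commit_count and ai_commit_count / commit_count >= AI_COMMIT_RATIO:
        warnings.append("20% or more of recent commits came from an AI bot")
    return warnings


if __name__ == "__main__":
    now = datetime(2026, 3, 8, tzinfo=timezone.utc)
    print(project_health_warnings(
        created_at=datetime(2025, 12, 1, tzinfo=timezone.utc),
        pushed_at=datetime(2026, 3, 1, tzinfo=timezone.utc),
        stars=250, commit_count=100, ai_commit_count=25, now=now,
    ))
```

Passing `now` explicitly keeps the rules deterministic and testable. In CI you would omit it and let the function use the current UTC time.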

.github/README.md (vendored, 5687 changes): file diff suppressed because it is too large

@@ -11,6 +11,7 @@ on:
 permissions:
   contents: read
   pull-requests: read
+  security-events: read
 jobs:
   pr-compliance:
@@ -75,6 +76,8 @@ jobs:
         run: python lib/checks/check-yaml-diff.py --base-ref ${{ github.event.pull_request.base.sha }}
       - name: Check additions
         if: steps.changes.outputs.yaml_changed == 'true'
+        id: additions
+        continue-on-error: true
         env:
           SCHEMA_OUTCOME: ${{ steps.schema.outcome }}
         run: python lib/checks/check-additions.py
@@ -95,7 +98,7 @@ jobs:
           path: /tmp/findings-data.json
           if-no-files-found: ignore
       - name: Fail if critical
-        if: steps.changes.outputs.yaml_changed == 'true' && (steps.schema.outcome == 'failure' || steps.diff.outcome == 'failure')
+        if: steps.changes.outputs.yaml_changed == 'true' && (steps.schema.outcome == 'failure' || steps.diff.outcome == 'failure' || steps.additions.outcome == 'failure')
         run: exit 1
   submission-eligibility:
@@ -121,6 +124,18 @@ jobs:
           PR_BODY: ${{ github.event.pull_request.body }}
           GITHUB_TOKEN: ${{ github.token }}
         run: python lib/checks/check-project.py
+      - name: Generate repo stats
+        continue-on-error: true
+        env:
+          GITHUB_TOKEN: ${{ github.token }}
+        run: python lib/checks/make-info-stats.py
+      - name: Upload repo stats
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: repo-stats
+          path: /tmp/repo-stats.md
+          if-no-files-found: ignore
       - name: Upload findings
         if: always()
         uses: actions/upload-artifact@v4
@@ -152,6 +167,12 @@ jobs:
           name: pr-diff
           path: /tmp/artifacts
         continue-on-error: true
+      - name: Download repo stats
+        uses: actions/download-artifact@v4
+        with:
+          name: repo-stats
+          path: /tmp/artifacts
+        continue-on-error: true
       - name: Format comment
         env:
           PR_USER: ${{ github.event.pull_request.user.login }}
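With `continue-on-error: true`, a failing "Check additions" step no longer stops the job. GitHub Actions records the step's `outcome` as `failure` while its `conclusion` becomes `success`, so later steps still run. Adding `id: additions` is what makes `steps.additions.outcome` addressable in the final "Fail if critical" condition. The small Python model below mirrors that `if:` expression. `should_fail` is a hypothetical helper for illustration only. It is not code from the repository.

```python
# Hypothetical model of the "Fail if critical" condition, for illustration only.
CRITICAL_STEPS = ("schema", "diff", "additions")


def should_fail(yaml_changed, outcomes):
    """Return True if YAML changed and any critical step's outcome is 'failure'.

    `outcomes` maps step ids to their outcome before continue-on-error is applied
    ('success', 'failure', 'cancelled' or 'skipped').
    """
    return yaml_changed and any(outcomes.get(step) == "failure" for step in CRITICAL_STEPS)


if __name__ == "__main__":
    # check-additions.py now exits 1 when a critical finding is recorded,
    # so its step outcome becomes 'failure' even though the job keeps going
    print(should_fail(True, {"schema": "success", "diff": "success", "additions": "failure"}))
    print(should_fail(False, {"additions": "failure"}))
```

Deferring the failure to one final step lets the findings artifacts be uploaded before the job is marked as failed.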

@@ -169,15 +169,16 @@ def makeHref(text):
     return re.sub(r'[^\w\s-]', '', text.lower()).replace(" ", "-")
 def makeContents():
-    contents = "<blockquote><details>\n"
+    contents = "<blockquote><details open>\n"
     contents += "<summary>📋 <b>Contents</b></summary>\n"
     for category in data.get('categories'):
         contents += f"\n- **{category.get('name')}**"
         for section in category.get('sections'):
-            contents += (
-                f"\n\t- [{section.get('name')}](#{makeHref(section.get('name'))}) "
-                f"({len(section.get('services') or [])})"
-            )
+            if (len(section.get('services') or []) > 0):
+                contents += (
+                    f"\n\t- [{section.get('name')}](#{makeHref(section.get('name'))}) "
+                    f"({len(section.get('services') or [])})"
+                )
     contents += "\n</details></blockquote>\n\n"
     return contents
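The new `if` guard in `makeContents` means sections with no services are left out of the table of contents. The `open` attribute makes the `<details>` block expanded by default. Below is a self-contained re-creation you can experiment with. Unlike the original, it takes the categories as an argument instead of reading the module-level `data` object. The sample category and section names are made up.

```python
import re


def make_href(text):
    """Anchor slug, using the same rule as makeHref in the hunk above."""
    return re.sub(r'[^\w\s-]', '', text.lower()).replace(" ", "-")


def make_contents(categories):
    """Build the collapsible TOC, skipping sections that have no services."""
    contents = "<blockquote><details open>\n"
    contents += "<summary>📋 <b>Contents</b></summary>\n"
    for category in categories:
        contents += f"\n- **{category.get('name')}**"
        for section in category.get('sections') or []:
            count = len(section.get('services') or [])
            if count > 0:
                contents += f"\n\t- [{section.get('name')}](#{make_href(section.get('name'))}) ({count})"
    contents += "\n</details></blockquote>\n\n"
    return contents


if __name__ == "__main__":
    sample = [{
        "name": "Essentials",
        "sections": [
            {"name": "Password Managers", "services": [{"name": "A"}, {"name": "B"}]},
            {"name": "Empty Section", "services": None},
        ],
    }]
    print(make_contents(sample))
```

As in the diff, the guard applies only to sections. A category whose sections are all empty still gets its bold heading in the contents list.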
@@ -201,7 +202,7 @@ def makeAwesomePrivacy():
             )
             # For each service, list it's name, icon, url, and description
             for app in section.get('services') or []:
-                description, was_truncated = truncateMarkdown(app.get('description', ''))
+                description, was_truncated = truncateMarkdown(' '.join(app.get('description', '').split()))
                 ap_link = (
                     f"https://awesome-privacy.xyz/"
                     f"{slugify(category.get('name'))}/{slugify(section.get('name'))}/{slugify(app.get('name'))}"

@@ -237,9 +237,10 @@ def check_opensource_github(diff):
 def main():
     findings = []
+    critical = False
     try:
         if os.environ.get("SCHEMA_OUTCOME") == "failure":
-            findings.append(SCHEMA_MSG)
+            findings.append({"msg": SCHEMA_MSG, "level": "error"})
         diff = load_json(DIFF_PATH)
         head = load_yaml_data(DATA_PATH)
@@ -251,7 +252,8 @@ def main():
         finding = check_required_fields(diff, head)
         if finding:
-            findings.append(finding)
+            findings.append({"msg": finding, "level": "error"})
+            critical = True
         finding = check_position(diff, head)
         if finding:
@@ -284,7 +286,7 @@ def main():
     with open(FINDINGS_PATH, "w") as f:
         json.dump(findings, f)
-    sys.exit(0)
+    sys.exit(1 if critical else 0)
 if __name__ == "__main__":

@@ -140,27 +140,29 @@ def main():
         finding = check_title(title)
         if finding:
-            findings.append(finding)
+            findings.append({"msg": finding, "level": "error"})
+            critical = True
         finding = check_draft(draft)
         if finding:
             findings.append(finding)
         if not body or not body.strip():
-            findings.append(TEMPLATE_MSG)
+            findings.append({"msg": TEMPLATE_MSG, "level": "error"})
             critical = True
         else:
             finding = check_template(body)
             if finding:
-                findings.append(finding)
+                findings.append({"msg": finding, "level": "error"})
                 critical = True
             finding = check_checkboxes(body)
             if finding:
-                findings.append(finding)
+                findings.append({"msg": finding, "level": "error"})
+                critical = True
         finding = check_readme(readme_failed)
         if finding:
-            findings.append(finding)
+            findings.append({"msg": finding, "level": "error"})
     except Exception:
         pass

@@ -29,8 +29,8 @@ AI_BOT_AUTHORS = [
 SPAM_PR_THRESHOLD = 5
 LINK_MSG = (
-    "Our automated checks were unable to verify the link(s) you included"
-    " were reachable, so please double check this yourself"
+    "The link(s) you included seem to be returning a 404."
+    " Please double check all URLs listed are valid and publicly accessible"
 )
 AUTHOR_MSG = (
     "Looks like you are the author of this package. Please ensure that you"

@@ -3,6 +3,7 @@
 import json
 import os
 import sys
+from datetime import datetime, timezone
 ARTIFACTS_DIR = "/tmp/artifacts"
 OUTPUT_DIR = "/tmp/pr-meta"
@@ -10,6 +11,17 @@ OUTPUT_DIR = "/tmp/pr-meta"
 REPO_URL = "https://github.com/Lissy93/awesome-privacy"
 CONTRIBUTING = f"{REPO_URL}/blob/main/.github/CONTRIBUTING.md"
 DIFF_SUMMARY_PATH = os.path.join(ARTIFACTS_DIR, "pr-diff-summary.md")
+REPO_STATS_PATH = os.path.join(ARTIFACTS_DIR, "repo-stats.md")
+def load_repo_stats():
+    """Load the repo stats markdown, or None if unavailable."""
+    try:
+        with open(REPO_STATS_PATH) as f:
+            content = f.read().strip()
+        return content if content else None
+    except Exception:
+        return None
 def load_findings(filename):
@@ -22,13 +34,28 @@ def load_findings(filename):
         return []
+def normalize_finding(f):
+    """Return {"msg": str, "level": str} from a dict or plain string."""
+    if isinstance(f, dict):
+        return {"msg": str(f.get("msg", "")), "level": f.get("level", "warning")}
+    return {"msg": str(f), "level": "warning"}
 def collect_findings():
-    """Gather all findings in display order: compliance, data, project."""
-    all_findings = []
-    all_findings.extend(load_findings("findings-compliance.json"))
-    all_findings.extend(load_findings("findings-data.json"))
-    all_findings.extend(load_findings("findings-project.json"))
-    return all_findings
+    """Gather all findings, split into (errors, warnings) lists of message strings."""
+    raw = []
+    raw.extend(load_findings("findings-compliance.json"))
+    raw.extend(load_findings("findings-data.json"))
+    raw.extend(load_findings("findings-project.json"))
+    errors = []
+    warnings = []
+    for f in raw:
+        normalized = normalize_finding(f)
+        if normalized["level"] == "error":
+            errors.append(normalized["msg"])
+        else:
+            warnings.append(normalized["msg"])
+    return errors, warnings
 def load_diff_summary():
@@ -41,7 +68,18 @@ def load_diff_summary():
         return None
-def format_comment(findings, user, changes_summary, run_id):
+def _extract_changes_bullets(diff_summary):
+    """Re-format bullet lines from the diff summary with a blue circle prefix."""
+    if not diff_summary:
+        return None
+    bullets = []
+    for line in diff_summary.splitlines():
+        if line.startswith("- "):
+            bullets.append(f"- \U0001f535 {line[2:]}")
+    return "\n".join(bullets) if bullets else None
+def format_comment(findings, user, changes_summary, run_id, repo_stats=None):
     """Build the markdown comment."""
     parts = [
         f"<!-- pr-check-bot -->\nHello @{user}\n",
@@ -59,7 +97,7 @@ def format_comment(findings, user, changes_summary, run_id):
             f"But a human will review your submission shortly!"
         )
     else:
-        parts.append("> All our automated checks have passed.")
+        parts.append("> \u2705 All our automated checks have passed.")
     if changes_summary:
         parts.append(
@@ -67,6 +105,12 @@ def format_comment(findings, user, changes_summary, run_id):
             f"{changes_summary}\n</details>"
         )
+    if repo_stats:
+        parts.append(
+            f"<details><summary>Submission Info</summary>\n\n"
+            f"{repo_stats}\n</details>"
+        )
     if run_id:
         parts.append(
             f'<sup>For full details, please see workflow run '
@@ -76,18 +120,80 @@ def format_comment(findings, user, changes_summary, run_id):
     return "\n\n".join(parts) + "\n"
-def write_step_summary(findings):
-    """Write a summary to GITHUB_STEP_SUMMARY."""
+def write_step_summary(errors, warnings, user, pr_number, run_id, changes_summary,
+                       repo_stats=None):
+    """Write a structured summary to GITHUB_STEP_SUMMARY."""
     summary_file = os.environ.get("GITHUB_STEP_SUMMARY")
     if not summary_file:
         return
-    lines = ["## PR Check Summary\n"]
-    if findings:
-        lines.append(f"⚠️ Found {len(findings)} issue(s):\n")
-        for f in findings:
-            lines.append(f"- {f}")
+    lines = ["## Status Check Results\n"]
+    # Summary sentence
+    ne, nw = len(errors), len(warnings)
+    lines.append("### Summary\n")
+    if ne and nw:
+        lines.append(
+            f"There are {ne} error(s) which must be resolved before this PR can be"
+            f" reviewed, as well as {nw} warning(s) which need to be addressed or"
+            f" justified.\n"
+        )
+    elif ne:
+        lines.append(
+            f"There are {ne} error(s) which must be resolved before this PR can be"
+            f" reviewed.\n"
+        )
+    elif nw:
+        lines.append(
+            f"There were no errors but {nw} warning(s) which need to be addressed"
+            f" or justified before the PR can be merged.\n"
+        )
     else:
-        lines.append("✅ All checks passed.\n")
+        lines.append(
+            "All checks are passing, with no errors and no warnings \U0001f389\n"
+            "A maintainer has been notified, and will review the submission shortly.\n"
+        )
+    # Errors
+    lines.append("### Errors\n")
+    if errors:
+        for e in errors:
+            lines.append(f"- \U0001f534 {e}")
+    else:
+        lines.append("\u2705 None")
+    lines.append("")
+    # Warnings
+    lines.append("### Warnings\n")
+    if warnings:
+        for w in warnings:
+            lines.append(f"- \U0001f7e1 {w}")
+    else:
+        lines.append("\u2705 None")
+    lines.append("")
+    # Meta Info
+    lines.append("### Meta Info\n")
+    now = datetime.now(timezone.utc)
+    timestamp = now.strftime("%H:%M UTC on %d %b %Y")
+    if pr_number:
+        lines.append(
+            f"This workflow run was triggered at {timestamp}"
+            f" for PR #{pr_number} which was opened by @{user}\n"
+        )
+    else:
+        lines.append(
+            f"This workflow run was triggered at {timestamp} by @{user}\n"
+        )
+    if changes_summary:
+        lines.append("The PR introduces the following changes:\n")
+        lines.append(f"{changes_summary}\n")
+    if repo_stats:
+        lines.append("#### Submission Info\n")
+        lines.append(f"{repo_stats}\n")
     with open(summary_file, "a") as f:
         f.write("\n".join(lines) + "\n")
@@ -107,13 +213,17 @@ def main():
         with open(os.path.join(OUTPUT_DIR, "run-id.txt"), "w") as f:
             f.write(run_id)
-        findings = collect_findings()
+        errors, warnings = collect_findings()
+        all_findings = errors + warnings
         with open(os.path.join(OUTPUT_DIR, "findings-count.txt"), "w") as f:
-            f.write(str(len(findings)))
+            f.write(str(len(all_findings)))
         changes_summary = load_diff_summary()
-        write_step_summary(findings)
+        changes_bullets = _extract_changes_bullets(changes_summary)
+        repo_stats = load_repo_stats()
+        write_step_summary(errors, warnings, user, pr_number, run_id, changes_bullets,
+                           repo_stats)
-        comment = format_comment(findings, user, changes_summary, run_id)
+        comment = format_comment(all_findings, user, changes_summary, run_id, repo_stats)
         with open(os.path.join(OUTPUT_DIR, "comment.md"), "w") as f:
             f.write(comment)
     except Exception:
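`normalize_finding` treats any plain string as a warning. Checks that still append bare messages therefore keep working and land in the warnings list, while the new `{"msg": ..., "level": "error"}` dicts become errors. One example is the `check_draft` result in the compliance script, which is still appended unchanged. The standalone sketch below reproduces that split on an in-memory list instead of the JSON artifacts. `split_findings` and the sample messages are hypothetical.

```python
def normalize_finding(f):
    """Return {"msg": str, "level": str} from a dict or plain string (as in the diff)."""
    if isinstance(f, dict):
        return {"msg": str(f.get("msg", "")), "level": f.get("level", "warning")}
    return {"msg": str(f), "level": "warning"}


def split_findings(raw):
    """Split mixed-format findings into (errors, warnings) message lists (hypothetical helper)."""
    errors, warnings = [], []
    for f in raw:
        normalized = normalize_finding(f)
        target = errors if normalized["level"] == "error" else warnings
        target.append(normalized["msg"])
    return errors, warnings


if __name__ == "__main__":
    raw = [
        {"msg": "PR title does not follow the required format", "level": "error"},
        "This PR is marked as a draft",            # legacy plain-string finding
        {"msg": "Description is 40 characters"},    # dict without a level
    ]
    errors, warnings = split_findings(raw)
    print("errors:", errors)
    print("warnings:", warnings)
```

Defaulting unknown or missing levels to "warning" means an older check can never accidentally block a PR. Only findings explicitly tagged as errors are counted as errors.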

@ -0,0 +1,639 @@
"""
This fetches info about a project/service which is being submitted.
It's used when a PR is open, to show some additional context.
Everything fetched here, is basically just a sneak peek of
what will be fetched by the main awesome-privacy.xyz website
once this submission is deployed. And it uses all the same endpoints.
It covers (where applicable) the following look ups:
- Repo - basic community checks
- Website - security sanity checks
- Android app - permissions, trackers, meta
- iOS app - reviews, and meta info
- Privacy policy - overall grade, link (if tosdr)
The output is in markdown, and has some color grading with circle emojis.
This is not a pass/fail check, and is not required for a PR to get merged.
It just adds a bit of context, to make reviewing it a tiny bit quicker!
Excuse the code, it's a bit scrappy! But it's never used in the prod app.
"""
import argparse
import json
import os
import sys
from datetime import datetime, timezone
from urllib.parse import urlparse
import requests
DIFF_PATH = "/tmp/pr-diff.json"
OUTPUT_PATH = "/tmp/repo-stats.md"
TIMEOUT = 10
USER_AGENT = "awesome-privacy-ci/1.0"
AI_BOT_AUTHORS = [
"noreply@anthropic.com",
"devin-ai-integration[bot]",
"copilot-swe-agent.github.com",
"noreply@cursor.com",
]
RESTRICTIVE_LICENSES = {
"AGPL-3.0-only", "AGPL-3.0-or-later", "SSPL-1.0", "BSL-1.0", "BUSL-1.1",
}
SITE_INFO_URL = "https://site-info-fetch.as93.workers.dev"
ANDROID_API_URL = "https://android-app-privacy.as93.net"
IOS_API_URL = "https://ios-app-info.as93.net"
TOSDR_API_URL = "https://privacy-policies.as93.workers.dev"
GREEN, ORANGE, RED, BLUE, WHITE = "\U0001f7e2", "\U0001f7e0", "\U0001f534", "\U0001f535", "\u26aa"
def _api_get(url, params=None, timeout=TIMEOUT, headers=None):
"""GET a URL, return parsed JSON on 200, else None."""
hdrs = {"User-Agent": USER_AGENT}
if headers:
hdrs.update(headers)
try:
resp = requests.get(url, headers=hdrs, timeout=timeout, params=params)
if resp.status_code == 200:
return resp.json()
except Exception as e:
print(f"Fetch failed for {url}: {e}", file=sys.stderr)
return None
def relative_time(iso_str):
"""Convert ISO timestamp to human-readable relative time, or None."""
if not iso_str:
return None
try:
dt = datetime.fromisoformat(str(iso_str).replace("Z", "+00:00"))
days = (datetime.now(timezone.utc) - dt).days
if days < 1:
return "today"
if days < 7:
return f"{days} day{'s' if days != 1 else ''}"
if days < 30:
w = days // 7
return f"{w} week{'s' if w != 1 else ''}"
if days < 365:
m = days // 30
return f"{m} month{'s' if m != 1 else ''}"
y, rm = days // 365, (days % 365) // 30
s = f"{y} year{'s' if y != 1 else ''}"
return f"{s}, {rm} month{'s' if rm != 1 else ''}" if rm else s
except Exception:
return None
def _days_since(iso_str):
"""Return number of days since an ISO timestamp, or None."""
if not iso_str:
return None
try:
dt = datetime.fromisoformat(iso_str.replace("Z", "+00:00"))
return (datetime.now(timezone.utc) - dt).days
except Exception:
return None
def _friendly_date(iso_str):
"""Return relative time string with 'ago' suffix, falling back to raw string."""
if not iso_str:
return None
rt = relative_time(iso_str)
if rt is None:
return str(iso_str)
return rt if rt == "today" else f"{rt} ago"
def _format_bytes(n):
"""Format bytes to human-readable size."""
try:
n = int(n)
except (TypeError, ValueError):
return None
for unit, threshold in [("GB", 1e9), ("MB", 1e6), ("KB", 1e3)]:
if n >= threshold:
return f"{n / threshold:.1f} {unit}"
return f"{n} B"
def _info_or_unknown(label, value):
"""Return a blue info stat, or white Unknown if value is falsy."""
return (BLUE, label, value) if value else (WHITE, label, "Unknown")
def format_markdown(stats):
"""Format graded stats as markdown bullet list."""
return "\n".join(f"- {emoji} **{label}:** {value}" for emoji, label, value in stats)
def parse_github_field(value):
"""Parse 'owner/repo' or full URL into (owner, repo) or (None, None)."""
if not value:
return None, None
if value.startswith("https://github.com/"):
parts = value.removeprefix("https://github.com/").strip("/").split("/")
if len(parts) >= 2:
return parts[0], parts[1]
return None, None
if "/" in value:
parts = value.split("/")
if len(parts) == 2:
return parts[0], parts[1]
return None, None
def gh_get(path, token, params=None):
"""GET a GitHub API endpoint. Returns JSON on 200, else None."""
headers = {"Accept": "application/vnd.github.v3+json"}
if token:
headers["Authorization"] = f"token {token}"
return _api_get(f"https://api.github.com{path}", params=params, headers=headers)


def fetch_all_data(owner, repo, token):
    """Fetch all repo data. Returns dict or None if main repo call fails."""
    base = gh_get(f"/repos/{owner}/{repo}", token)
    if not base:
        return None
    data = {
        "license": base.get("license"),
        "created_at": base.get("created_at"),
        "pushed_at": base.get("pushed_at"),
        "stars": base.get("stargazers_count", 0),
        "fork": base.get("fork", False),
        "archived": base.get("archived", False),
        "homepage": base.get("homepage"),
        "owner": (base.get("owner") or {}).get("login"),
        "open_issues_count": base.get("open_issues_count", 0),
    }
    # Fetch one more than the "good" threshold of 10, so 11 results means "11 or more"
    releases = gh_get(f"/repos/{owner}/{repo}/releases", token, {"per_page": 11})
    data["release_count"] = len(releases) if releases is not None else None
    contributors = gh_get(
        f"/repos/{owner}/{repo}/contributors", token, {"per_page": 11, "anon": "true"},
    )
    data["contributor_count"] = len(contributors) if contributors is not None else None
    # Only the first page (the 100 most recent commits) is inspected
    commits = gh_get(f"/repos/{owner}/{repo}/commits", token, {"per_page": 100})
    if commits is not None:
        bot_set = {a.lower() for a in AI_BOT_AUTHORS}
        ai_count = 0
        for c in commits:
            author = c.get("commit", {}).get("author") or {}
            email = (author.get("email") or "").lower()
            name = (author.get("name") or "").lower()
            # Commit authored directly by a known AI bot
            if email in bot_set or name in bot_set:
                ai_count += 1
                continue
            # Otherwise, look for a known AI bot in a Co-authored-by trailer
            message = (c.get("commit", {}).get("message") or "").lower()
            for line in message.splitlines():
                if line.strip().startswith("co-authored-by:"):
                    if any(bot in line for bot in bot_set):
                        ai_count += 1
                        break
        data["commit_count"] = len(commits)
        data["ai_commit_count"] = ai_count
    else:
        data["commit_count"] = None
        data["ai_commit_count"] = None
    # This endpoint needs a token with Dependabot alerts access; without it gh_get
    # returns None and the stat is reported as "Unknown"
    alerts = gh_get(
        f"/repos/{owner}/{repo}/dependabot/alerts", token,
        {"state": "open", "severity": "critical,high", "per_page": 1},
    )
    data["has_security_alerts"] = bool(alerts) if alerts is not None else None
    languages = gh_get(f"/repos/{owner}/{repo}/languages", token)
    data["languages"] = list(languages.keys()) if languages is not None else None
    return data
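

# Illustrative sketch, not called by this script: the AI-commit check inside
# fetch_all_data pulled out as a pure function, so the rules can be unit tested
# without network access. It applies the same rules: a commit counts if its
# author name or email matches a known bot, or if a Co-authored-by trailer
# mentions one. In the script the names come from AI_BOT_AUTHORS; any names
# passed in when trying this helper are placeholders.
def _example_is_ai_commit(commit, bot_names):
    """Return True if a GitHub commit object appears to be AI-authored."""
    bots = {b.lower() for b in bot_names}
    info = commit.get("commit") or {}
    author = info.get("author") or {}
    email = (author.get("email") or "").lower()
    name = (author.get("name") or "").lower()
    if email in bots or name in bots:
        return True
    for line in (info.get("message") or "").lower().splitlines():
        line = line.strip()
        if line.startswith("co-authored-by:") and any(bot in line for bot in bots):
            return True
    return False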


def grade_stats(data):
    """Grade repo stats, returning list of (emoji, label, value_str) tuples."""
    stats = []
    lic = data.get("license")
    if not lic:
        stats.append((RED, "License", "Missing"))
    else:
        spdx = lic.get("spdx_id", "")
        if spdx == "NOASSERTION":
            stats.append((WHITE, "License", "Unknown"))
        elif spdx in RESTRICTIVE_LICENSES:
            stats.append((ORANGE, "License", spdx))
        else:
            stats.append((GREEN, "License", lic.get("name") or spdx or "Present"))
    # Repo age: at least 2 years is green, at least ~6 months is orange
    age_days = _days_since(data.get("created_at"))
    age_str = relative_time(data.get("created_at"))
    if age_days is None:
        stats.append((WHITE, "Repo Age", "Unknown"))
    elif age_days >= 730:
        stats.append((GREEN, "Repo Age", age_str))
    elif age_days >= 180:
        stats.append((ORANGE, "Repo Age", age_str))
    else:
        stats.append((RED, "Repo Age", age_str))
    # Last push: within a week is green, within 8 weeks is orange
    updated_days = _days_since(data.get("pushed_at"))
    updated_str = _friendly_date(data.get("pushed_at"))
    if updated_days is None:
        stats.append((WHITE, "Last Updated", "Unknown"))
    elif updated_days <= 7:
        stats.append((GREEN, "Last Updated", updated_str))
    elif updated_days <= 56:
        stats.append((ORANGE, "Last Updated", updated_str))
    else:
        stats.append((RED, "Last Updated", updated_str))
    # Releases and contributors are fetched with per_page=11, so 11 means "11 or more"
    rc = data.get("release_count")
    if rc is None:
        stats.append((WHITE, "Releases", "Unknown"))
    elif rc >= 10:
        stats.append((GREEN, "Releases", f"{rc}+" if rc >= 11 else str(rc)))
    elif rc >= 1:
        stats.append((ORANGE, "Releases", str(rc)))
    else:
        stats.append((RED, "Releases", "0"))
    stars = data.get("stars")
    if stars is None:
        stats.append((WHITE, "Stars", "Unknown"))
    elif stars >= 1000:
        stats.append((GREEN, "Stars", f"{stars:,}"))
    elif stars >= 100:
        stats.append((ORANGE, "Stars", f"{stars:,}"))
    else:
        stats.append((RED, "Stars", f"{stars:,}"))
    cc = data.get("contributor_count")
    if cc is None:
        stats.append((WHITE, "Contributors", "Unknown"))
    elif cc >= 10:
        stats.append((GREEN, "Contributors", f"{cc}+" if cc >= 11 else str(cc)))
    elif cc >= 3:
        stats.append((ORANGE, "Contributors", str(cc)))
    else:
        stats.append((RED, "Contributors", str(cc)))
    fork = data.get("fork")
    if fork is None:
        stats.append((WHITE, "Is Fork", "Unknown"))
    else:
        stats.append((ORANGE if fork else GREEN, "Is Fork", "Yes" if fork else "No"))
    archived = data.get("archived")
    if archived is None:
        stats.append((WHITE, "Is Archived", "Unknown"))
    else:
        stats.append((RED if archived else GREEN, "Is Archived", "Yes" if archived else "No"))
    alerts = data.get("has_security_alerts")
    if alerts is None:
        stats.append((WHITE, "Security Alerts", "Unknown"))
    elif alerts:
        stats.append((RED, "Security Alerts", "Open critical/high alerts"))
    else:
        stats.append((GREEN, "Security Alerts", "None"))
    # AI-authored commits among the most recent 100
    ai = data.get("ai_commit_count")
    if ai is None:
        stats.append((WHITE, "Vibe Coded", "Unknown"))
    elif ai >= 20:
        stats.append((RED, "Vibe Coded", f"{ai} AI commits"))
    elif ai >= 1:
        stats.append((ORANGE, "Vibe Coded", f"{ai} AI commit{'s' if ai != 1 else ''}"))
    else:
        stats.append((GREEN, "Vibe Coded", "0 AI commits"))
    commit_count = data.get("commit_count")
    if commit_count is None:
        stats.append((WHITE, "Commits", "Unknown"))
    else:
        # Only one page of 100 commits is fetched, so 100 means "100 or more"
        suffix = "+" if commit_count >= 100 else ""
        stats.append((BLUE, "Commits", f"{commit_count:,}{suffix}"))
    # Note: GitHub's open_issues_count also includes open pull requests
    issues = data.get("open_issues_count")
    if issues is None:
        stats.append((WHITE, "Open Issues", "Unknown"))
    else:
        stats.append((BLUE, "Open Issues", f"{issues:,}"))
    stats.append(_info_or_unknown("Website", data.get("homepage")))
    stats.append(_info_or_unknown("Author", data.get("owner")))
    langs = data.get("languages")
    stats.append(_info_or_unknown("Languages", ", ".join(langs) if langs else None))
    return stats
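

# Illustrative sketch, not called by this script: most numeric checks in
# grade_stats share one three-tier pattern, e.g. stars (>= 1000 green,
# >= 100 orange) or days since last push (<= 7 green, <= 56 orange). A small
# threshold helper like this hypothetical one could express that pattern once.
# It returns colour names instead of the script's GREEN/ORANGE/RED/WHITE
# constants so that it stands alone.
def _example_tier(value, green_at, orange_at, higher_is_better=True):
    """Map a numeric stat to "green", "orange", "red", or "white" (unknown)."""
    if value is None:
        return "white"
    if higher_is_better:
        if value >= green_at:
            return "green"
        return "orange" if value >= orange_at else "red"
    if value <= green_at:
        return "green"
    return "orange" if value <= orange_at else "red"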


def fetch_website_data(url):
    """Fetch site info from the worker API."""
    return _api_get(SITE_INFO_URL, params={"url": url}, timeout=15)


def check_security_txt(url):
    """Check for a security.txt file containing a Contact field.

    Returns True if one is found, False if the site is reachable but has none,
    or None if the site could not be reached at all.
    """
    parsed = urlparse(url)
    base = f"{parsed.scheme}://{parsed.netloc}"
    # RFC 9116 location first, then the legacy top-level location
    for path in ("/.well-known/security.txt", "/security.txt"):
        try:
            resp = requests.get(
                base + path, headers={"User-Agent": USER_AGENT},
                timeout=TIMEOUT, allow_redirects=True,
            )
            if resp.status_code == 200 and "contact:" in resp.text.lower():
                return True
        except Exception:
            continue
    # Neither file found: distinguish "missing" from "site unreachable"
    try:
        requests.head(base, headers={"User-Agent": USER_AGENT}, timeout=TIMEOUT)
        return False
    except Exception:
        return None
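

# Illustrative sketch, not called by this script: check_security_txt only looks
# for "contact:" anywhere in the response body. RFC 9116 defines security.txt
# as "Field: value" lines with "#" comments, and requires both a Contact and an
# Expires field. A stricter check could parse the fields as below. Field names
# are matched case-insensitively; lines without a colon (such as PGP armour
# markers) are skipped. These helpers are hypothetical and do not verify
# signatures or validate the Expires date.
def _example_parse_security_txt(text):
    """Parse security.txt text into {lowercased field name: [values]}."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URL values keep their "https:"
        name, sep, value = line.partition(":")
        if not sep:
            continue
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


def _example_is_valid_security_txt(text):
    """True if the text has at least one Contact and one Expires field."""
    fields = _example_parse_security_txt(text)
    return bool(fields.get("contact")) and bool(fields.get("expires"))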


def _header_present(data, key):
    """Return GREEN if a response header is present, RED if missing, WHITE if there is no data."""
    if not data:
        return WHITE
    headers = data.get("response_headers") or {}
    val = headers.get(key)
    # A report-only CSP still counts as having a CSP
    if key == "content_security_policy" and not val:
        val = headers.get("content_security_policy_report_only")
    return GREEN if val else RED


def grade_website_stats(data, url, has_security_txt):
    """Grade website stats. `data` may be None if the site-info lookup failed."""
    stats = []
    code = (data.get("response_headers") or {}).get("code") if data else None
    if code is None:
        stats.append((WHITE, "Status", "Unknown"))
    elif 200 <= code < 300:
        stats.append((GREEN, "Status", str(code)))
    elif 300 <= code < 400:
        stats.append((ORANGE, "Status", str(code)))
    else:
        stats.append((RED, "Status", str(code)))
    stats.append((GREEN, "HTTPS", "Yes") if url.startswith("https://") else (RED, "HTTPS", "No"))
    bl = data.get("domain_blacklist", {}) if data else {}
    detections = bl.get("detections") if isinstance(bl, dict) else None
    if detections is None:
        stats.append((WHITE, "Blacklist", "Unknown"))
    elif detections == 0:
        stats.append((GREEN, "Blacklist", "Not listed"))
    else:
        stats.append((RED, "Blacklist", f"{detections} detection(s)"))
    redir = data.get("redirection", {}) if data else {}
    if not isinstance(redir, dict):
        redir = {}
    found, external = redir.get("found"), redir.get("external")
    if found is None:
        stats.append((WHITE, "Redirect", "Unknown"))
    elif not found:
        stats.append((GREEN, "Redirect", "None"))
    elif external:
        stats.append((RED, "Redirect", "External redirect"))
    else:
        stats.append((ORANGE, "Redirect", "Internal redirect"))
    risk_data = data.get("risk_score", {}) if data else {}
    risk = risk_data.get("result") if isinstance(risk_data, dict) else None
    if risk is None:
        stats.append((WHITE, "Risk Score", "Unknown"))
    elif risk == 0:
        stats.append((GREEN, "Risk Score", "0"))
    elif risk <= 5:
        stats.append((ORANGE, "Risk Score", str(risk)))
    else:
        stats.append((RED, "Risk Score", str(risk)))
    # Security headers: report presence only, not the header values
    for key, label in [("strict_transport_security", "HSTS"),
                       ("content_security_policy", "CSP"),
                       ("x_frame_options", "X-Frame-Options")]:
        emoji = _header_present(data, key)
        stats.append((emoji, label, "Present" if emoji == GREEN else "Missing" if emoji == RED else "Unknown"))
    if has_security_txt is None:
        stats.append((WHITE, "Security.txt", "Unknown"))
    else:
        stats.append((GREEN if has_security_txt else RED, "Security.txt",
                      "Present" if has_security_txt else "Missing"))
    sd = data.get("server_details", {}) if data else {}
    if not isinstance(sd, dict):
        sd = {}
    # str() in case a field such as the ASN is returned as a number
    server_parts = [str(v) for k in ("ip", "country", "asn") if (v := sd.get(k))]
    stats.append(_info_or_unknown("Server", ", ".join(server_parts) if server_parts else None))
    loc_parts = [str(v) for k in ("city_name", "region_name", "country_name") if (v := sd.get(k))]
    stats.append(_info_or_unknown("Server Location", ", ".join(loc_parts) if loc_parts else None))
    title = None
    if data and isinstance(data.get("web_page"), dict):
        title = data["web_page"].get("title")
    stats.append(_info_or_unknown("Title", title))
    return stats


def fetch_android_data(package_id):
    """Fetch Android app privacy data for a package ID or Google Play URL."""
    # Accept a Play Store URL such as
    # https://play.google.com/store/apps/details?id=com.example.app&hl=en
    # as well as a bare package ID, dropping any query parameters after the id
    if "id=" in package_id:
        package_id = package_id.split("id=")[-1].split("&")[0]
    data = _api_get(f"{ANDROID_API_URL}/{package_id}")
    return data if data and not data.get("error") else None


def grade_android_stats(data):
    """Grade Android app stats."""
    stats = []
    # Trackers: none is green, up to 2 is orange
    trackers = data.get("trackers")
    if trackers is None:
        stats.append((WHITE, "Trackers", "Unknown"))
    else:
        n = len(trackers)
        stats.append((GREEN if n == 0 else ORANGE if n <= 2 else RED, "Trackers", str(n)))
    # Permissions: up to 2 is green, up to 10 is orange
    perms = data.get("permissions")
    if perms is None:
        stats.append((WHITE, "Permissions", "Unknown"))
    else:
        n = len(perms)
        stats.append((GREEN if n <= 2 else ORANGE if n <= 10 else RED, "Permissions", str(n)))
    stats.append(_info_or_unknown("Downloads", data.get("downloads")))
    stats.append(_info_or_unknown("Created", _friendly_date(data.get("created"))))
    stats.append(_info_or_unknown("Last Updated", _friendly_date(data.get("updated"))))
    return stats


def fetch_ios_data(app_url):
    """Fetch iOS app info."""
    return _api_get(IOS_API_URL, params={"appStoreUrl": app_url})


def grade_ios_stats(data):
    """Grade iOS app stats."""
    stats = []
    rating = data.get("averageUserRating")
    if rating is None:
        stats.append((WHITE, "Rating", "Unknown"))
    else:
        stats.append((GREEN if rating >= 4.5 else ORANGE if rating >= 3.5 else RED,
                      "Rating", f"{rating:.1f} / 5"))
    stats.append(_info_or_unknown("Created", _friendly_date(data.get("releaseDate"))))
    stats.append(_info_or_unknown("Last Updated", _friendly_date(data.get("currentVersionReleaseDate"))))
    stats.append(_info_or_unknown("Size", _format_bytes(data.get("fileSizeBytes"))))
    return stats


def fetch_tosdr_data(service_id):
    """Fetch ToS;DR privacy policy data."""
    return _api_get(f"{TOSDR_API_URL}/{service_id}")


def grade_tosdr_stats(data):
    """Grade ToS;DR privacy policy stats."""
    stats = []
    params = data.get("parameters") or {}
    rating = params.get("rating")
    if not rating:
        stats.append((WHITE, "Score", "Unknown"))
    else:
        # ToS;DR grades run from A (best) to E (worst)
        r = str(rating).upper()
        stats.append((GREEN if r == "A" else ORANGE if r in ("B", "C") else RED,
                      "Score", f"Grade {r}"))
    # URL of the first document ToS;DR lists for the service, if any
    doc_url = None
    docs = params.get("documents")
    if docs and isinstance(docs, list) and isinstance(docs[0], dict):
        doc_url = docs[0].get("url")
    stats.append(_info_or_unknown("Privacy Policy", doc_url))
    return stats


def _resolve_args(argv):
    """Return dict with keys: owner, repo, url, android, ios, tosdr. All optional.

    Values come from CLI flags if any are given; otherwise they are taken from
    the first service added in the CI diff file at DIFF_PATH.
    """
    parser = argparse.ArgumentParser(description="Generate submission info stats")
    parser.add_argument("--repo", default=None, help="owner/repo")
    parser.add_argument("--url", default=None, help="Website URL to check")
    parser.add_argument("--android", default=None, help="Android package ID")
    parser.add_argument("--ios", default=None, help="iOS App Store URL")
    parser.add_argument("--tosdr", default=None, help="ToS;DR service ID")
    args = parser.parse_args(argv[1:])
    result = {"owner": None, "repo": None, "url": args.url,
              "android": args.android, "ios": args.ios, "tosdr": args.tosdr}
    # CLI mode: at least one flag was passed
    if any(vars(args).values()):
        if args.repo:
            owner, repo = parse_github_field(args.repo)
            if not owner:
                print(f"Invalid repo format: {args.repo}", file=sys.stderr)
                sys.exit(1)
            result["owner"], result["repo"] = owner, repo
        return result
    # CI mode: extract from diff file
    try:
        with open(DIFF_PATH, encoding="utf-8") as f:
            diff = json.load(f)
    except Exception:
        print("No arguments and no diff file found", file=sys.stderr)
        sys.exit(0)
    # Maps YAML field names in the diff to keys in result
    field_map = {"github": "owner", "url": "url", "androidApp": "android",
                 "iosApp": "ios", "tosdrId": "tosdr"}
    for svc in diff.get("services", {}).get("added", []):
        fields = svc.get("fields", {})
        for yaml_key, result_key in field_map.items():
            if not result.get(result_key) and fields.get(yaml_key):
                if yaml_key == "github":
                    result["owner"], result["repo"] = parse_github_field(str(fields[yaml_key]))
                else:
                    result[result_key] = str(fields[yaml_key])
        break  # only first added service
    if not any(result.values()):
        print("No checkable fields found in diff", file=sys.stderr)
        sys.exit(0)
    return result


def main():
    try:
        args = _resolve_args(sys.argv)
        # With CLI arguments, print to stdout instead of writing OUTPUT_PATH
        cli_mode = len(sys.argv) > 1
        sections = []
        if args["owner"] and args["repo"]:
            token = os.environ.get("GITHUB_TOKEN", "")
            data = fetch_all_data(args["owner"], args["repo"], token)
            if data:
                sections.append(("Repo Stats", format_markdown(grade_stats(data))))
            else:
                print(f"Failed to fetch repo data for {args['owner']}/{args['repo']}", file=sys.stderr)
        if args["url"]:
            site_data = fetch_website_data(args["url"])
            has_sec_txt = check_security_txt(args["url"])
            if site_data or has_sec_txt is not None:
                sections.append(("Website Checks",
                                 format_markdown(grade_website_stats(site_data, args["url"], has_sec_txt))))
        if args["android"]:
            data = fetch_android_data(args["android"])
            if data:
                sections.append(("Android App", format_markdown(grade_android_stats(data))))
        if args["ios"]:
            data = fetch_ios_data(args["ios"])
            if data:
                sections.append(("iOS App", format_markdown(grade_ios_stats(data))))
        if args["tosdr"]:
            data = fetch_tosdr_data(args["tosdr"])
            if data:
                sections.append(("Privacy Policy", format_markdown(grade_tosdr_stats(data))))
        if not sections:
            sys.exit(0)
        md_parts = []
        for heading, body in sections:
            md_parts.append(f"#### {heading}\n{body}")
        md = "\n\n".join(md_parts)
        md += "\n\n<sup>The above data does not determine a submission's eligibility. Human review is still needed.</sup>\n"
        md += "<sup><b>Key:</b> 🟢 = good. 🟠 = warning. 🔴 = attention required. 🔵 = info. ⚪ = unknown.</sup>\n\n"
        if cli_mode:
            print(md)
        else:
            # Explicit UTF-8 so the emoji key can be written on any platform
            with open(OUTPUT_PATH, "w", encoding="utf-8") as f:
                f.write(md + "\n")
            print(f"Stats written to {OUTPUT_PATH}")
    except Exception as e:
        # The stats are advisory only, so never fail the CI job
        print(f"make-info-stats failed: {e}", file=sys.stderr)
        sys.exit(0)


if __name__ == "__main__":
    main()