Updates to PR review process

2026-03-11 08:55:33 +00:00 · 2026-03-08 00:01:29 +00:00 · 2026-03-08 00:01:29 +00:00 · d31dcfe05f
commit d31dcfe05f
parent 01be5b01f9
9 changed files with 908 additions and 5700 deletions
--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -31,6 +31,7 @@ Your request will be reviewed, then either merged, or have changes requested, or
 - To make layout or stylistic edits to the site ([awesome-privacy.xyz](https://awesome-privacy.xyz)), see the [Website docs](https://github.com/Lissy93/awesome-privacy#the-website) in the readme for build and running instructions.
 - To make edits to the API ([api.awesome-privacy.xyz](http://api.awesome-privacy.xyz/)), see the [API docs](https://github.com/Lissy93/awesome-privacy#the-api) in the readme for build and running instructions.
 - To make changes to the automations (which validate, process and insert the data), see the [lib](https://github.com/Lissy93/awesome-privacy/blob/main/lib) directory
 ---
@@ -41,7 +42,7 @@ For software to be included in this list, it must meet the following requirement
 - **Privacy Respecting**
 	- The project must respect users' privacy, not collect more data than necessary, and store info securely
 	- For hosted services, the project must have a clear privacy policy
-	- The user must remain in full control of their data, and be able to delete it at any time
+	- The user must remain in full control of their data, and be able to export and delete it at any time
 - **Secure**
  - The software must be secure by default, without requiring additional configuration
  - There should be no current, critical security issues
@@ -50,6 +51,7 @@ For software to be included in this list, it must meet the following requirement
 	- The full source code should be released under an open source license
 	- Ideally it should be possible for the user to build and run/deploy the software themselves from source
 - **Actively Maintained**
  - The project must not be abandoned or severely out-dated 
  - The developers should address dependency updates and security patches in a timely manner
 - **Transparent**
  - It should be clear who is behind the project, what their motives are, and what (if any) the funding model is
@@ -65,7 +67,7 @@ For software to be included in this list, it must meet the following requirement
 - **Mature**
  - Software needs to have a proven track record of commitment to maintenance
  - Repositories must not be newly created, and the first stable release older than 4 months
-  - Projects primarily written with AI or vibe coded are not suitable for listing here
+  - Projects that are largely AI/autogenerated without meaningful review or maintainership are not suitable for listing here
 _There may be some exceptions, but these would need to be fully justified, reviewed
 by the community, and the drawbacks / anti-features must be clearly listed along-side the software.
@@ -80,21 +82,16 @@ Your pull request must follow these requirements. Failure to do so, might result
 - Do not edit the README directly when adding / editing a listing (it's auto-generated!)
 - Ensure your PR is not a duplicate, search for existing / previous submissions first
- You must respond to any comments or requests for changes in a timely manner, 14 days maximum
+- Don't forget to give the PR a title. Use the format of `Adds [software-name] to [section-name]`
 - Write short but descriptive git commit messages, under 50 characters. This must be in the format of `Adds [software-name] to [section-name]`. Your PR will be rejected if you name it `Updates README.md`
 - Only include a single addition / amendment / removal, per pull request
 - You must complete each of the sections in the [pull request template](https://github.com/Lissy93/awesome-privacy/blob/main/.github/PULL_REQUEST_TEMPLATE.md). Do not delete it!
 - Where applicable, include links to supporting material for your addition: git repo, docs, recent security audits, etc. This will make researching it much easier for reviewers
- While adding new software to the list, don't make your entry read like an advert. Be objective, and include drawbacks as well as strengths
+- Your entry should be added at the bottom of the appropriate category
- Your entry should be added at the bottom of the appropriate category, unless otherwise requested
+- Your changes must be correctly formatted, in valid yaml and which conforms to the schema
- You must be transparent about your affiliation with a product or service that you are adding. It's totally okay to submit your own projects as additions (providing they meet the requirements), but if you don't declare your association with that project then there becomes a clear conflict of interest
+- Description needs to be 50-250 characters, and must not read like an advert. Be objective, and include drawbacks as well as strengths
 - You must be transparent about your affiliation with a product or service that you are adding. It's totally okay to submit your own projects, but if you don't declare your association with that project then there becomes a clear conflict of interest
 - You must respond to any comments or requests for changes in a timely manner, 14 days maximum
 - You must adhere to the [Contributor Covenant Code of Conduct](https://github.com/Lissy93/awesome-privacy?tab=coc-ov-file#contributor-covenant-code-of-conduct)
 - Don't open a Draft / WIP pull request while you work on the guidelines. A pull request should be 100% ready and should adhere to all the above guidelines when you open it
 - Your changes must be correctly spelled, and with good grammar
 - Your changes must be correctly formatted, in valid yaml and markdown
 - The addition description must be no less than 50, and no more than 250 characters, keep it clear and to the point
 - If there are other pull requests open, please help review them before submitting yours
 - A pull request must receive multiple approval reviews before it can be merged
 ---
@@ -242,6 +239,57 @@ Just look at some of the existing entries in the file for inspiration, and if yo
 ---
 ## About the Automated Pre-Review
 When you open a PR, we run a few automated checks. This was implemented so that you get helpful feedback immediately, if the submission contains a common mistake.
 Note that the pass/fail of these checks does not indicate whether a PR will or will not be merged. And if something does fail, my friendly bot will drop a comment explaining how you can fix it :)
 <details>
 	<summary>View all checks</summary>
 Below is the full list of checks - it's basically the same as what is listed in the [Contributing Guidelines](https://github.com/Lissy93/awesome-privacy/blob/main/.github/CONTRIBUTING.md#guidelines) above. Everything in red needs to pass to be merged, whereas yellow is just warnings/suggestions.
 - **PR Meta**
 	- 🔴 **Title format** - Must follow `[Add/Remove/Update] [name] in [section]`
 	- 🔴 **Template filled** - All required sections (Type, Changes, Checklist) must be present
 	- 🔴 **Checkboxes ticked** - All checklist boxes must be checked with `[x]`
 	- 🔴 **No README edits** - README is auto-generated, so direct edits are rejected
 	- 🟡 **Not a draft** - WIP/draft PRs are discouraged
 	- 🟡 **No bot authors** - Commits should not be solely authored by an AI bot
 - **Validating Addition**
 	- 🔴 **Schema valid** - YAML must pass schema validation
 	- 🔴 **Required fields** - Must include `name`, `description`, `url`, `icon`
 	- 🟡 **Single entry** - Only one service addition per PR
 	- 🟡 **Position** - New entries must go at the end of their section
 	- 🟡 **Open source** - Non-open-source submissions need justification
 	- 🟡 **Duplicate name** - Service name must not already exist
 	- 🟡 **Duplicate URL** - Service URL must not already exist
 	- 🟡 **Description length** - Should be 50–250 characters
 	- 🟡 **Open source + GitHub** - If marked open source, must include `github` field
 - **Project Health**
 	- 🟡 **Links reachable** - Service URL and icon must not return 404
 	- 🟡 **Author disclosure** - If PR author owns the repo, they should disclose it
 	- 🟡 **Not inactive** - Repo should have a push within the last 90 days
 	- 🟡 **Minimum age** - Repo should be ≥4 months old
 	- 🟡 **AI-generated code** - Flags if ≥20% of recent code came from an AI bot
 	- 🟡 **Not a fork** - Flags if the GitHub link is a fork instead of source
 	- 🟡 **Has license** - Repo should include a license
 	- 🟡 **Not archived** - Repo must not be archived
 	- 🟡 **No security alerts** - No open critical/high Dependabot alerts
 	- 🟡 **Minimum stars** - Repo should have ≥100 stars
 	- 🟡 **Spam detection** - Flags if user opened ≥5 PRs to other awesome-* repos in 24h
 - **Addition Info** (fyi only, no pass/fail requirements or warnings)
 	- 🔵 **Website check** (if has `website`) - Quickly checks for basic security requirements for website
 	- 🔵 **Source check**  (if has `github`) - Brief audit of core GitHub metrics from the submitted repo
 	- 🔵 **Android check**  (if has `android`) - Lists the trackers, permissions and stats for the Android app
 	- 🔵 **iOS check**  (if has `ios`) - Shows average rating, and app stats from the Apple App Store
 	- 🔵 **Privacy Policy check**  (if has `tosdr`) - Outputs the privacy score from ToS;DR and links to policy
 </details>
 ---
 ## Thank You
 Thank you for helping keep Awesome Privacy up-to-date! It's thanks to contributors like you that this project is possible.
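Two of the "Validating Addition" checks listed above (required fields and description length) are simple enough to sketch in a few lines of Python. This is only an illustration: the helper names `missing_fields` and `description_length_ok` are hypothetical and are not taken from `lib/checks`. Only the field names and the 50-250 character range come from the guidelines.

```python
# Hypothetical helpers; the real checks in lib/checks are not shown in this diff.
REQUIRED_FIELDS = ("name", "description", "url", "icon")
MIN_DESC, MAX_DESC = 50, 250


def missing_fields(entry):
    """Return the required fields that are absent or empty in a service entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


def description_length_ok(entry):
    """True if the description is within the 50-250 character range."""
    length = len((entry.get("description") or "").strip())
    return MIN_DESC <= length <= MAX_DESC


# Example entry that is missing an icon and has a too-short description
entry = {
    "name": "Example App",
    "url": "https://example.com",
    "description": "Too short",
}
print(missing_fields(entry))         # ['icon']
print(description_length_ok(entry))  # False
```

Per the list above, a missing required field is a red (blocking) check, whereas description length is only a yellow warning.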
--- a/.github/README.md
+++ b/.github/README.md
--- a/.github/workflows/pr-check.yml
+++ b/.github/workflows/pr-check.yml
@@ -11,6 +11,7 @@ on:
 permissions:
  contents: read
  pull-requests: read
  security-events: read
 jobs:
  pr-compliance:
@@ -75,6 +76,8 @@ jobs:
        run: python lib/checks/check-yaml-diff.py --base-ref ${{ github.event.pull_request.base.sha }}
      - name: Check additions
        if: steps.changes.outputs.yaml_changed == 'true'
        id: additions
        continue-on-error: true
        env:
          SCHEMA_OUTCOME: ${{ steps.schema.outcome }}
        run: python lib/checks/check-additions.py
@@ -95,7 +98,7 @@ jobs:
          path: /tmp/findings-data.json
          if-no-files-found: ignore
      - name: Fail if critical
-        if: steps.changes.outputs.yaml_changed == 'true' && (steps.schema.outcome == 'failure' || steps.diff.outcome == 'failure')
+        if: steps.changes.outputs.yaml_changed == 'true' && (steps.schema.outcome == 'failure' || steps.diff.outcome == 'failure' || steps.additions.outcome == 'failure')
        run: exit 1
  submission-eligibility:
@@ -121,6 +124,18 @@ jobs:
          PR_BODY: ${{ github.event.pull_request.body }}
          GITHUB_TOKEN: ${{ github.token }}
        run: python lib/checks/check-project.py
      - name: Generate repo stats
        continue-on-error: true
        env:
          GITHUB_TOKEN: ${{ github.token }}
        run: python lib/checks/make-info-stats.py
      - name: Upload repo stats
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: repo-stats
          path: /tmp/repo-stats.md
          if-no-files-found: ignore
      - name: Upload findings
        if: always()
        uses: actions/upload-artifact@v4
@@ -152,6 +167,12 @@ jobs:
          name: pr-diff
          path: /tmp/artifacts
        continue-on-error: true
      - name: Download repo stats
        uses: actions/download-artifact@v4
        with:
          name: repo-stats
          path: /tmp/artifacts
        continue-on-error: true
      - name: Format comment
        env:
          PR_USER: ${{ github.event.pull_request.user.login }}
--- a/lib/awesome-privacy-readme-gen.py
+++ b/lib/awesome-privacy-readme-gen.py
@@ -169,15 +169,16 @@ def makeHref(text):
    return re.sub(r'[^\w\s-]', '', text.lower()).replace(" ", "-")
 def makeContents():
-    contents = "<blockquote><details>\n"
+    contents = "<blockquote><details open>\n"
    contents += "<summary>📋 <b>Contents</b></summary>\n"
    for category in data.get('categories'):
        contents += f"\n- **{category.get('name')}**"
        for section in category.get('sections'):
-            contents += (
-                f"\n\t- [{section.get('name')}](#{makeHref(section.get('name'))}) "
-                f"({len(section.get('services') or [])})"
+            if (len(section.get('services') or []) > 0):
+                contents += (
+                    f"\n\t- [{section.get('name')}](#{makeHref(section.get('name'))}) "
+                    f"({len(section.get('services') or [])})"
            )
    contents += "\n</details></blockquote>\n\n"
    return contents
@@ -201,7 +202,7 @@ def makeAwesomePrivacy():
            )
          # For each service, list its name, icon, url, and description
          for app in section.get('services') or []:
-              description, was_truncated = truncateMarkdown(app.get('description', ''))
+              description, was_truncated = truncateMarkdown(' '.join(app.get('description', '').split()))
              ap_link = (
                  f"https://awesome-privacy.xyz/"
                  f"{slugify(category.get('name'))}/{slugify(section.get('name'))}/{slugify(app.get('name'))}"
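The description change in this file collapses whitespace with `' '.join(text.split())` before truncating. A small sketch of what that idiom does follows; `collapse_whitespace` is a hypothetical name used only for illustration.

```python
def collapse_whitespace(text):
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    # str.split() with no arguments splits on any run of whitespace and
    # discards leading/trailing whitespace, so re-joining with a single
    # space normalizes the whole string.
    return " ".join(text.split())


multiline = "A private   notes app.\n  Stores data\tlocally. "
print(collapse_whitespace(multiline))
# A private notes app. Stores data locally.
```

This presumably guards against multi-line YAML descriptions leaking newlines into the generated README, although the diff itself does not state the motivation.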
--- a/lib/checks/check-additions.py
+++ b/lib/checks/check-additions.py
@@ -237,9 +237,10 @@ def check_opensource_github(diff):
 def main():
    findings = []
    critical = False
    try:
        if os.environ.get("SCHEMA_OUTCOME") == "failure":
-            findings.append(SCHEMA_MSG)
+            findings.append({"msg": SCHEMA_MSG, "level": "error"})
        diff = load_json(DIFF_PATH)
        head = load_yaml_data(DATA_PATH)
@@ -251,7 +252,8 @@ def main():
            finding = check_required_fields(diff, head)
            if finding:
-                findings.append(finding)
+                findings.append({"msg": finding, "level": "error"})
                critical = True
            finding = check_position(diff, head)
            if finding:
@@ -284,7 +286,7 @@ def main():
    with open(FINDINGS_PATH, "w") as f:
        json.dump(findings, f)
-    sys.exit(0)
+    sys.exit(1 if critical else 0)
 if __name__ == "__main__":
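The change to `check-additions.py` above tags blocking findings with `"level": "error"` and exits non-zero when any of them fired. A minimal sketch of that pattern, under stated assumptions: `run_checks`, the `(check, is_critical)` pairs, and the example messages are hypothetical, and the function returns the exit code instead of calling `sys.exit` so it can be exercised directly.

```python
import json
import os
import tempfile


def run_checks(checks, findings_path):
    """Run (check, is_critical) pairs, write findings as JSON, return an exit code."""
    findings = []
    critical = False
    for check, is_critical in checks:
        msg = check()
        if not msg:
            continue  # check passed, nothing to report
        if is_critical:
            findings.append({"msg": msg, "level": "error"})
            critical = True
        else:
            # Plain strings are treated as warnings by the comment formatter
            findings.append(msg)
    with open(findings_path, "w") as f:
        json.dump(findings, f)
    return 1 if critical else 0


# Illustrative checks only; the messages are made up for this example
checks = [
    (lambda: "Missing required field: icon", True),             # critical, fails
    (lambda: "Entry is not at the end of its section", False),  # warning only
    (lambda: None, True),                                       # critical, passes
]
path = os.path.join(tempfile.gettempdir(), "findings-example.json")
exit_code = run_checks(checks, path)
print(exit_code)  # 1
```

In the workflow the step runs with `continue-on-error: true`, so a non-zero exit does not fail the job by itself. The step's `outcome` is still recorded as `failure`, which is what the updated "Fail if critical" condition reads.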
--- a/lib/checks/check-pr-meta.py
+++ b/lib/checks/check-pr-meta.py
@@ -140,27 +140,29 @@ def main():
        finding = check_title(title)
        if finding:
-            findings.append(finding)
+            findings.append({"msg": finding, "level": "error"})
            critical = True
        finding = check_draft(draft)
        if finding:
            findings.append(finding)
        if not body or not body.strip():
-            findings.append(TEMPLATE_MSG)
+            findings.append({"msg": TEMPLATE_MSG, "level": "error"})
            critical = True
        else:
            finding = check_template(body)
            if finding:
-                findings.append(finding)
+                findings.append({"msg": finding, "level": "error"})
                critical = True
            finding = check_checkboxes(body)
            if finding:
-                findings.append(finding)
+                findings.append({"msg": finding, "level": "error"})
                critical = True
        finding = check_readme(readme_failed)
        if finding:
-            findings.append(finding)
+            findings.append({"msg": finding, "level": "error"})
    except Exception:
        pass
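The body of `check_checkboxes` is not shown in this diff. As a rough idea of how the "Checkboxes ticked" rule could be checked, here is a sketch that scans a PR body for unticked markdown task-list items. The name `unticked_boxes` and the regex are assumptions, as is the choice to accept an uppercase `[X]` as ticked.

```python
import re

# Matches markdown task-list items, capturing the mark inside the brackets
# and the label text that follows. Accepting "X" as ticked is an assumption.
CHECKBOX_RE = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*)$", re.MULTILINE)


def unticked_boxes(body):
    """Return the label text of every unchecked task-list item in a PR body."""
    return [label for mark, label in CHECKBOX_RE.findall(body) if mark == " "]


body = """## Checklist
- [x] I have read the contributing guidelines
- [ ] I have disclosed any affiliation
- [X] The entry is at the end of its section
"""
print(unticked_boxes(body))
# ['I have disclosed any affiliation']
```

A caller following the pattern in `main()` above would append an error-level finding and set `critical = True` whenever the returned list is non-empty.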
--- a/lib/checks/check-project.py
+++ b/lib/checks/check-project.py
@@ -29,8 +29,8 @@ AI_BOT_AUTHORS = [
 SPAM_PR_THRESHOLD = 5
 LINK_MSG = (
-    "Our automated checks were unable to verify the link(s) you included"
-    " were reachable, so please double check this yourself"
+    "The link(s) you included seem to be returning a 404."
+    " Please double check all URLs listed are valid and publicly accessible"
 )
 AUTHOR_MSG = (
    "Looks like you are the author of this package. Please ensure that you"
--- a/lib/checks/format-comment.py
+++ b/lib/checks/format-comment.py
@@ -3,6 +3,7 @@
 import json
 import os
 import sys
 from datetime import datetime, timezone
 ARTIFACTS_DIR = "/tmp/artifacts"
 OUTPUT_DIR = "/tmp/pr-meta"
@@ -10,6 +11,17 @@ OUTPUT_DIR = "/tmp/pr-meta"
 REPO_URL = "https://github.com/Lissy93/awesome-privacy"
 CONTRIBUTING = f"{REPO_URL}/blob/main/.github/CONTRIBUTING.md"
 DIFF_SUMMARY_PATH = os.path.join(ARTIFACTS_DIR, "pr-diff-summary.md")
 REPO_STATS_PATH = os.path.join(ARTIFACTS_DIR, "repo-stats.md")
 def load_repo_stats():
    """Load the repo stats markdown, or None if unavailable."""
    try:
        with open(REPO_STATS_PATH) as f:
            content = f.read().strip()
            return content if content else None
    except Exception:
        return None
 def load_findings(filename):
@@ -22,13 +34,28 @@ def load_findings(filename):
        return []
 def normalize_finding(f):
    """Return {"msg": str, "level": str} from a dict or plain string."""
    if isinstance(f, dict):
        return {"msg": str(f.get("msg", "")), "level": f.get("level", "warning")}
    return {"msg": str(f), "level": "warning"}
 def collect_findings():
-    """Gather all findings in display order: compliance, data, project."""
+    """Gather all findings, split into (errors, warnings) lists of message strings."""
-    all_findings = []
+    raw = []
-    all_findings.extend(load_findings("findings-compliance.json"))
+    raw.extend(load_findings("findings-compliance.json"))
-    all_findings.extend(load_findings("findings-data.json"))
+    raw.extend(load_findings("findings-data.json"))
-    all_findings.extend(load_findings("findings-project.json"))
+    raw.extend(load_findings("findings-project.json"))
-    return all_findings
+    errors = []
    warnings = []
    for f in raw:
        normalized = normalize_finding(f)
        if normalized["level"] == "error":
            errors.append(normalized["msg"])
        else:
            warnings.append(normalized["msg"])
    return errors, warnings
 def load_diff_summary():
@@ -41,7 +68,18 @@ def load_diff_summary():
        return None
-def format_comment(findings, user, changes_summary, run_id):
+def _extract_changes_bullets(diff_summary):
    """Re-format bullet lines from the diff summary with a blue circle prefix."""
    if not diff_summary:
        return None
    bullets = []
    for line in diff_summary.splitlines():
        if line.startswith("- "):
            bullets.append(f"- \U0001f535 {line[2:]}")
    return "\n".join(bullets) if bullets else None
 def format_comment(findings, user, changes_summary, run_id, repo_stats=None):
    """Build the markdown comment."""
    parts = [
        f"<!-- pr-check-bot -->\nHello @{user}\n",
@@ -59,7 +97,7 @@ def format_comment(findings, user, changes_summary, run_id):
            f"But a human will review your submission shortly!"
        )
    else:
-        parts.append("> ✅ All our automated checks have passed.")
+        parts.append("> \u2705 All our automated checks have passed.")
    if changes_summary:
        parts.append(
@@ -67,6 +105,12 @@ def format_comment(findings, user, changes_summary, run_id):
            f"{changes_summary}\n</details>"
        )
    if repo_stats:
        parts.append(
            f"<details><summary>Submission Info</summary>\n\n"
            f"{repo_stats}\n</details>"
        )
    if run_id:
        parts.append(
            f'<sup>For full details, please see workflow run '
@@ -76,18 +120,80 @@ def format_comment(findings, user, changes_summary, run_id):
    return "\n\n".join(parts) + "\n"
-def write_step_summary(findings):
+def write_step_summary(errors, warnings, user, pr_number, run_id, changes_summary,
-    """Write a summary to GITHUB_STEP_SUMMARY."""
+                       repo_stats=None):
    """Write a structured summary to GITHUB_STEP_SUMMARY."""
    summary_file = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_file:
        return
-    lines = ["## PR Check Summary\n"]
+
-    if findings:
+    lines = ["## Status Check Results\n"]
-        lines.append(f"⚠️ Found {len(findings)} issue(s):\n")
+
-        for f in findings:
+    # Summary sentence
-            lines.append(f"- {f}")
+    ne, nw = len(errors), len(warnings)
    lines.append("### Summary\n")
    if ne and nw:
        lines.append(
            f"There are {ne} error(s) which must be resolved before this PR can be"
            f" reviewed, as well as {nw} warning(s) which need to be addressed or"
            f" justified.\n"
        )
    elif ne:
        lines.append(
            f"There are {ne} error(s) which must be resolved before this PR can be"
            f" reviewed.\n"
        )
    elif nw:
        lines.append(
            f"There were no errors but {nw} warning(s) which need to be addressed"
            f" or justified before the PR can be merged.\n"
        )
    else:
-        lines.append("✅ All checks passed.\n")
+        lines.append(
            "All checks are passing, with no errors and no warnings \U0001f389\n"
            "A maintainer has been notified, and will review the submission shortly.\n"
        )
    # Errors
    lines.append("### Errors\n")
    if errors:
        for e in errors:
            lines.append(f"- \U0001f534 {e}")
    else:
        lines.append("\u2705 None")
    lines.append("")
    # Warnings
    lines.append("### Warnings\n")
    if warnings:
        for w in warnings:
            lines.append(f"- \U0001f7e1 {w}")
    else:
        lines.append("\u2705 None")
    lines.append("")
    # Meta Info
    lines.append("### Meta Info\n")
    now = datetime.now(timezone.utc)
    timestamp = now.strftime("%H:%M UTC on %d %b %Y")
    if pr_number:
        lines.append(
            f"This workflow run was triggered at {timestamp}"
            f" for PR #{pr_number} which was opened by @{user}\n"
        )
    else:
        lines.append(
            f"This workflow run was triggered at {timestamp} by @{user}\n"
        )
    if changes_summary:
        lines.append("The PR introduces the following changes:\n")
        lines.append(f"{changes_summary}\n")
    if repo_stats:
        lines.append("#### Submission Info\n")
        lines.append(f"{repo_stats}\n")
    with open(summary_file, "a") as f:
        f.write("\n".join(lines) + "\n")
@@ -107,13 +213,17 @@ def main():
            with open(os.path.join(OUTPUT_DIR, "run-id.txt"), "w") as f:
                f.write(run_id)
-        findings = collect_findings()
+        errors, warnings = collect_findings()
        all_findings = errors + warnings
        with open(os.path.join(OUTPUT_DIR, "findings-count.txt"), "w") as f:
-            f.write(str(len(findings)))
+            f.write(str(len(all_findings)))
        changes_summary = load_diff_summary()
-        write_step_summary(findings)
+        changes_bullets = _extract_changes_bullets(changes_summary)
        repo_stats = load_repo_stats()
        write_step_summary(errors, warnings, user, pr_number, run_id, changes_bullets,
                           repo_stats)
-        comment = format_comment(findings, user, changes_summary, run_id)
+        comment = format_comment(all_findings, user, changes_summary, run_id, repo_stats)
        with open(os.path.join(OUTPUT_DIR, "comment.md"), "w") as f:
            f.write(comment)
    except Exception:
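Because the check scripts now emit a mix of `{"msg", "level"}` dicts and legacy plain strings, `collect_findings` has to normalize before splitting. The sketch below reproduces `normalize_finding` from the diff above and pairs it with `split_findings`, a hypothetical stand-in for the loop in `collect_findings` that skips the JSON file loading. The sample messages are made up.

```python
def normalize_finding(f):
    """Return {"msg": str, "level": str} from a dict or plain string."""
    if isinstance(f, dict):
        return {"msg": str(f.get("msg", "")), "level": f.get("level", "warning")}
    return {"msg": str(f), "level": "warning"}


def split_findings(raw):
    """Split mixed findings into (errors, warnings) lists of message strings."""
    errors, warnings = [], []
    for f in raw:
        normalized = normalize_finding(f)
        # Anything that is not explicitly an error is treated as a warning
        target = errors if normalized["level"] == "error" else warnings
        target.append(normalized["msg"])
    return errors, warnings


raw = [
    {"msg": "Title format is invalid", "level": "error"},
    "Repo has fewer than 100 stars",          # legacy plain-string finding
    {"msg": "Description is 30 characters"},  # dict without a level
]
errors, warnings = split_findings(raw)
print(errors)    # ['Title format is invalid']
print(warnings)  # ['Repo has fewer than 100 stars', 'Description is 30 characters']
```

Defaulting to `"warning"` means a check script that has not been migrated to the dict format can never block a PR through the formatter by accident.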
--- a/lib/checks/make-info-stats.py
+++ b/lib/checks/make-info-stats.py
@@ -0,0 +1,639 @@
 """
 This fetches info about a project/service which is being submitted.
 It's used when a PR is open, to show some additional context.
 Everything fetched here, is basically just a sneak peek of
 what will be fetched by the main awesome-privacy.xyz website
 once this submission is deployed. And it uses all the same endpoints.
 It covers (where applicable) the following look ups:
    - Repo - basic community checks
    - Website - security sanity checks
    - Android app - permissions, trackers, meta
    - iOS app - reviews, and meta info
    - Privacy policy - overall grade, link (if tosdr)
 The output is in markdown, and has some color grading with circle emojis.
 This is not a pass/fail check, and is not required for a PR to get merged.
 It just adds a bit of context, to make reviewing it a tiny bit quicker!
 Excuse the code, it's a bit scrappy! But it's never used in the prod app.
 """
 import argparse
 import json
 import os
 import sys
 from datetime import datetime, timezone
 from urllib.parse import urlparse
 import requests
 DIFF_PATH = "/tmp/pr-diff.json"
 OUTPUT_PATH = "/tmp/repo-stats.md"
 TIMEOUT = 10
 USER_AGENT = "awesome-privacy-ci/1.0"
 AI_BOT_AUTHORS = [
    "noreply@anthropic.com",
    "devin-ai-integration[bot]",
    "copilot-swe-agent.github.com",
    "noreply@cursor.com",
 ]
 RESTRICTIVE_LICENSES = {
    "AGPL-3.0-only", "AGPL-3.0-or-later", "SSPL-1.0", "BSL-1.0", "BUSL-1.1",
 }
 SITE_INFO_URL = "https://site-info-fetch.as93.workers.dev"
 ANDROID_API_URL = "https://android-app-privacy.as93.net"
 IOS_API_URL = "https://ios-app-info.as93.net"
 TOSDR_API_URL = "https://privacy-policies.as93.workers.dev"
 GREEN, ORANGE, RED, BLUE, WHITE = "\U0001f7e2", "\U0001f7e0", "\U0001f534", "\U0001f535", "\u26aa"
 def _api_get(url, params=None, timeout=TIMEOUT, headers=None):
    """GET a URL, return parsed JSON on 200, else None."""
    hdrs = {"User-Agent": USER_AGENT}
    if headers:
        hdrs.update(headers)
    try:
        resp = requests.get(url, headers=hdrs, timeout=timeout, params=params)
        if resp.status_code == 200:
            return resp.json()
    except Exception as e:
        print(f"Fetch failed for {url}: {e}", file=sys.stderr)
    return None
 def relative_time(iso_str):
    """Convert ISO timestamp to human-readable relative time, or None."""
    if not iso_str:
        return None
    try:
        dt = datetime.fromisoformat(str(iso_str).replace("Z", "+00:00"))
        days = (datetime.now(timezone.utc) - dt).days
        if days < 1:
            return "today"
        if days < 7:
            return f"{days} day{'s' if days != 1 else ''}"
        if days < 30:
            w = days // 7
            return f"{w} week{'s' if w != 1 else ''}"
        if days < 365:
            m = days // 30
            return f"{m} month{'s' if m != 1 else ''}"
        y, rm = days // 365, (days % 365) // 30
        s = f"{y} year{'s' if y != 1 else ''}"
        return f"{s}, {rm} month{'s' if rm != 1 else ''}" if rm else s
    except Exception:
        return None
 def _days_since(iso_str):
    """Return number of days since an ISO timestamp, or None."""
    if not iso_str:
        return None
    try:
        dt = datetime.fromisoformat(iso_str.replace("Z", "+00:00"))
        return (datetime.now(timezone.utc) - dt).days
    except Exception:
        return None
 def _friendly_date(iso_str):
    """Return relative time string with 'ago' suffix, falling back to raw string."""
    if not iso_str:
        return None
    rt = relative_time(iso_str)
    if rt is None:
        return str(iso_str)
    return rt if rt == "today" else f"{rt} ago"
 def _format_bytes(n):
    """Format bytes to human-readable size."""
    try:
        n = int(n)
    except (TypeError, ValueError):
        return None
    for unit, threshold in [("GB", 1e9), ("MB", 1e6), ("KB", 1e3)]:
        if n >= threshold:
            return f"{n / threshold:.1f} {unit}"
    return f"{n} B"
 def _info_or_unknown(label, value):
    """Return a blue info stat, or white Unknown if value is falsy."""
    return (BLUE, label, value) if value else (WHITE, label, "Unknown")
 def format_markdown(stats):
    """Format graded stats as markdown bullet list."""
    return "\n".join(f"- {emoji} **{label}:** {value}" for emoji, label, value in stats)
 def parse_github_field(value):
    """Parse 'owner/repo' or full URL into (owner, repo) or (None, None)."""
    if not value:
        return None, None
    if value.startswith("https://github.com/"):
        parts = value.removeprefix("https://github.com/").strip("/").split("/")
        if len(parts) >= 2:
            return parts[0], parts[1]
        return None, None
    if "/" in value:
        parts = value.split("/")
        if len(parts) == 2:
            return parts[0], parts[1]
    return None, None
 def gh_get(path, token, params=None):
    """GET a GitHub API endpoint. Returns JSON on 200, else None."""
    headers = {"Accept": "application/vnd.github.v3+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    return _api_get(f"https://api.github.com{path}", params=params, headers=headers)
 def fetch_all_data(owner, repo, token):
    """Fetch all repo data. Returns dict or None if main repo call fails."""
    base = gh_get(f"/repos/{owner}/{repo}", token)
    if not base:
        return None
    data = {
        "license": base.get("license"),
        "created_at": base.get("created_at"),
        "pushed_at": base.get("pushed_at"),
        "stars": base.get("stargazers_count", 0),
        "fork": base.get("fork", False),
        "archived": base.get("archived", False),
        "homepage": base.get("homepage"),
        "owner": base.get("owner", {}).get("login"),
        "open_issues_count": base.get("open_issues_count", 0),
    }
    releases = gh_get(f"/repos/{owner}/{repo}/releases", token, {"per_page": 11})
    data["release_count"] = len(releases) if releases is not None else None
    contributors = gh_get(
        f"/repos/{owner}/{repo}/contributors", token, {"per_page": 11, "anon": "true"},
    )
    data["contributor_count"] = len(contributors) if contributors is not None else None
    commits = gh_get(f"/repos/{owner}/{repo}/commits", token, {"per_page": 100})
    if commits is not None:
        bot_set = {a.lower() for a in AI_BOT_AUTHORS}
        ai_count = 0
        for c in commits:
            author = c.get("commit", {}).get("author", {})
            email = (author.get("email") or "").lower()
            name = (author.get("name") or "").lower()
            if email in bot_set or name in bot_set:
                ai_count += 1
                continue
            message = (c.get("commit", {}).get("message") or "").lower()
            for line in message.splitlines():
                if line.strip().startswith("co-authored-by:"):
                    if any(bot in line for bot in bot_set):
                        ai_count += 1
                        break
        data["commit_count"] = len(commits)
        data["ai_commit_count"] = ai_count
    else:
        data["commit_count"] = None
        data["ai_commit_count"] = None
    alerts = gh_get(
        f"/repos/{owner}/{repo}/dependabot/alerts", token,
        {"state": "open", "severity": "critical,high", "per_page": 1},
    )
    data["has_security_alerts"] = bool(alerts) if alerts is not None else None
    languages = gh_get(f"/repos/{owner}/{repo}/languages", token)
    data["languages"] = list(languages.keys()) if languages is not None else None
    return data
 def grade_stats(data):
    """Grade repo stats, returning list of (emoji, label, value_str) tuples."""
    stats = []
    lic = data.get("license")
    if not lic:
        stats.append((RED, "License", "Missing"))
    else:
        spdx = lic.get("spdx_id", "")
        if spdx == "NOASSERTION":
            stats.append((WHITE, "License", "Unknown"))
        elif spdx in RESTRICTIVE_LICENSES:
            stats.append((ORANGE, "License", spdx))
        else:
            stats.append((GREEN, "License", lic.get("name") or spdx or "Present"))
    age_days = _days_since(data.get("created_at"))
    age_str = relative_time(data.get("created_at"))
    if age_days is None:
        stats.append((WHITE, "Repo Age", "Unknown"))
    elif age_days >= 730:
        stats.append((GREEN, "Repo Age", age_str))
    elif age_days >= 180:
        stats.append((ORANGE, "Repo Age", age_str))
    else:
        stats.append((RED, "Repo Age", age_str))
    updated_days = _days_since(data.get("pushed_at"))
    updated_str = _friendly_date(data.get("pushed_at"))
    if updated_days is None:
        stats.append((WHITE, "Last Updated", "Unknown"))
    elif updated_days <= 7:
        stats.append((GREEN, "Last Updated", updated_str))
    elif updated_days <= 56:
        stats.append((ORANGE, "Last Updated", updated_str))
    else:
        stats.append((RED, "Last Updated", updated_str))
    rc = data.get("release_count")
    if rc is None:
        stats.append((WHITE, "Releases", "Unknown"))
    elif rc >= 10:
        stats.append((GREEN, "Releases", f"{rc}+" if rc >= 11 else str(rc)))
    elif rc >= 1:
        stats.append((ORANGE, "Releases", str(rc)))
    else:
        stats.append((RED, "Releases", "0"))
    stars = data.get("stars")
    if stars is None:
        stats.append((WHITE, "Stars", "Unknown"))
    elif stars >= 1000:
        stats.append((GREEN, "Stars", f"{stars:,}"))
    elif stars >= 100:
        stats.append((ORANGE, "Stars", f"{stars:,}"))
    else:
        stats.append((RED, "Stars", f"{stars:,}"))
    cc = data.get("contributor_count")
    if cc is None:
        stats.append((WHITE, "Contributors", "Unknown"))
    elif cc >= 10:
        stats.append((GREEN, "Contributors", f"{cc}+" if cc >= 11 else str(cc)))
    elif cc >= 3:
        stats.append((ORANGE, "Contributors", str(cc)))
    else:
        stats.append((RED, "Contributors", str(cc)))
    fork = data.get("fork")
    if fork is None:
        stats.append((WHITE, "Is Fork", "Unknown"))
    else:
        stats.append((ORANGE if fork else GREEN, "Is Fork", "Yes" if fork else "No"))
    archived = data.get("archived")
    if archived is None:
        stats.append((WHITE, "Is Archived", "Unknown"))
    else:
        stats.append((RED if archived else GREEN, "Is Archived", "Yes" if archived else "No"))
    alerts = data.get("has_security_alerts")
    if alerts is None:
        stats.append((WHITE, "Security Alerts", "Unknown"))
    elif alerts:
        stats.append((RED, "Security Alerts", "Open critical/high alerts"))
    else:
        stats.append((GREEN, "Security Alerts", "None"))
    ai = data.get("ai_commit_count")
    if ai is None:
        stats.append((WHITE, "Vibe Coded", "Unknown"))
    elif ai >= 20:
        stats.append((RED, "Vibe Coded", f"{ai} AI commits"))
    elif ai >= 1:
        stats.append((ORANGE, "Vibe Coded", f"{ai} AI commit{'s' if ai != 1 else ''}"))
    else:
        stats.append((GREEN, "Vibe Coded", "0 AI commits"))
    commit_count = data.get("commit_count")
    if commit_count is None:
        stats.append((WHITE, "Commits", "Unknown"))
    else:
        stats.append((BLUE, "Commits", f"{commit_count:,}+" if commit_count >= 100 else f"{commit_count:,}"))
    issues = data.get("open_issues_count")
    stats.append((BLUE, "Open Issues", f"{issues:,}") if issues is not None else (WHITE, "Open Issues", "Unknown"))
    stats.append(_info_or_unknown("Website", data.get("homepage")))
    stats.append(_info_or_unknown("Author", data.get("owner")))
    langs = data.get("languages")
    stats.append(_info_or_unknown("Languages", ", ".join(langs) if langs else None))
    return stats


def fetch_website_data(url):
    """Fetch site info from the worker API."""
    return _api_get(SITE_INFO_URL, params={"url": url}, timeout=15)


def check_security_txt(url):
    """Check for a security.txt with a Contact field.

    Returns True if found, False if the site is reachable but has none,
    and None if the site could not be reached at all.
    """
    parsed = urlparse(url)
    base = f"{parsed.scheme}://{parsed.netloc}"
    for path in ("/.well-known/security.txt", "/security.txt"):
        try:
            resp = requests.get(
                base + path, headers={"User-Agent": USER_AGENT},
                timeout=TIMEOUT, allow_redirects=True,
            )
            if resp.status_code == 200 and "contact:" in resp.text.lower():
                return True
        except Exception:
            continue
    try:
        requests.head(base, headers={"User-Agent": USER_AGENT}, timeout=TIMEOUT)
        return False
    except Exception:
        return None


def _header_present(data, key):
    """Return GREEN if the response header is present, RED if missing, WHITE if there is no data."""
    if not data:
        return WHITE
    headers = data.get("response_headers") or {}
    val = headers.get(key)
    # A report-only policy still counts as having a CSP configured
    if key == "content_security_policy" and not val:
        val = headers.get("content_security_policy_report_only")
    return GREEN if val else RED


def grade_website_stats(data, url, has_security_txt):
    """Grade website stats."""
    stats = []
    code = (data.get("response_headers") or {}).get("code") if data else None
    if code is None:
        stats.append((WHITE, "Status", "Unknown"))
    elif 200 <= code < 300:
        stats.append((GREEN, "Status", str(code)))
    elif 300 <= code < 400:
        stats.append((ORANGE, "Status", str(code)))
    else:
        stats.append((RED, "Status", str(code)))
    stats.append((GREEN, "HTTPS", "Yes") if url.startswith("https://") else (RED, "HTTPS", "No"))
    bl = data.get("domain_blacklist", {}) if data else {}
    detections = bl.get("detections") if isinstance(bl, dict) else None
    if detections is None:
        stats.append((WHITE, "Blacklist", "Unknown"))
    elif detections == 0:
        stats.append((GREEN, "Blacklist", "Not listed"))
    else:
        stats.append((RED, "Blacklist", f"{detections} detection(s)"))
    redir = data.get("redirection", {}) if data else {}
    if not isinstance(redir, dict):
        redir = {}
    found, external = redir.get("found"), redir.get("external")
    if found is None:
        stats.append((WHITE, "Redirect", "Unknown"))
    elif not found:
        stats.append((GREEN, "Redirect", "None"))
    elif external:
        stats.append((RED, "Redirect", "External redirect"))
    else:
        stats.append((ORANGE, "Redirect", "Internal redirect"))
    risk_data = data.get("risk_score", {}) if data else {}
    risk = risk_data.get("result") if isinstance(risk_data, dict) else None
    if risk is None:
        stats.append((WHITE, "Risk Score", "Unknown"))
    elif risk == 0:
        stats.append((GREEN, "Risk Score", "0"))
    elif risk <= 5:
        stats.append((ORANGE, "Risk Score", str(risk)))
    else:
        stats.append((RED, "Risk Score", str(risk)))
    for key, label in [("strict_transport_security", "HSTS"),
                        ("content_security_policy", "CSP"),
                        ("x_frame_options", "X-Frame-Options")]:
        emoji = _header_present(data, key)
        stats.append((emoji, label, "Present" if emoji == GREEN else "Missing" if emoji == RED else "Unknown"))
    if has_security_txt is None:
        stats.append((WHITE, "Security.txt", "Unknown"))
    else:
        stats.append((GREEN if has_security_txt else RED, "Security.txt",
                       "Present" if has_security_txt else "Missing"))
    sd = data.get("server_details", {}) if data else {}
    if not isinstance(sd, dict):
        sd = {}
    server_parts = [str(v) for k in ("ip", "country", "asn") if (v := sd.get(k))]
    stats.append(_info_or_unknown("Server", ", ".join(server_parts) if server_parts else None))
    loc_parts = [str(v) for k in ("city_name", "region_name", "country_name") if (v := sd.get(k))]
    stats.append(_info_or_unknown("Server Location", ", ".join(loc_parts) if loc_parts else None))
    title = None
    if data and isinstance(data.get("web_page"), dict):
        title = data["web_page"].get("title")
    stats.append(_info_or_unknown("Title", title))
    return stats


def fetch_android_data(package_id):
    """Fetch Android app privacy data for a package ID or Play Store URL."""
    # Accept a full Play Store URL: keep only the value of the id= parameter
    if "id=" in package_id:
        package_id = package_id.split("id=")[-1].split("&")[0]
    data = _api_get(f"{ANDROID_API_URL}/{package_id}")
    return data if data and not data.get("error") else None


def grade_android_stats(data):
    """Grade Android app stats."""
    stats = []
    trackers = data.get("trackers")
    if trackers is None:
        stats.append((WHITE, "Trackers", "Unknown"))
    else:
        n = len(trackers)
        stats.append((GREEN if n == 0 else ORANGE if n <= 2 else RED, "Trackers", str(n)))
    perms = data.get("permissions")
    if perms is None:
        stats.append((WHITE, "Permissions", "Unknown"))
    else:
        n = len(perms)
        stats.append((GREEN if n <= 2 else ORANGE if n <= 10 else RED, "Permissions", str(n)))
    stats.append(_info_or_unknown("Downloads", data.get("downloads")))
    stats.append(_info_or_unknown("Created", _friendly_date(data.get("created"))))
    stats.append(_info_or_unknown("Last Updated", _friendly_date(data.get("updated"))))
    return stats


def fetch_ios_data(app_url):
    """Fetch iOS app info."""
    return _api_get(IOS_API_URL, params={"appStoreUrl": app_url})


def grade_ios_stats(data):
    """Grade iOS app stats."""
    stats = []
    rating = data.get("averageUserRating")
    if rating is None:
        stats.append((WHITE, "Rating", "Unknown"))
    else:
        stats.append((GREEN if rating >= 4.5 else ORANGE if rating >= 3.5 else RED,
                       "Rating", f"{rating:.1f} / 5"))
    stats.append(_info_or_unknown("Created", _friendly_date(data.get("releaseDate"))))
    stats.append(_info_or_unknown("Last Updated", _friendly_date(data.get("currentVersionReleaseDate"))))
    stats.append(_info_or_unknown("Size", _format_bytes(data.get("fileSizeBytes"))))
    return stats


def fetch_tosdr_data(service_id):
    """Fetch ToS;DR privacy policy data."""
    return _api_get(f"{TOSDR_API_URL}/{service_id}")


def grade_tosdr_stats(data):
    """Grade ToS;DR privacy policy stats."""
    stats = []
    params = data.get("parameters") or {}
    rating = params.get("rating")
    if not rating:
        stats.append((WHITE, "Score", "Unknown"))
    else:
        r = str(rating).upper()
        stats.append((GREEN if r == "A" else ORANGE if r in ("B", "C") else RED,
                       "Score", f"Grade {r}"))
    doc_url = None
    docs = params.get("documents")
    if docs and isinstance(docs, list) and isinstance(docs[0], dict):
        doc_url = docs[0].get("url")
    stats.append(_info_or_unknown("Privacy Policy", doc_url))
    return stats


def _resolve_args(argv):
    """Return a dict with keys owner, repo, url, android, ios, tosdr; any value may be None."""
    parser = argparse.ArgumentParser(description="Generate submission info stats")
    parser.add_argument("--repo", default=None, help="owner/repo")
    parser.add_argument("--url", default=None, help="Website URL to check")
    parser.add_argument("--android", default=None, help="Android package ID")
    parser.add_argument("--ios", default=None, help="iOS App Store URL")
    parser.add_argument("--tosdr", default=None, help="ToS;DR service ID")
    args = parser.parse_args(argv[1:])
    result = {"owner": None, "repo": None, "url": args.url,
              "android": args.android, "ios": args.ios, "tosdr": args.tosdr}
    if any(vars(args).values()):
        if args.repo:
            owner, repo = parse_github_field(args.repo)
            if not owner:
                print(f"Invalid repo format: {args.repo}", file=sys.stderr)
                sys.exit(1)
            result["owner"], result["repo"] = owner, repo
        return result
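    # CI mode: extract from diff file
    try:
        with open(DIFF_PATH, encoding="utf-8") as f:
            diff = json.load(f)
    except (OSError, ValueError):
        # Missing or unparseable diff: nothing to check, so exit without an error code
        print("No arguments given and no readable diff file found", file=sys.stderr)
        sys.exit(0)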
    field_map = {"github": "owner", "url": "url", "androidApp": "android",
                 "iosApp": "ios", "tosdrId": "tosdr"}
    for svc in diff.get("services", {}).get("added", []):
        fields = svc.get("fields", {})
        for yaml_key, result_key in field_map.items():
            if not result.get(result_key) and fields.get(yaml_key):
                if yaml_key == "github":
                    result["owner"], result["repo"] = parse_github_field(fields[yaml_key])
                else:
                    result[result_key] = str(fields[yaml_key])
        break  # only first added service
    if not any(result.values()):
        print("No checkable fields found in diff", file=sys.stderr)
        sys.exit(0)
    return result


def main():
    try:
        args = _resolve_args(sys.argv)
        cli_mode = len(sys.argv) > 1
        sections = []
        if args["owner"] and args["repo"]:
            token = os.environ.get("GITHUB_TOKEN", "")
            data = fetch_all_data(args["owner"], args["repo"], token)
            if data:
                sections.append(("Repo Stats", format_markdown(grade_stats(data))))
            else:
                print(f"Failed to fetch repo data for {args['owner']}/{args['repo']}", file=sys.stderr)
        if args["url"]:
            site_data = fetch_website_data(args["url"])
            has_sec_txt = check_security_txt(args["url"])
            if site_data or has_sec_txt is not None:
                sections.append(("Website Checks",
                                 format_markdown(grade_website_stats(site_data, args["url"], has_sec_txt))))
        if args["android"]:
            data = fetch_android_data(args["android"])
            if data:
                sections.append(("Android App", format_markdown(grade_android_stats(data))))
        if args["ios"]:
            data = fetch_ios_data(args["ios"])
            if data:
                sections.append(("iOS App", format_markdown(grade_ios_stats(data))))
        if args["tosdr"]:
            data = fetch_tosdr_data(args["tosdr"])
            if data:
                sections.append(("Privacy Policy", format_markdown(grade_tosdr_stats(data))))
        if not sections:
            sys.exit(0)
        md_parts = []
        for heading, body in sections:
            md_parts.append(f"#### {heading}\n{body}")
        md = "\n\n".join(md_parts)
        md += "\n\n<sup>The above data does not determine a submission's eligibility. Human review is still needed.</sup>\n"
        md += "<sup><b>Key:</b> 🟢 = good. 🟠 = warning. 🔴 = attention required. 🔵 = info. ⚪ = unknown.</sup>\n\n"
        if cli_mode:
            print(md)
        else:
            with open(OUTPUT_PATH, "w", encoding="utf-8") as f:
                f.write(md + "\n")
            print(f"Stats written to {OUTPUT_PATH}")
    except Exception as e:
        print(f"make-info-stats failed: {e}", file=sys.stderr)
    sys.exit(0)


if __name__ == "__main__":
    main()