bewcloud/lib
不做了睡大觉 1aca444b22
fix: properly strip HTML tags and resolve entities in feed article summaries (#149)
* fix: properly strip HTML tags and resolve entities in feed article summaries

Fixes #146

The parseTextFromHtml function was using document.textContent directly on
the parsed HTML document, which could leave raw HTML tags and unresolved
entities in feed article summaries.

Changes:
- Extract text from body element to avoid document wrapper artifacts
- Collapse multiple whitespace/newlines into single spaces for cleaner output
- Add early return for empty/whitespace-only input
- Use optional chaining for safer null handling
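
A minimal sketch of the body extraction, early return, and optional chaining listed above, not the actual code in feed.ts. The names `extractBodyText`, `HtmlParser`, and `ParsedDocumentLike` are hypothetical, and the HTML parser is passed in as a parameter because the commit message does not say which parser the project uses. Whitespace collapsing is left out here.

```typescript
// Hypothetical shape of a parsed document: only the fields this sketch reads.
interface ParsedDocumentLike {
  body?: { textContent?: string | null } | null;
}

// Hypothetical parser signature; in practice this would wrap a real DOM parser.
type HtmlParser = (html: string) => ParsedDocumentLike | null;

function extractBodyText(html: string, parse: HtmlParser): string {
  // Early return for empty or whitespace-only input.
  if (!html.trim()) {
    return '';
  }

  const doc = parse(html);

  // Read text from the body element rather than the document itself;
  // optional chaining covers a failed parse or a missing body.
  return doc?.body?.textContent ?? '';
}
```

Passing the parser in keeps the sketch independent of any particular DOM library and makes the null-handling paths easy to exercise with a stub.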

* fix: preserve single line breaks, only collapse 2+ consecutive whitespace

Address review feedback: the previous \s+ regex was too aggressive and
broke text-only summaries with legitimate line breaks.

Now:
- Collapse runs of 2+ non-newline whitespace into a single space
- Collapse 3+ consecutive newlines into double newline (paragraph break)
- Single line breaks are preserved
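
The three rules above can be sketched as two regex passes. This is an illustration under assumptions, not the code from feed.ts: the function name `normalizeWhitespace` is hypothetical, and the exact patterns used in the commit are not shown in this message.

```typescript
// Hypothetical helper illustrating the whitespace rules described above.
function normalizeWhitespace(text: string): string {
  return text
    // Runs of 2+ whitespace characters other than "\n" become one space.
    .replace(/[^\S\n]{2,}/g, ' ')
    // Runs of 3+ newlines become a single paragraph break.
    .replace(/\n{3,}/g, '\n\n');
}
```

Because neither pattern matches a lone `\n`, single line breaks pass through unchanged, which is the behavior the earlier `\s+` approach lost.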

---------

Co-authored-by: User <user@example.com>
2026-02-23 17:29:09 +00:00
interfaces Update all dependencies 2025-09-27 19:39:09 +01:00
models Remove fresh 2026-02-20 10:54:31 +00:00
utils Remove fresh 2026-02-20 10:54:31 +00:00
auth.ts Remove fresh 2026-02-20 10:54:31 +00:00
config.ts Remove fresh 2026-02-20 10:54:31 +00:00
feed.ts fix: properly strip HTML tags and resolve entities in feed article summaries (#149) 2026-02-23 17:29:09 +00:00
page.ts Remove fresh 2026-02-20 10:54:31 +00:00
types.ts Remove fresh 2026-02-20 10:54:31 +00:00