Methodology
EventsPress is a deterministic, event-based news tracker. We ingest only public feeds (RSS, Atom, or JSON) and never use AI-generated facts.
Collection
- Sources are defined in a versioned sources.json config.
- The worker fetches feeds every 15 minutes with ETag/Last-Modified headers.
- We store only metadata: title, URL, timestamps, source ID, and short feed excerpts.
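As a rough illustration of the ETag/Last-Modified polling described above, here is a minimal Python sketch using only the standard library. The function names (conditional_headers, fetch_feed) and the timeout are hypothetical and are not taken from the EventsPress worker; the sketch only shows how validators from a previous response can be sent back so that an unchanged feed is answered with 304 Not Modified.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators saved on the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None, timeout=10):
    """Return (body, etag, last_modified); body is None when the feed is unchanged.

    Hypothetical helper: the real worker may handle retries, redirects,
    and errors differently.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the stored validators
            return None, etag, last_modified
        raise
```

On each 15-minute run, a worker along these lines would pass in the validators stored from the previous fetch and skip parsing whenever the body comes back as None.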
Canonicalization
- URLs are normalized by removing tracking parameters (utm_*, fbclid, gclid).
- Canonical URLs drive primary clustering so identical links always map to the same cluster.
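The tracking-parameter removal can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the project's actual code: the function name canonicalize_url and the constant names are hypothetical, and only the parameters listed above (utm_*, fbclid, gclid) are stripped. No other normalization, such as lowercasing the host or dropping fragments, is assumed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters named in the methodology; the constant names are hypothetical.
TRACKING_EXACT = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)


def canonicalize_url(url):
    """Drop tracking query parameters and keep everything else in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_EXACT
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

Two links that differ only in tracking parameters then produce the same string, which is the property primary clustering relies on. Note that urlencode re-encodes the remaining values, so unusual escaping in the original query may be rewritten.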
Clustering
We use a deterministic, two-stage clusterer:
- Primary match: identical canonical URLs are grouped together.
- Secondary match: titles are normalized (lowercased, punctuation removed, stopwords stripped). We compute a token-level Jaccard similarity (shared tokens divided by all distinct tokens across both titles) and group items if similarity is ≥ 0.6 within a ±12-hour window.
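The secondary match can be sketched as a pairwise test in Python. The 0.6 threshold and the 12-hour window come from the description above; the stopword list, the function names, and the use of ASCII-only punctuation stripping are assumptions made for illustration. How matching pairs are merged into clusters is not specified here, so the sketch stops at the pairwise decision.

```python
import string
from datetime import datetime, timedelta

# Assumed stopword list; the project's actual list is not given.
STOPWORDS = frozenset(
    {"a", "an", "the", "of", "in", "on", "to", "for", "and", "at", "is"}
)
THRESHOLD = 0.6               # from the methodology
WINDOW = timedelta(hours=12)  # from the methodology

_PUNCTUATION = str.maketrans("", "", string.punctuation)


def title_tokens(title):
    """Lowercase, strip ASCII punctuation, and drop stopwords."""
    cleaned = title.lower().translate(_PUNCTUATION)
    return {token for token in cleaned.split() if token not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0  # two empty titles are treated as non-matching
    return len(a & b) / len(a | b)


def same_story(title_a, time_a, title_b, time_b):
    """True when the titles are similar enough and close enough in time."""
    if abs(time_a - time_b) > WINDOW:
        return False
    return jaccard(title_tokens(title_a), title_tokens(title_b)) >= THRESHOLD
```

For example, "Mayor opens the new bridge in Riverton" and "Mayor opens new bridge in Riverton today" share five of six distinct tokens after normalization, a similarity of about 0.83, so they would match if published within 12 hours of each other.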
Delta computation
- Daily snapshots store cluster IDs with last-seen timestamps and source counts.
- A daily delta compares today's snapshot with yesterday's.
- The rolling 24h delta marks clusters first seen in the last 24 hours as new, and clusters with refreshed sightings in the last 24 hours as updated.
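The two deltas above might look like the following in Python. This is a sketch under assumptions: the snapshot shape (a dict keyed by cluster ID), the field names, the "removed" and "changed" buckets, and the "unchanged" label are hypothetical, since the description only specifies what counts as new and updated.

```python
from datetime import datetime, timedelta, timezone


def daily_delta(today, yesterday):
    """Compare two snapshots shaped as {cluster_id: {"last_seen": ..., "source_count": ...}}.

    The snapshot shape and the three bucket names are assumptions.
    """
    today_ids, yesterday_ids = set(today), set(yesterday)
    return {
        "added": sorted(today_ids - yesterday_ids),
        "removed": sorted(yesterday_ids - today_ids),
        "changed": sorted(
            cid for cid in today_ids & yesterday_ids if today[cid] != yesterday[cid]
        ),
    }


def rolling_status(first_seen, last_seen, now, window=timedelta(hours=24)):
    """Classify one cluster for the rolling 24h delta."""
    cutoff = now - window
    if first_seen >= cutoff:
        return "new"        # first seen inside the window
    if last_seen >= cutoff:
        return "updated"    # older cluster with a fresh sighting
    return "unchanged"      # hypothetical label for everything else
```

Checking "new" before "updated" matters because a cluster first seen inside the window necessarily also has a sighting inside it.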
Corrections
Corrections are logged publicly and never overwrite history without a note. See the corrections page for the log.