Where the signal comes from.
Every market on PredictHanta resolves against a primary source listed below. This page is also the operational manifest for the ETL workers in lib/sources.ts — each entry tells the parser what cadence to poll and which method (RSS, JSON, HTML, headless browser) to use.
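As a rough illustration of what one manifest entry could look like, here is a minimal sketch. The `SourceEntry` shape, its field names, and the `pollIntervalMs` helper are assumptions made for this example and are not taken from lib/sources.ts; the cadence values follow the suggested ETL topology at the end of this page.

```typescript
// Hypothetical manifest types: names are illustrative, not the real lib/sources.ts.
type ParseMethod = "rss" | "json" | "html" | "headless";

interface SourceEntry {
  id: string;           // stable key for logs and dedup
  url: string;          // feed or page to poll
  method: ParseMethod;  // which worker type handles this source
  trust: 1 | 2 | 3;     // 3 = official, 2 = tracker, 1 = social
  keywords: string[];   // case-insensitive match terms
}

// Poll cadence per parse method, in minutes:
// RSS / JSON every minute, HTML every 10, headless every 15.
const CADENCE_MINUTES: Record<ParseMethod, number> = {
  rss: 1,
  json: 1,
  html: 10,
  headless: 15,
};

// Convert an entry's cadence into a timer interval in milliseconds.
function pollIntervalMs(entry: SourceEntry): number {
  return CADENCE_MINUTES[entry.method] * 60_000;
}

// Example entry built from the HAN feed listed on this page.
const example: SourceEntry = {
  id: "cdc-han",
  url: "https://emergency.cdc.gov/han/rss.asp",
  method: "rss",
  trust: 3,
  keywords: ["hantavirus", "andes"],
};
```

Keeping cadence keyed by method rather than per entry means a source only has to declare how it is parsed; the scheduler derives the rest.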
Official authorities · 15
WHO · ECDC · PAHO · CDC · national reference labs
RSS at /rss-feeds/disease-outbreak-news.xml. Filter items containing 'hantavirus' or DON#599. Each item links to a permalink with structured 'Situation at a glance' / 'Epidemiology' / 'WHO risk assessment' sections; scrape those with cheerio.
Legacy CSR RSS. Often duplicates DON but lower latency for some events.
Annual epidemiological reports + Communicable Disease Threats Report (CDTR) released every Friday — PDF; link is on the landing page. Use a PDF text extractor (pdfjs) and look for 'Hantavirus' section.
Weekly PDF + RSS. Parse PDF for any 'hantavirus' / 'Andes' string.
Useful for Argentina/Chile case counts; PAHO often publishes before WHO HQ. Check /sites/default/files/*.pdf weekly digests.
RSS at emergency.cdc.gov/han/rss.asp. CDC issues HAN alerts ahead of MMWR for travel-related imports.
RSS at /mmwr/rss/mmwr.xml. Authoritative for any US-diagnosed case.
Cumulative US HPS case count is updated periodically; scrape the 'Reported Cases' page for state breakdown.
Confirmed the Zürich Andes case. RSS available; Swiss notifiable-disease (NotI-MS) data is published weekly as CSV.
Hantavirus is notifiable in Germany. SurvStat exposes weekly case counts as CSV via parametrized URL — wrap as a small ETL.
Weekly HFRS surveillance summary; mainly Puumala in NE France but tracks Andes imports.
HFRS surveillance. Translate page text before regex matching.
Critical source: Andes virus is endemic in Argentina, where its rodent reservoir lives. Weekly bulletins are PDFs.
Andes virus surveillance in Aysén / Magallanes regions.
Useful for cruise-ship public health declarations; requires login. Pair with maritime AIS feed for MV Hondius position.
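The keyword filter described for the disease-outbreak-news RSS feed (items containing 'hantavirus' or DON#599) could be sketched as a pure function over already-parsed feed items. The `FeedItem` shape and the function names are assumptions for illustration; RSS parsing itself is left to whatever library the worker uses.

```typescript
// Hypothetical shape of a parsed RSS item; adapt to your RSS parser's output.
interface FeedItem {
  title: string;
  description: string;
  link: string;
}

// Matches 'hantavirus' in any case, or the DON number written as
// "DON599", "DON#599" or "DON 599" (but not e.g. "DON5990").
const OUTBREAK_PATTERN = /hantavirus|DON\s*#?\s*599(?!\d)/i;

// An item is relevant if the pattern appears in its title, description or link.
function matchesOutbreak(item: FeedItem): boolean {
  const haystack = [item.title, item.description, item.link].join(" ");
  return OUTBREAK_PATTERN.test(haystack);
}

function filterItems(items: FeedItem[]): FeedItem[] {
  return items.filter(matchesOutbreak);
}
```

The pattern deliberately has no `g` flag, so `test` keeps no state between calls and the filter is safe to reuse across polls.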
Live trackers · 5
Independent dashboards aggregating the cluster
Static JSON published next to the dashboard (the page literally states 'Mode: Static JSON'). Sniff the network tab for a /data/*.json endpoint and consume directly.
Independent open-data project. Pages: /ship-tracker, /risk-info, /rodent-map, /news. As of May 8: 5 lab-confirmed, 3 suspected, 3 deaths, 147 on board, 23 nationalities. Has an email-alert endpoint you can probe for an underlying webhook.
Sits behind Cloudflare (returns HTTP 403 to bare GETs). Use Playwright or a residential proxy + UA spoof. Worth scraping for their map heat overlay.
Boston Children's Hospital outbreak aggregator. Has an undocumented /maintainer JSON endpoint; `disease=Hantavirus` filter works in the UI URL.
ISID-curated outbreak intelligence. Subscribe to the Hantavirus moderator feed; archive search via /promed-post/?id=. Prose-heavy — extract location with NER.
Registries · 3
Trials, taxonomy, sequences
v2 API returns JSON. Resolves the vaccine-Phase-2 market. Filter `statusModule.overallStatus` and `phase`.
No JSON API; scrape result table.
Authoritative for species naming (Orthohantavirus dobravaense, andesense, etc.).
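The status-and-phase filter mentioned for the trials registry could look roughly like the sketch below. It assumes the v2 API in question is ClinicalTrials.gov's, where each study nests its fields under `protocolSection` and reports phases as an array such as `["PHASE2"]`. Which statuses should count toward resolving the market is not stated on this page, so `COUNTED_STATUSES` is a placeholder assumption, as are the function and type names.

```typescript
// Subset of a study record; every field is optional because
// registry records are not guaranteed to be complete.
interface Study {
  protocolSection?: {
    identificationModule?: { nctId?: string };
    statusModule?: { overallStatus?: string };
    designModule?: { phases?: string[] };
  };
}

// Assumption: which statuses count is a market-rule decision, not an API fact.
const COUNTED_STATUSES: ReadonlySet<string> = new Set([
  "RECRUITING",
  "ACTIVE_NOT_RECRUITING",
  "COMPLETED",
]);

// Return the IDs of studies that list Phase 2 and have a counted status.
function phase2TrialIds(
  studies: Study[],
  statuses: ReadonlySet<string> = COUNTED_STATUSES,
): string[] {
  const ids: string[] = [];
  for (const study of studies) {
    const section = study.protocolSection;
    const id = section?.identificationModule?.nctId;
    const status = section?.statusModule?.overallStatus ?? "";
    const phases = section?.designModule?.phases ?? [];
    if (id !== undefined && statuses.has(status) && phases.indexOf("PHASE2") !== -1) {
      ids.push(id);
    }
  }
  return ids;
}
```

Passing the status set as a parameter keeps the market rule out of the parser, so the same function can serve markets with different resolution criteria.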
Scientific feeds · 5
PubMed · pre-prints · sequence repos
esearch + esummary returns JSON with retmode=json. Sort by pubdate. Cross-link DOIs with EuropePMC.
Open API; also mirrors pre-prints from bioRxiv/medRxiv.
Pre-prints — first place sequence/transmission claims appear.
Sequence sharing platform — requires registration and DUA for data download.
JSON behind the table. Useful for new sequences uploaded.
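The esearch + esummary flow mentioned for PubMed can be sketched as two URL builders plus a small response reader, assuming the NCBI E-utilities endpoints. The function names and the `retmax` default are illustrative choices; the network call itself is omitted so the sketch stays side-effect free.

```typescript
const EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

// Step 1: search PubMed, newest first, and ask for JSON.
function esearchUrl(term: string, retmax: number = 20): string {
  const params = new URLSearchParams({
    db: "pubmed",
    term,
    retmode: "json",
    sort: "pub_date",
    retmax: String(retmax),
  });
  return `${EUTILS}/esearch.fcgi?${params.toString()}`;
}

// Step 2: fetch summaries (title, DOI, dates) for the IDs from step 1.
function esummaryUrl(ids: string[]): string {
  const params = new URLSearchParams({
    db: "pubmed",
    id: ids.join(","),
    retmode: "json",
  });
  return `${EUTILS}/esummary.fcgi?${params.toString()}`;
}

// The esearch JSON body carries the matching IDs under esearchresult.idlist.
interface EsearchBody {
  esearchresult?: { idlist?: string[] };
}

function idsFromEsearch(body: EsearchBody): string[] {
  return body.esearchresult?.idlist ?? [];
}
```

A worker would fetch `esearchUrl("hantavirus")`, pass the parsed body to `idsFromEsearch`, and then fetch `esummaryUrl(ids)` to get the records to cross-link.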
Press & wires · 5
Front-page resolution sources
Robert Herriman's blog, fast on Latin-American zoonotic stories.
Cheap sentiment + earliest-mover for translated regional press. Run a dedup hash on titles + canonical URL.
Global news ingestion — geocoded, multilingual. Best signal for non-English regional outbreaks.
Wire-grade source; resolves the NYT/major-media markets.
Same role as Reuters; AP often publishes WHO embargoes first.
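The title + canonical URL dedup hash suggested for translated regional press might be implemented along these lines. The normalization rules (lowercasing, whitespace collapsing, stripping `utm_*` parameters, fragments, a leading `www.` and a trailing slash) are assumptions chosen for illustration; real feeds may need more.

```typescript
import { createHash } from "node:crypto";

interface Article {
  title: string;
  url: string;
}

// Reduce a URL to a comparable form: no fragment, no utm_* tracking
// parameters, no leading "www.", no trailing slash.
function canonicalUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = "";
  const tracking: string[] = [];
  u.searchParams.forEach((_value, key) => {
    if (key.startsWith("utm_")) tracking.push(key);
  });
  for (const key of tracking) u.searchParams.delete(key);
  u.hostname = u.hostname.replace(/^www\./, "");
  return u.toString().replace(/\/$/, "");
}

// SHA-256 over the normalized title plus the canonical URL.
function dedupKey(article: Article): string {
  const title = article.title.toLowerCase().replace(/\s+/g, " ").trim();
  return createHash("sha256")
    .update(`${title}\n${canonicalUrl(article.url)}`)
    .digest("hex");
}

// Keep the first article seen for each key.
function dedupe(articles: Article[]): Article[] {
  const seen = new Set<string>();
  const unique: Article[] = [];
  for (const article of articles) {
    const key = dedupKey(article);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(article);
    }
  }
  return unique;
}
```

Because the key includes the URL, syndicated copies hosted on different domains are treated as distinct; hashing the title alone would merge them at the cost of more false positives.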
Social signal · 3
Lowest-trust, fastest signal
Rules: `(hantavirus OR "andes virus" OR "MV Hondius") -is:retweet (lang:en OR lang:es OR lang:de OR lang:fr)`. Requires Basic tier; degrade to nitter scraping if needed.
Public JSON. Subreddit `r/Outbreaks` is the highest signal-to-noise.
Public channel previews are HTML-scrapable at /s/<channel>. Useful for African / SE-Asian early signal.
Suggested ETL topology
- Ingest tier: one worker per `parsemethod`. RSS / JSON workers run on a 1-minute cron; HTML scrapers on 10 min; headless (Cloudflare-protected) on 15 min with a residential proxy pool.
- Normalize tier: extract `(country, region, cases_confirmed, cases_suspected, deaths, ts, source_url, sha256)`. Run NER for ProMED-style prose feeds.
- Reconcile tier: official sources (trust 3) override trackers (trust 2) override social (trust 1). Conflicts open a manual review ticket.
- Publish tier: write the merged snapshot to `/data/outbreak.json`; ProbabilityChart, ticker and StatsBar all read from it.
- Resolution tier: when a market closes, fetch the cited source URL, store full HTML + SHA-256, sign with oracle key, post on-chain.
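The reconcile rule above (trust 3 overrides trust 2 overrides trust 1) can be sketched as follows. The `Observation` fields mirror the normalized tuple; everything else is an assumption for illustration. In particular, this sketch treats a "conflict" as two sources of the same trust tier reporting different counts for the same country and region, keeps the newer of the two, and flags the pair for review. The page does not define conflicts more precisely than that.

```typescript
type Trust = 1 | 2 | 3; // 3 = official, 2 = tracker, 1 = social

interface Observation {
  country: string;
  region: string;
  cases_confirmed: number;
  cases_suspected: number;
  deaths: number;
  ts: string;         // ISO 8601 timestamp
  source_url: string;
  sha256: string;     // hash of the fetched source document
  trust: Trust;
}

interface Reconciled {
  winner: Observation;
  needsReview: boolean; // true when same-tier sources disagree
}

function countsDiffer(a: Observation, b: Observation): boolean {
  return (
    a.cases_confirmed !== b.cases_confirmed ||
    a.cases_suspected !== b.cases_suspected ||
    a.deaths !== b.deaths
  );
}

// Merge observations per country/region, preferring higher trust.
function reconcile(observations: Observation[]): Map<string, Reconciled> {
  const merged = new Map<string, Reconciled>();
  for (const obs of observations) {
    const key = `${obs.country}/${obs.region}`;
    const current = merged.get(key);
    if (current === undefined || obs.trust > current.winner.trust) {
      // First sighting, or a more trusted source: it simply wins.
      merged.set(key, { winner: obs, needsReview: false });
    } else if (obs.trust === current.winner.trust) {
      // Same tier: keep the newer one, flag if the numbers disagree.
      const newer = Date.parse(obs.ts) > Date.parse(current.winner.ts);
      merged.set(key, {
        winner: newer ? obs : current.winner,
        needsReview: current.needsReview || countsDiffer(obs, current.winner),
      });
    }
    // Lower-trust observations are ignored once a higher tier has reported.
  }
  return merged;
}
```

The values of the returned map are what a publish step would serialize into the merged snapshot, and any entry with `needsReview` set would open a manual review ticket.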