Date | Suspected cases (ours) | Suspected cases (INRB-UMIE) | Confirmed cases (ours) | Confirmed cases (INRB-UMIE) | Deaths (ours) | Suspected deaths (INRB-UMIE) | Confirmed deaths (INRB-UMIE) |
|---|---|---|---|---|---|---|---|
14 May | — | 246 | — | 8 | — | — | 4 |
17 May | — | 246 | — | 13 | — | — | 4 |
18 May | 516 | 516 | 33 | 33 | 131 | 131 | 4 |
19 May | 575 | 575 | 51 | 51 | 148 | 148 | 4 |
20 May | 671 | 671 | 61 | 61 | 160 | 160 | 6 |
21 May | 746 | 746 | 83 | 83 | 176 | 176 | 9 |
22 May | 867 | 867 | 91 | 91 | 204 | — | 10 |
23 May | 904 | 904 | 101 | 101 | 119 | 220 | 10 |
24 May | 906 | 906 | 105 | 105 | 223 | 223 | 10 |
25 May | 998 | 998 | 106 | 106 | 238 | 238 | 12 |
26 May | 1,077 | 1,077 | 121 | 121 | 246 | 246 | 17 |
27 May | 906 | 906 | 125 | 125 | 223 | 223 | 17 |
28 May | 349 | 349 | 210 | 210 | 17 | 236 | 17 |
29 May | — | 349 | 263 | 263 | 42 | 236 | 42 |
30 May | — | 321 | 282 | 282 | 42 | 236 | 42 |
31 May | — | 220 | 321 | 321 | 48 | — | 48 |
01 Jun | — | 289 | 344 | 344 | 60 | 242 | 60 |
02 Jun | — | 206 | 363 | 363 | 62 | — | 62 |
03 Jun | — | — | 381 | 381 | 64 | — | 64 |
04 Jun | — | 153 | 452 | 452 | 82 | — | 82 |
05 Jun | — | 119 | 488 | 488 | 86 | — | 86 |
06 Jun | — | 117 | 515 | 515 | 91 | — | 91 |
07 Jun | — | 94 | 550 | 550 | 101 | — | 101 |
08 Jun | — | — | 598 | — | 115 | — | — |
Maladie à Virus Ebola — Situation Report Extraction
Democratic Republic of Congo
Automated extraction — verification required. Case counts and figures in this report are extracted from INSP DRC PDF situation reports by an AI vision model (Anthropic Claude claude-sonnet-4-6) without systematic manual verification of every value. Extraction errors are possible, particularly where PDF table layouts are complex or inconsistent. All values should be verified against the original INSP DRC situation reports before any operational, clinical, or policy use. This report is intended for research and situational awareness only.
Introduction
This report documents automated extraction of epidemiological data from 24 situation reports (SitReps) published by the Institut National de Santé Publique (INSP DRC) during the 2026 Ebola (MVE) outbreak in the Democratic Republic of Congo, covering the period through 05 June 2026. The focus is on data completeness, extraction fidelity, and cross-source consistency rather than epidemiological interpretation.
Methods
Data sources
Case counts were taken from the INSP DRC situation report PDFs, downloaded automatically from https://insp.cd/ebola/. Where our PDF copies and the INRB-UMIE copies overlap, file checksums confirm they are identical, so any differences between the two datasets come from how the tables were read rather than which documents were used. The INRB-UMIE series, produced by manual data entry, was used as an independent check on our automated extraction.
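The checksum comparison described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the hash algorithm (SHA-256), the function names `sha256_of` and `compare_copies`, and the assumption that both copies use the same file names are all hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(ours_dir, theirs_dir):
    """Map each PDF name present in both directories to True if the
    two copies are byte-identical (equal digests), else False.

    Assumption: the two collections use the same file names.
    """
    ours = {p.name: p for p in Path(ours_dir).glob("*.pdf")}
    theirs = {p.name: p for p in Path(theirs_dir).glob("*.pdf")}
    shared = sorted(ours.keys() & theirs.keys())
    return {name: sha256_of(ours[name]) == sha256_of(theirs[name])
            for name in shared}
```

Only files present in both collections are compared, so a `False` entry points to a different document version rather than to a difference in how the tables were read.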
Automated PDF extraction
Each PDF is read by Claude (Anthropic, claude-sonnet-4-6) using visual understanding of the page layout, which is intended to cope with the complex table structures; accuracy is not guaranteed and is assessed in the cross-source validation below. Four datasets are extracted from each report: daily new case counts, cumulative case and death totals, patient movement through isolation units (available from SitRep 006 onwards), and Point of Entry screening activity. A full text transcript is also saved for audit purposes.
Data harmonisation
Health zone names were standardised to a consistent spelling across all reports. Suspected and probable case counts were summed into a single “unconfirmed” category, since the column labels vary across reports. Cumulative totals were not permitted to decrease over time, and where more than one row existed for the same zone and date the higher value was retained.
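The three harmonisation rules above (summing suspected and probable counts, keeping the higher of duplicate rows, and preventing cumulative totals from decreasing) can be sketched as below. This is an illustrative Python sketch, not the report's R code. The function name `harmonise`, the record layout, and the choice to apply the "higher value" rule per column are assumptions.

```python
def harmonise(rows):
    """Harmonise cumulative zone-level records.

    rows: iterable of dicts with keys "zone", "date" (ISO string) and,
    optionally, "suspected", "probable", "confirmed" (missing or None
    is treated as 0, which is an assumption of this sketch).
    """
    # Steps 1 and 2: merge suspected + probable into "unconfirmed" and,
    # for duplicate (zone, date) rows, keep the higher value per column.
    best = {}
    for r in rows:
        key = (r["zone"], r["date"])
        unconfirmed = (r.get("suspected") or 0) + (r.get("probable") or 0)
        confirmed = r.get("confirmed") or 0
        prev = best.get(key, {"unconfirmed": 0, "confirmed": 0})
        best[key] = {
            "unconfirmed": max(prev["unconfirmed"], unconfirmed),
            "confirmed": max(prev["confirmed"], confirmed),
        }

    # Step 3: enforce non-decreasing cumulative totals within each zone.
    out, running = [], {}
    for zone, date in sorted(best):
        prev = running.get(zone, {"unconfirmed": 0, "confirmed": 0})
        held = {k: max(prev[k], v) for k, v in best[(zone, date)].items()}
        running[zone] = held
        out.append({"zone": zone, "date": date, **held})
    return out


# Confirmed values follow the Mongbwalu revision noted in this report
# (14 on 21 May, 10 on 22 May); suspected/probable values are hypothetical.
example = harmonise([
    {"zone": "Mongbwalu", "date": "2026-05-21",
     "suspected": 3, "probable": 2, "confirmed": 14},
    {"zone": "Mongbwalu", "date": "2026-05-22", "confirmed": 10},
])
print(example[1]["confirmed"])  # 14: the 22 May value is held at 14
```

A consequence of the running maximum is that downward revisions in the source PDFs are not visible in the harmonised series and have to be documented separately.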
Statistical analysis
All analysis and figures were produced in R (version 4.5.1) using the tidyverse [@tidyverse], ggplot2 [@ggplot2], and flextable [@flextable] packages. This report was rendered automatically with Quarto.
Cross-source validation
We checked our automated extraction against the INRB-UMIE Ebola DRC 2026 manually coded dataset, which extends through SitRep 022 (6 June 2026) in the current build. For SitReps 001–011 both datasets use the same PDF files, so differences reflect extraction method rather than source documents. From SitRep 012 onwards, INRB-UMIE note that health-zone data are occasionally sent directly from INSP to INRB and may not appear in the published PDFs, which can explain zone-level divergence from that point. Two zone name variants (Mongbwalu/Mongbalu and Nyankunde/Nyakunde) were matched manually; values marked “ND” were treated as missing.
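The two matching rules in this paragraph (zone name variants and “ND” values) can be sketched in Python as below. The helper names `canonical_zone` and `parse_count` are hypothetical, and treating Mongbwalu and Nyankunde as the canonical spellings is an assumption based on the spellings used elsewhere in this report.

```python
# Variant spellings named in the text, mapped to the spellings used
# throughout this report (assumed here to be the canonical ones).
ZONE_ALIASES = {
    "Mongbalu": "Mongbwalu",
    "Nyakunde": "Nyankunde",
}


def canonical_zone(name):
    """Return the canonical health zone name for a raw label."""
    cleaned = name.strip()
    return ZONE_ALIASES.get(cleaned, cleaned)


def parse_count(value):
    """Convert a table cell to int; "ND" and empty cells become None."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text.upper() == "ND":
        return None
    # Thousands separators appear in some totals, e.g. "1,077".
    return int(text.replace(",", ""))


print(canonical_zone("Nyakunde"))  # Nyankunde
print(parse_count("ND"))           # None
print(parse_count("1,077"))        # 1077
```

Keeping missing values as `None` rather than 0 stops an “ND” cell from being reported as a numerical disagreement between the two sources.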
Results
The following sections compare our automated extraction against the INRB-UMIE manually coded dataset and document known data quality issues in the source PDFs.
National-level comparison — this repo and INRB-UMIE
- Cases agree across sources. Cumulative suspected and confirmed case counts match exactly for every date in the overlap period; divergences are confined to the deaths series.
- Banner vs sub-table discrepancy (SitRep 010 onwards). INSP’s front-page banner and the per-zone sub-tables frequently disagree on national totals. This repo and INRB-UMIE (`national_*` files) both read from the banner. The disagreement reflects a structural feature of the PDFs.
- SitRep 012 (26 May): two PDF versions. The original extraction records 238 deaths; the revised version (SitRep_012_v2) records 246 deaths. No per-zone case table was present in either version to cross-check against the banner.
- INRB-UMIE national coverage. The current build (`build-2026-06-07`) includes national banner figures from SitRep 004 (14 May) through SitRep 022 (6 June 2026), and is consistent with this repo across all overlapping dates.
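A date-by-date comparison of this kind can be sketched as follows. The function name `compare_series` and the dictionary layout are hypothetical. The example values are the national deaths figures for 18–24 May from the comparison table, with our deaths series set against the INRB-UMIE suspected-deaths column.

```python
def compare_series(ours, theirs):
    """Return (date, ours, theirs, delta) for dates where both sources
    report a value and the values differ. Missing values (None) are
    skipped rather than counted as disagreements."""
    diffs = []
    for date in sorted(ours.keys() & theirs.keys()):
        a, b = ours[date], theirs[date]
        if a is None or b is None:
            continue
        if a != b:
            diffs.append((date, a, b, a - b))
    return diffs


deaths_ours = {"2026-05-18": 131, "2026-05-22": 204,
               "2026-05-23": 119, "2026-05-24": 223}
deaths_inrb = {"2026-05-18": 131, "2026-05-22": None,
               "2026-05-23": 220, "2026-05-24": 223}

print(compare_series(deaths_ours, deaths_inrb))
# [('2026-05-23', 119, 220, -101)]
```

Only dates present in both series are compared, so gaps in either source do not appear as differences.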
Data quality notes — case counts
- Mongbwalu zone-level data starts late. SitRep 001 records 246 suspected cases for all of Ituri province combined; no Mongbwalu-specific count appears until SitRep 004 (18 May, 302 suspected). The Mongbwalu line therefore starts later than in INRB-UMIE — this is a reporting gap, not a case-onset gap.
- Mongbwalu confirmed-case revision (22 May). In the raw figures read by both extractions, cumulative confirmed cases fall from 14 (21 May) to 10 (22 May). Our harmonised series holds the value at 14 because cumulative counts are not allowed to decrease; INRB-UMIE shows the raw figure. The drop is likely a downward revision in the source PDF.
- SitRep 014 (28 May) — no zone breakdown. Only a national aggregate was published during an official case-count revision. The dotted vertical line in zone panels marks this gap.
- SitRep 016 (30 May) — province-level only. Zone-level breakdowns return in SitRep 015 (29 May) but are absent again in SitRep 016, which reports only Ituri / Nord-Kivu / Sud-Kivu provincial aggregates.
- SitRep 009 (23 May): deaths dip. Cumulative suspected deaths fell from 204 (22 May) to 119 before recovering to 223 in SitRep 010 (24 May). This is reproduced faithfully from the PDF and likely reflects an interim partial recount.
Suspect and probable case reporting discontinued. From SitRep 015 (29 May) onwards, INSP stopped publishing cumulative suspect/probable breakdowns in the zone tables. From SitRep 016 (30 May) onwards they are absent from both cumulative and daily new-cases tables at all geographic levels. SitRep 014 stated explicitly that suspected deaths were temporarily excluded pending investigation results (“Les décès suspects ont été temporairement exclus du comptage dans l’attente des résultats des investigations en cours”). The orange-shaded region in the national “Suspected deaths” panel marks this ongoing series break.
INRB-UMIE suspected cases after SitRep 016 are not cumulative. From SitRep 016 onwards, INSP’s banner split suspect reporting into ‘Cas suspects en cours d’investigation’ (under investigation) and ‘Cas suspects en isolement’ (in isolation). INRB-UMIE sum these two active categories as their national suspected case figure. This means the INRB-UMIE suspected case series from SitRep 016 onwards reflects active suspects at a point in time, not cumulative ever-suspected — explaining the declining trend.
Daily new cases (national)
Cumulative counts by zone and source
Table 2 summarises numerical differences at the most recent date available in each source per zone.
Health zone | Confirmed (ours) | Confirmed (INRB-UMIE) | Δ confirmed | Suspect/prob. (ours) | Suspect/prob. (INRB-UMIE) | Δ suspect/prob. | Deaths (ours) | Deaths (INRB-UMIE) | Δ deaths |
|---|---|---|---|---|---|---|---|---|---|
Bambu | 5 | 5 | 0 | 2 | 2 | 0 | |||
Beni | 9 | 9 | 0 | 7 | 7 | 0 | |||
Bunia | 163 | 152 | 11 | 55 | 15 | 40 | |||
Butembo | 6 | 4 | 2 | 4 | 2 | 2 | |||
Katwa | 12 | 11 | 1 | 8 | 8 | 0 | |||
Mongbwalu | 114 | 98 | 16 | 88 | 29 | 59 | |||
Nyankunde | 32 | 26 | 6 | 15 | 1 | 14 | |||
Rwampara | 122 | 111 | 11 | 75 | 20 | 55 |
Response indicators
Data quality notes — response indicators
- Rwampara 21 May (SitRep 007). Data are split across three unlabelled sub-columns; both datasets handle these differently. Values for that date should be treated with caution.
- 20–21 May (SitReps 006–007). An earlier table layout made the in-bed count and new admissions columns ambiguous, which accounts for minor differences between the datasets on those dates. Agreement is close from 22 May onwards, when the format was standardised.
- Coverage. INRB-UMIE (`build-2026-06-07`) extends through 6 June 2026; our extraction covers through SitRep 023 (7 June 2026). Validation panels overlap through 6 June 2026.
Discussion
Outbreak trajectory
- 488 confirmed cases have been reported nationally as of 05 June 2026.
- The epidemic is concentrated in Ituri province (Bunia, Rwampara, Mongbwalu), with a smaller number of cases in Nord-Kivu and Sud-Kivu.
- INSP discontinued cumulative suspect/probable reporting from SitRep 015 onwards, limiting assessment of total burden and complicating comparisons with earlier reports.
Limitations
- Zone-level breakdowns are absent or incomplete in SitReps 001–003 and in SitReps 014 and 016; the zone-level series is therefore not fully continuous.
- Response indicator table layouts varied across early reports, introducing minor inconsistencies in trend data.
- Automated extraction and INRB-UMIE manual coding provide complementary checks: agreement increases confidence in a figure; disagreement typically reflects genuine ambiguity in the source document.
Appendix — Internal consistency check
Zone sum vs national aggregate
Cumulative zone-row sums are compared with the national banner figure for each date. Highlighted rows (non-zero Δ) arise from three causes:
- Early reports (SitReps 001–003) provide only a province-level Ituri total, not individual zone rows.
- Some reports omit certain zones from the breakdown table.
- The banner vs sub-table discrepancy present from SitRep 010 onwards (described above).
Date | Zones (susp.) | Nat. (susp.) | Δ susp. | Zones (conf.) | Nat. (conf.) | Δ conf. | Zones (deaths) | Nat. (deaths) | Δ deaths |
|---|---|---|---|---|---|---|---|---|---|
18 May | 515 | 516 | -1 | 32 | 33 | -1 | 131 | 131 | 0 |
19 May | 574 | 575 | -1 | 50 | 51 | -1 | 148 | 148 | 0 |
20 May | 671 | 671 | 0 | 56 | 61 | -5 | 160 | 160 | 0 |
21 May | 741 | 746 | -5 | 75 | 83 | -8 | 175 | 176 | -1 |
22 May | 852 | 867 | -15 | 79 | 91 | -12 | 201 | 204 | -3 |
23 May | 887 | 904 | -17 | 89 | 101 | -12 | 215 | 119 | 96 |
24 May | 887 | 906 | -19 | 91 | 105 | -14 | 218 | 223 | -5 |
25 May | 966 | 998 | -32 | 91 | 106 | -15 | 235 | 238 | -3 |
27 May | 874 | 906 | -32 | 109 | 125 | -16 | 218 | 223 | -5 |
29 May | | | | 224 | 263 | -39 | 39 | 42 | -3 |
01 Jun | | | | 227 | 344 | -117 | 44 | 60 | -16 |
02 Jun | | | | 242 | 363 | -121 | 46 | 62 | -16 |
03 Jun | | | | 258 | 381 | -123 | 48 | 64 | -16 |
04 Jun | | | | 320 | 452 | -132 | 65 | 82 | -17 |
05 Jun | | | | 354 | 488 | -134 | 69 | 86 | -17 |
06 Jun | | | | 381 | 515 | -134 | 74 | 91 | -17 |
07 Jun | | | | 416 | 550 | -134 | 84 | 101 | -17 |
08 Jun | | | | 463 | 598 | -135 | 97 | 115 | -18 |
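The Δ columns in this table are the zone-row sum minus the national banner figure for each date. A minimal Python sketch of that check follows. The function name `zone_vs_national` and the input layout are hypothetical. The split of the 18 May suspected total across zones is illustrative: only the Mongbwalu figure (302) and the totals (515 and 516) come from this report.

```python
def zone_vs_national(zone_rows, national):
    """Compare summed zone counts with the national figure per date.

    zone_rows: iterable of (date, zone, count) with count possibly None.
    national:  dict mapping date -> national banner count.
    Returns {date: (zone_sum, national, delta)} with
    delta = zone_sum - national; dates lacking a national figure
    or any zone value are omitted.
    """
    sums = {}
    for date, _zone, count in zone_rows:
        if count is not None:
            sums[date] = sums.get(date, 0) + count
    return {
        date: (total, national[date], total - national[date])
        for date, total in sorted(sums.items())
        if national.get(date) is not None
    }


suspected_by_zone = [
    ("2026-05-18", "Mongbwalu", 302),
    # Placeholder row: remaining zones combined (illustrative split).
    ("2026-05-18", "Other zones", 213),
]
suspected_national = {"2026-05-18": 516}

print(zone_vs_national(suspected_by_zone, suspected_national))
# {'2026-05-18': (515, 516, -1)}
```

A negative Δ means the zone rows account for fewer cases than the banner, which is the pattern in most rows of the table.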