Maladie à Virus Ebola — Situation Report Extraction

Democratic Republic of Congo

Author

Billy J Quilty (Charité Berlin, LSHTM & MSF Epicentre)

Published

10 June 2026

Warning

Automated extraction — verification required. Case counts and figures in this report are extracted from INSP DRC PDF situation reports by an AI vision model (Anthropic Claude claude-sonnet-4-6) without systematic manual verification of every value. Extraction errors are possible, particularly where PDF table layouts are complex or inconsistent. All values should be verified against the original INSP DRC situation reports before any operational, clinical, or policy use. This report is intended for research and situational awareness only.

Introduction

This report documents automated extraction of epidemiological data from 24 situation reports (SitReps) published by the Institut National de Santé Publique (INSP DRC) during the 2026 Ebola (MVE) outbreak in the Democratic Republic of Congo, covering the period through 05 June 2026. The focus is on data completeness, extraction fidelity, and cross-source consistency rather than epidemiological interpretation.

Methods

Data sources

Case counts were taken from the INSP DRC situation report PDFs, downloaded automatically from https://insp.cd/ebola/. Where our PDF copies and the INRB-UMIE copies overlap, file checksums confirm they are identical, so any differences between the two datasets come from how the tables were read rather than which documents were used. The INRB-UMIE series, produced by manual data entry, was used as an independent check on our automated extraction.
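
The checksum comparison described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the report does not say which hash algorithm was used, so SHA-256 is an assumption, and the function names `sha256_of_file` and `same_content` are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files contain identical bytes (equal digests)."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

If two copies of a SitRep PDF return equal digests, any difference between the datasets for that report must come from how the tables were read.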

Automated PDF extraction

Each PDF is read by Claude (Anthropic, claude-sonnet-4-6) using visual understanding of the page layout, which helps it handle the complex table structures. Four datasets are extracted from each report: daily new case counts, cumulative case and death totals, patient movement through isolation units (available from SitRep 006 onwards), and Point of Entry screening activity. A full text transcript is also saved for audit purposes.

Data harmonisation

Health zone names were standardised to a consistent spelling across all reports. Suspected and probable case counts were summed into a single “unconfirmed” category, since the column labels vary across reports. Cumulative totals were not permitted to decrease over time, and where more than one row existed for the same zone and date the higher value was retained.
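
The harmonisation rules above can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the report's own code (the analysis was done in R): the alias map is limited to the two spelling variants named in this report, missing counts are treated as zero, the "higher value" rule is applied per column, and the row layout and function names are hypothetical.

```python
# Hypothetical alias map; only the two variants named in this report are included.
ZONE_ALIASES = {"mongbalu": "Mongbwalu", "nyakunde": "Nyankunde"}


def standardise_zone(name):
    """Map known spelling variants to a single canonical zone name."""
    cleaned = name.strip()
    return ZONE_ALIASES.get(cleaned.lower(), cleaned)


def harmonise(rows):
    """rows: dicts with zone, date (ISO string), suspected, probable, confirmed."""
    # Step 1: standardise names, sum suspected + probable into "unconfirmed",
    # and keep the higher value when a zone/date pair appears more than once.
    best = {}
    for row in rows:
        key = (standardise_zone(row["zone"]), row["date"])
        values = {
            "unconfirmed": row.get("suspected", 0) + row.get("probable", 0),
            "confirmed": row.get("confirmed", 0),
        }
        if key in best:
            for field, value in values.items():
                best[key][field] = max(best[key][field], value)
        else:
            best[key] = values

    # Step 2: running maximum per zone so cumulative totals never decrease.
    # ISO date strings sort chronologically, so sorted() orders each zone by date.
    harmonised, running = [], {}
    for zone, date in sorted(best):
        current = best[(zone, date)]
        previous = running.get(zone, {"unconfirmed": 0, "confirmed": 0})
        held = {field: max(previous[field], current[field]) for field in current}
        running[zone] = held
        harmonised.append({"zone": zone, "date": date, **held})
    return harmonised
```

Applied to the Mongbwalu confirmed counts of 14 (21 May) and 10 (22 May) discussed in the data quality notes, the running maximum holds the 22 May value at 14.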

Statistical analysis

All analysis and figures were produced in R (version 4.5.1) using the tidyverse [@tidyverse], ggplot2 [@ggplot2], and flextable [@flextable] packages. This report was rendered automatically with Quarto.

Cross-source validation

We checked our automated extraction against the INRB-UMIE Ebola DRC 2026 manually coded dataset, which extends through SitRep 22 (6 June 2026) in the current build. For SitReps 1–11 both datasets use the same PDF files, so differences reflect extraction method rather than source documents. From SitRep 12 onwards INRB-UMIE note that health-zone data is occasionally sent directly from INSP to INRB and may not feature in the published PDFs; this explains divergence at zone level from SitRep 12 onwards. Two zone name variants (Mongbwalu/Mongbalu and Nyankunde/Nyakunde) were matched manually; values marked “ND” were treated as missing.
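
A minimal sketch of the cell-by-cell comparison follows, assuming each source is held as a mapping from a (metric, date) key to the raw table cell. The function names are hypothetical, and the "contacts" entries are invented purely to show how an "ND" cell is skipped; the other two values are taken as they appear in Table 1.

```python
def parse_count(cell):
    """Convert a table cell to int; blank or 'ND' cells become None (missing)."""
    text = str(cell).strip()
    if text == "" or text.upper() == "ND":
        return None
    return int(text.replace(",", ""))  # drop thousands separators such as "1,077"


def differences(ours, theirs):
    """Return {key: ours - theirs} where both sources have a value and they differ."""
    result = {}
    for key in ours.keys() & theirs.keys():
        a, b = parse_count(ours[key]), parse_count(theirs[key])
        if a is not None and b is not None and a != b:
            result[key] = a - b
    return result


ours = {
    ("confirmed", "2026-05-20"): "61",
    ("suspected", "2026-05-26"): "1,077",
    ("contacts", "2026-05-21"): "ND",   # hypothetical missing value
}
theirs = {
    ("confirmed", "2026-05-20"): "64",
    ("suspected", "2026-05-26"): "1,077",
    ("contacts", "2026-05-21"): "120",  # hypothetical value
}
print(differences(ours, theirs))
```

Only keys with a usable value in both sources are compared, so an "ND" on either side never registers as a disagreement.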

Results

The following sections compare our automated extraction against the INRB-UMIE manually coded dataset and document known data quality issues in the source PDFs.

National-level comparison — this repo and INRB-UMIE

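  • Cases largely agree across sources. Cumulative suspected and confirmed case counts match on almost every date in the overlap period; one exception visible in Table 1 is 20 May, where confirmed cases read 61 in this repo and 64 in INRB-UMIE. The remaining divergences are mainly in the deaths series.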
  • Banner vs sub-table discrepancy (SitRep 010 onwards). INSP’s front-page banner and the per-zone sub-tables frequently disagree on national totals. This repo and INRB-UMIE (national_* files) both read from the banner. The disagreement is present in the source PDFs themselves and is not introduced by either extraction.
  • SitRep 012 (26 May) — two PDF versions. The original extraction records 238 deaths; the revised version (SitRep_012_v2) records 246 deaths. No per-zone case table was present in either version to cross-check against the banner.
  • INRB-UMIE national coverage. The current build (build-2026-06-07) includes national banner figures from SitRep 004 (14 May) through SitRep 022 (6 June 2026), and is consistent with this repo across all overlapping dates.

Table 1: National cumulative totals, cross-source comparison (this repo vs INRB-UMIE). Shading indicates values that differ. INRB-UMIE: INRB-UMIE/Ebola DRC 2026. A dash indicates no data available from that source for that date.

Suspected cases

Confirmed cases

Deaths

Date

Suspect. (ours)

Suspect. (INRB-UMIE)

Confirmed (ours)

Confirmed (INRB-UMIE)

Deaths (ours)

Deaths susp. (INRB-UMIE)

Deaths conf. (INRB-UMIE)

14 May

246

8

4

17 May

246

13

4

18 May

516

516

33

33

131

131

4

19 May

575

575

51

51

148

148

4

20 May

671

671

61

64

160

160

6

21 May

746

746

83

83

176

176

9

22 May

867

867

91

91

204

10

23 May

904

904

101

101

119

220

10

24 May

906

906

105

105

223

223

10

25 May

998

998

106

106

238

238

12

26 May

1,077

1,077

121

121

246

246

17

27 May

906

906

125

125

223

223

17

28 May

349

349

210

210

17

236

17

29 May

349

263

263

42

236

42

30 May

321

282

238

42

236

42

31 May

220

321

321

48

48

01 Jun

289

344

344

60

242

60

02 Jun

206

363

363

62

62

03 Jun

381

381

64

64

04 Jun

153

452

452

82

82

05 Jun

119

488

488

86

86

06 Jun

117

515

515

91

91

07 Jun

94

550

550

101

101

08 Jun

598

115

Data quality notes — case counts

  • Mongbwalu zone-level data starts late. SitRep 001 records 246 suspected cases for all of Ituri province combined; no Mongbwalu-specific count appears until SitRep 004 (18 May, 302 suspected). The Mongbwalu line therefore starts later than in INRB-UMIE; this is a reporting gap, not a case-onset gap.
  • Mongbwalu confirmed-case revision (22 May). Both extractions read cumulative confirmed cases as falling from 14 (21 May) to 10 (22 May). Our harmonised series holds the value at 14 (cumulative counts are not permitted to decrease); INRB-UMIE retains the raw figure. This is likely a downward revision in the source PDF.
  • SitRep 014 (28 May): no zone breakdown. Only a national aggregate was published during an official case-count revision. The dotted vertical line in zone panels marks this gap.
  • SitRep 016 (30 May): province-level only. Zone-level breakdowns return in SitRep 015 (29 May) but are absent again in SitRep 016, which reports only Ituri / Nord-Kivu / Sud-Kivu provincial aggregates.
  • SitRep 009 (23 May): deaths dip. Cumulative suspected deaths fell from 204 (22 May) to 119 before recovering to 223 in SitRep 010 (24 May). This is reproduced faithfully from the PDF and likely reflects an interim partial recount.

Suspect and probable case reporting discontinued. From SitRep 015 (29 May) onwards, INSP stopped publishing cumulative suspect/probable breakdowns in the zone tables. From SitRep 016 (30 May) onwards they are absent from both cumulative and daily new-cases tables at all geographic levels. SitRep 014 stated explicitly that suspected deaths were temporarily excluded from the count pending the results of ongoing investigations (original French: “Les décès suspects ont été temporairement exclus du comptage dans l’attente des résultats des investigations en cours”). The orange-shaded region in the national “Suspected deaths” panel marks this ongoing series break.

INRB-UMIE suspected cases after SitRep 016 are not cumulative. From SitRep 016 onwards, INSP’s banner split suspect reporting into ‘Cas suspects en cours d’investigation’ (under investigation) and ‘Cas suspects en isolement’ (in isolation). INRB-UMIE sum these two active categories as their national suspected case figure. This means the INRB-UMIE suspected case series from SitRep 016 onwards reflects active suspects at a point in time, not cumulative ever-suspected — explaining the declining trend.

Daily new cases (national)

Figure 1: Daily new cases reported nationally by case classification (from ‘Nouveaux’ rows in each SitRep). Suspect/probable cases absent from SitRep 016 onwards.

Cumulative counts by zone and source

Figure 2: Cross-source comparison of cumulative confirmed cases, suspect/probable cases, and outbreak deaths by health zone. Solid lines: automated extraction (raw); dashed lines: INRB-UMIE (manual extraction). For SitReps 1–11 both sources use the same PDFs; from SitRep 12 onwards INRB-UMIE zone data may include figures sent directly from INSP to INRB that do not appear in the published PDFs. From SitRep 16 onwards INRB-UMIE national suspected cases reflect active suspects (under investigation + in isolation), not cumulative totals.

Table 2 summarises numerical differences at the most recent date available in each source per zone.

Table 2: Cumulative case counts at the latest available date per zone: automated extraction vs INRB-UMIE. Δ = ours minus INRB-UMIE. Dashes indicate no data in that source. Note: INRB-UMIE zone data from SitRep 12 onwards diverges from the published PDFs (exact supplementary source not documented in their public repository).

Health zone

Confirmed (ours)

Confirmed (INRB-UMIE)

Δ confirmed

Suspect/prob. (ours)

Suspect/prob. (INRB-UMIE)

Δ suspect/prob.

Deaths (ours)

Deaths (INRB-UMIE)

Δ deaths

Bambu

5

5

0

2

2

0

Beni

9

9

0

7

7

0

Bunia

163

152

11

55

15

40

Butembo

6

4

2

4

2

2

Katwa

12

11

1

8

8

0

Mongbwalu

114

98

16

88

29

59

Nyankunde

32

26

6

15

1

14

Rwampara

122

111

11

75

20

55

Response indicators

Data quality notes — response indicators

  • Rwampara 21 May (SitRep 007). Data are split across three unlabelled sub-columns; both datasets handle these differently. Values for that date should be treated with caution.
  • 20–21 May (SitReps 005–006). An earlier table layout made in-bed count and new admissions columns ambiguous; minor differences between datasets on those dates result from this. Agreement is close from 22 May onwards when the format was standardised.
  • Coverage. INRB-UMIE (build-2026-06-07) extends through 6 June 2026; our extraction covers through SitRep 023 (7 June 2026). Validation panels overlap through 6 June 2026.

Figure 3: Cross-source comparison of zone-level response indicators. Solid lines: automated extraction; dashed lines: INRB-UMIE. The two series treat the ambiguous 21 May Rwampara sub-columns differently, so values for that date should be interpreted cautiously. The datasets also differ slightly on 20–21 May because of an ambiguous table layout in SitReps 005–006, and are in close agreement from 22 May onwards.

Figure 4: Cross-source comparison of daily Point of Entry (POE) screening totals. Automated extraction reports a single aggregate per SitRep date; INRB-UMIE data are summed across individual checkpoints to produce a daily total.

Figure 5: Cross-source comparison of cumulative contacts traced by health zone. Solid lines: automated extraction (raw); dashed lines: INRB-UMIE (manual extraction). ND values excluded.

Discussion

Outbreak trajectory

  • 488 confirmed cases have been reported nationally as of 05 June 2026.
  • The epidemic is concentrated in Ituri province (Bunia, Rwampara, Mongbwalu), with a smaller number of cases in Nord-Kivu and Sud-Kivu.
  • INSP discontinued cumulative suspect/probable reporting from SitRep 015 onwards, limiting assessment of total burden and complicating comparisons with earlier reports.

Limitations

  • Zone-level breakdowns are absent or incomplete in SitReps 001–013 and in SitReps 014 and 016; the zone-level series is therefore not fully continuous.
  • Response indicator table layouts varied across early reports, introducing minor inconsistencies in trend data.
  • Automated extraction and INRB-UMIE manual coding provide complementary checks: agreement increases confidence in a figure; disagreement typically reflects genuine ambiguity in the source document.

Appendix — Internal consistency check

Zone sum vs national aggregate

Cumulative zone-row sums are compared with the national banner figure for each date. Highlighted rows (non-zero Δ) arise from three causes:

  • Early reports (SitReps 001–003) provide only a province-level Ituri total, not individual zone rows.
  • Some reports omit certain zones from the breakdown table.
  • The banner vs sub-table discrepancy present from SitRep 010 onwards (described above).

Table 3: Zone sum vs national aggregate. Values show the cumulative total across all extracted health zones versus the national figure from the front-page banner. Δ = zone sum minus national banner. Highlighted rows have a non-zero difference in at least one metric.

| Date | Zones (susp.) | Nat. (susp.) | Δ susp. | Zones (conf.) | Nat. (conf.) | Δ conf. | Zones (deaths) | Nat. (deaths) | Δ deaths |
|---|---|---|---|---|---|---|---|---|---|
| 18 May | 515 | 516 | -1 | 32 | 33 | -1 | 131 | 131 | 0 |
| 19 May | 574 | 575 | -1 | 50 | 51 | -1 | 148 | 148 | 0 |
| 20 May | 671 | 671 | 0 | 56 | 61 | -5 | 160 | 160 | 0 |
| 21 May | 741 | 746 | -5 | 75 | 83 | -8 | 175 | 176 | -1 |
| 22 May | 852 | 867 | -15 | 79 | 91 | -12 | 201 | 204 | -3 |
| 23 May | 887 | 904 | -17 | 89 | 101 | -12 | 215 | 119 | 96 |
| 24 May | 887 | 906 | -19 | 91 | 105 | -14 | 218 | 223 | -5 |
| 25 May | 966 | 998 | -32 | 91 | 106 | -15 | 235 | 238 | -3 |
| 27 May | 874 | 906 | -32 | 109 | 125 | -16 | 218 | 223 | -5 |
| 29 May | | | | 224 | 263 | -39 | 39 | 42 | -3 |
| 01 Jun | | | | 227 | 344 | -117 | 44 | 60 | -16 |
| 02 Jun | | | | 242 | 363 | -121 | 46 | 62 | -16 |
| 03 Jun | | | | 258 | 381 | -123 | 48 | 64 | -16 |
| 04 Jun | | | | 320 | 452 | -132 | 65 | 82 | -17 |
| 05 Jun | | | | 354 | 488 | -134 | 69 | 86 | -17 |
| 06 Jun | | | | 381 | 515 | -134 | 74 | 91 | -17 |
| 07 Jun | | | | 416 | 550 | -134 | 84 | 101 | -17 |
| 08 Jun | | | | 463 | 598 | -135 | 97 | 115 | -18 |
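
The Δ in Table 3 is a plain subtraction, sketched below for the 08 Jun confirmed-case row. The function name is hypothetical. The per-zone figures are the first value listed for each zone in Table 2, which is assumed here to be the "Confirmed (ours)" column at the latest date; the national figure of 598 is the banner value from the same row of Table 3.

```python
def zone_sum_delta(zone_values, national_total):
    """Return (zone sum, zone sum minus national banner), skipping missing zones."""
    zone_sum = sum(v for v in zone_values.values() if v is not None)
    return zone_sum, zone_sum - national_total


# Assumed to be the latest "Confirmed (ours)" value per zone, as listed in Table 2.
confirmed_by_zone = {
    "Bambu": 5,
    "Beni": 9,
    "Bunia": 163,
    "Butembo": 6,
    "Katwa": 12,
    "Mongbwalu": 114,
    "Nyankunde": 32,
    "Rwampara": 122,
}

zone_sum, delta = zone_sum_delta(confirmed_by_zone, national_total=598)
print(zone_sum, delta)  # 463 -135
```

A negative Δ means the extracted zone rows account for fewer cases than the banner, which is what the omitted zones and the banner vs sub-table discrepancy would both produce.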