A collection of historical and named-entity lexica for Bulgarian, Czech, Dutch, English, French, German, Polish, Slovene, Spanish and Latin.
IMPACT Ground Truth and Image Dataset
More than half a million representative text-based images compiled by a number of major European libraries.
Natural History Museum Lepidoptera
This dataset contains scans of index cards from the UK's Natural History Museum Lepidoptera index.
Layout Analysis Dataset
This dataset has been created primarily for the evaluation of layout analysis (physical and logical) methods.
Abraham, Belgian Newspaper Catalogue
Abraham, the Belgian newspaper catalogue, is the catalogue of Belgian newspapers published since 1800.
Dataset of ICDAR 2019 Competition on Post-OCR Text Correction
The corpus comprises 22M OCRed characters together with the corresponding Gold Standard (GS).
Census 1961 Project Dataset
Images containing tables from the 1961 Census for England and Wales.
RDCL2017
Example and evaluation dataset used for the ICDAR2017 Competition on Recognition of Documents with Complex Layouts.
RASM2018
Example and evaluation dataset used for the ICFHR2018 Competition on Recognition of Historical Arabic Scientific Manuscripts.
Europeana Newspapers
This online repository is the main point of reference for all activities related to evaluation within the scope of the Europeana Newspapers project.