More than half a million representative text-based images compiled by a number of major European libraries.
Dataset of ICDAR 2019 Competition on Post-OCR Text Correction
The corpus accounts for 22M OCRed characters along with the corresponding Gold Standard (GS).
Layout Analysis Dataset
This dataset has been created primarily for the evaluation of layout analysis (physical and logical) methods.
Natural History Museum Lepidoptera
This dataset contains scans of index cards from the UK’s Natural History Museum lepidoptera index.
Europeana Newspapers
This online repository is the main point of reference for all activities related to evaluation within the scope of the Europeana Newspapers project.
RASM2018
Example and evaluation dataset used for the ICFHR2018 Competition on Recognition of Historical Arabic Scientific Manuscripts.
RDCL2017
Example and evaluation dataset used for the ICDAR2017 Competition on Recognition of Documents with Complex Layouts.
REID2017
Example and evaluation dataset used for the ICDAR2017 Competition on Recognition of Early Indian printed Documents.
HNLA2013
Example and evaluation dataset used for the ICDAR2013 Competition on Historical Newspaper Layout Analysis.
HBR2013
Example and evaluation dataset used for the ICDAR2013 Competition on Historical Book Recognition.