A collection of historical and named-entity lexica for Bulgarian, Czech, Dutch, English, French, German, Polish, Slovene, Spanish and Latin.
IMPACT Ground Truth and Image Dataset
More than half a million representative text-based images compiled by a number of major European libraries.
Dataset of ICDAR 2019 Competition on Post-OCR Text Correction
The corpus comprises 22M OCRed characters along with the corresponding Gold Standard (GS).
Abraham, Belgian Newspaper Catalogue
Abraham, the Belgian Newspaper Catalogue, is the catalogue of Belgian newspapers published since 1800.