More than half a million representative text-based images compiled by a number of major European libraries.
Dataset of the ICDAR 2019 Competition on Post-OCR Text Correction
The corpus contains 22 million OCRed characters along with the corresponding Gold Standard (GS).
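Pairing OCR output with a Gold Standard makes it possible to measure how noisy the OCR is. The sketch below shows one common way to do that, a character error rate based on Levenshtein edit distance. It is an illustration only, not the competition's official metric or file format: the function names and the sample strings are hypothetical and do not come from the corpus.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # drop ch_a
                current[j - 1] + 1,      # drop ch_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, gold_text: str) -> float:
    """Edit distance from OCR text to the gold text, per gold character."""
    if not gold_text:
        raise ValueError("gold_text must not be empty")
    return levenshtein(ocr_text, gold_text) / len(gold_text)


# Hypothetical sample pair (invented for illustration, not taken from the corpus).
ocr_sample = "he1lo w0rld"
gold_sample = "hello world"
cer = character_error_rate(ocr_sample, gold_sample)
print(f"CER: {cer:.3f}")
```

A lower value means the OCR text is closer to the Gold Standard. The same function can be applied before and after a correction step to check whether the correction actually reduced the error.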