IMPACT Ground Truth and Image Dataset - IMPACT Centre of Competence

Description

The dataset comprises more than half a million representative text-based images compiled by a number of major European libraries. Covering texts from as early as 1500, and containing material from newspapers, books, pamphlets and typewritten notes, it is an invaluable resource for future research into imaging technology, OCR and language enrichment. It includes ca. 50,000 ground truth (GT) files.

Dataset content type

Ground truth
Images
Metadata

Dataset scope

Layout analysis
Post-correction
OCR

Language

Bulgarian
Czech
Dutch
English
French
German
Polish
Slovene
Spanish

Size

ca. 500,000 images and ca. 50,000 GT files

Dataset License

CC - Attribution NonCommercial NoDerivatives or equivalent
CC - Attribution NonCommercial ShareAlike or equivalent
CC - Attribution ShareAlike or equivalent
Public domain

Dataset owner

British Library, Bibliothèque nationale de France, Biblioteca Nacional de España, National Library of the Netherlands, Biblioteca virtual Miguel de Cervantes, Poznan Supercomputing and Networking Centre, National Library of Slovenia, Bavarian State Library, National Library of Bulgaria, National Library of the Czech Republic

Dataset distributor

IMPACT Centre of Competence, PRImA

Link

https://www.digitisation.eu/tools-resources/image-and-ground-truth-resources/

Contact

tech.support@digitisation.eu