On 26 June 2012 the IMPACT project organised its final event at the KB, National Library of the Netherlands, in The Hague. IMPACT staff presented the project outcomes to an audience of nearly 100 people from across Europe.
Opening of the workshop by Paul Doorenbosch
Paul Doorenbosch, Head of the KB Research department, opened the workshop by welcoming all participants and stressing how important the IMPACT project, and its research into improving OCR software and language technology, is for the KB.
Keynote speech “Indexing and searching of ‘noisy’ data” by Franciska de Jong
Franciska de Jong (Professor of Language Technology at the University of Twente and member of the KB Board of Governors) began her keynote by explaining the concept of noisy data and the noisy-channel coding theorem. After illustrating several ways in which noisy data can affect indexing and searching, she concluded that, for researchers in language technology and the wider digital humanities field, there is no data like more data!
Summary of IMPACT project & results by Hildelies Balk (KB, IMPACT Project Director)
Hildelies Balk continued with an overview of the IMPACT project's achievements. Content holders, researchers and industry partners have worked together for the last 4.5 years to tackle each step in the digitisation workflow. All tools have been evaluated in different scenarios on the IMPACT dataset and show improvements over the state of the art. Preliminary tests with the Dutch historical lexicon and ABBYY FineReader Engine 10, for example, demonstrated a 15% increase in the number of words found, which is of great benefit to the end user.
Overview of IMPACT tools by ABBYY, NCSR Demokritos, University of Salford, IBM, University of Innsbruck, LMU University of Munich, INL Institute for Dutch Lexicology and KB
Each IMPACT tool provider had five minutes to pitch the tools it developed in the project, which was sometimes a challenge with so many tools! The tools include:
- ABBYY: FineReader Engine 10 and Recognition Server 3.0
- NCSR: Border Removal tool, Page Curl Correction tool, OCR evaluation toolkit, and the Word spotting engine
- USAL: IMPACT Repository, Aletheia, Layout Evaluation, Text line & word segmentation, Correction of arbitrary warping
- IBM: Adaptive OCR engine with CONCERT Collaborative Correction Platform
- UIBK: Functional Extension Parser
- LMU: Postcorrection tool with Text and Error Profiler
- INL: Tools for Lexicon Building, OCR with historical lexica, Retrieval demonstrator
- KB: Interoperability Framework
The IMPACT Centre of Competence by Rafael Carrasco (University of Alicante)
The IMPACT Centre of Competence will continue after the IMPACT project ends as a unique cooperation between researchers, libraries, digitisation experts and service providers. The Centre is run by the Fundación Biblioteca Virtual Miguel de Cervantes and has nine founding premium members. After a walk-through of the main features of the Centre of Competence website, www.digitisation.eu, Rafael Carrasco invited everyone to become a member and to profit from the reduced fees in 2012.