Working in a team led by Prof. Schulz at the University of Munich, Ulrich Reffle, along with Annette Gotscharek, Christoph Ringlstetter and Thorsten Vobl, has developed a unique user tool that could revolutionise the speed at which researchers can analyse texts.
Ulrich pointed out some of the major problems that non-standardised spelling in historical texts poses for scholars. He also cited the problems with specialised words, including technical terms, names of people and place names, alongside antiquated words that are not included in the lexicon. And this is on top of the regular OCR errors that arise from haphazard printing methods.
Ulrich and the team have developed Error Profiles for individual texts. A profile captures the particular set of characteristics of an individual text and can create adaptive solutions for that set of problems. These characteristics could be different spellings of vowel and diphthong sounds or the regular swapping of particular letters. Once identified, these patterns can be corrected automatically, saving the time and patience of the scholar.
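The post does not describe how the team actually computes an Error Profile, so the following is only a minimal sketch of the general idea: counting which letter substitutions recur within one text. The function names `extract_patterns` and `build_profile` and the word pairs are invented for illustration, and the pairing of each observed token with a modern lexicon word is assumed to be given.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_patterns(observed, modern):
    """Return (observed_fragment, modern_fragment) pairs where the words differ."""
    matcher = SequenceMatcher(None, observed, modern, autojunk=False)
    return [
        (observed[i1:i2], modern[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def build_profile(aligned_pairs):
    """Count how often each substitution pattern occurs across one text."""
    profile = Counter()
    for observed, modern in aligned_pairs:
        profile.update(extract_patterns(observed, modern))
    return profile


# Hypothetical (observed token, modern word) pairs from a single text.
pairs = [("vnto", "unto"), ("vpon", "upon"), ("vnder", "under"), ("ioy", "joy")]
profile = build_profile(pairs)

# The most frequent patterns characterise this particular text.
for (old, new), count in profile.most_common():
    print(f"{old!r} -> {new!r}: {count}")
```

In this toy data the swap of "v" for "u" dominates, which is the kind of regular, text-specific pattern a profile could then correct automatically.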
For uncertain words the profile tool will flag up suggestions. Ulrich gave the example of the archaic English form “hath” and the corresponding modern candidates “has”, “hat” etc.
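As a rough illustration of how such suggestions could be produced, the sketch below ranks words from a small lexicon by their edit distance to the uncertain word. This is an assumption made for illustration, not the team's documented method; the lexicon, the `max_distance` threshold and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def suggest(word, lexicon, max_distance=2):
    """Return lexicon words within max_distance, closest first."""
    scored = [(edit_distance(word, entry), entry) for entry in lexicon]
    return [entry for distance, entry in sorted(scored) if distance <= max_distance]


# Hypothetical modern lexicon; a real system would use a much larger one.
lexicon = ["has", "hat", "have", "house"]
print(suggest("hath", lexicon))
```

Plain edit distance ranks “hat” ahead of “has”, which shows why a ranking informed by the text's own error profile would be needed to prefer the historically plausible reading.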
The second strand of the work done by Ulrich and his team is a post-correction system. For this they built an interface from scratch that gives users novel ways to detect, present and correct OCR mistakes. On a single screen the user sees the image of the original page alongside the OCR editor tool and a special functionality window. With the help of historical lexica, this window can suggest corrections alongside the word itself, offered through a drop-down menu much like a spell checker.
The team evaluated their tool with 14 participants and found that researchers working with the Post-Correction Tool completed tasks 2.7 times faster than without it.
The interface technology is open source and available to all. Although the error profiles are protected by a US patent, there is a web service that is free of charge. Contact Ulrich for more details on these remarkable advances in efficiency.
View the presentation here:
[slideshare id=9869531&doc=ulrichreffle-111025034700-phpapp01]
and the video here:
http://www.vimeo.com/32504495