In his session, Apostolos Antonacopoulos (University of Salford) presented and analysed the effects of scanning parameters on OCR quality, as well as the related storage and maintenance cost issues for Content Holders. Several experiments were carried out to establish how scanning choices affect OCR quality, including colour vs greyscale vs bitonal scanning, the effect of resolution, and a comparison with images from the National Library of New Zealand (NLNZ).
The images selected for the project were taken from the British Library newspaper collection and varied in quality. To ensure optimal results, only text regions were selected, so that additional artefacts (e.g. warping) were ignored. The IMPACT tool Aletheia was used to extract and key in the text to be represented, and the ABBYY FineReader Engine 9 software was used for the OCR process.
Overall, word accuracy improvements were more apparent when using colour, bitonal, and 4-bit and 8-bit scanners, while dithered scanners produced the lowest results, with 1.64% word accuracy.
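For readers unfamiliar with the metric, word accuracy is commonly computed by comparing the OCR output against a keyed reference text, word by word. The sketch below is only an illustration of one common definition (one minus the word-level edit distance divided by the number of reference words). It is an assumption, not necessarily the exact measure used in the experiments, and the function name `word_accuracy` is hypothetical.

```python
def word_accuracy(reference: str, ocr_output: str) -> float:
    """Return word accuracy in [0, 1] for OCR output against a reference text.

    Assumed definition: 1 - (word-level edit distance / reference word count).
    The session summary does not state which definition was actually used.
    """
    ref = reference.split()
    hyp = ocr_output.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j OCR words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from OCR
            insertion = curr[j - 1] + 1   # extra word produced by OCR
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    errors = prev[-1]
    # Clamp at zero: many spurious OCR words can push errors above len(ref).
    return max(0.0, 1.0 - errors / len(ref))


if __name__ == "__main__":
    # One misrecognised word out of four reference words.
    print(word_accuracy("the quick brown fox", "the qu1ck brown fox"))
```

Under this definition, a single misread word in a four-word line gives an accuracy of 75%, which helps put a figure such as 1.64% into perspective: almost no words were recognised correctly.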
In conclusion, Mr Antonacopoulos stressed the importance of investing in high-quality images, as they leave room for improvement and can be reused without the need to re-scan. However, different decisions should be taken for different document types.
View the presentation here:
[slideshare id=9856515&doc=antonacopoulos-111024073933-phpapp02]
and the video here:
http://www.vimeo.com/31994394