The final evaluation of the Succeed project took place yesterday, 19 February, at the University of Alicante, during a meeting between the committee of experts appointed by the European Commission (EC) and the members of the Succeed consortium: the University of Alicante, the Virtual Library Miguel de Cervantes Foundation, the Koninklijke Bibliotheek, the Instituut voor Nederlandse Lexicologie, Fraunhofer IAIS, the Poznan Supercomputing and Networking Center, the University of Salford, the Bibliothèque nationale de France and the British Library. The meeting was chaired by Cristina Maier, Succeed Project Officer at the European Commission.
Succeed has been funded by the European Union to promote the take-up and validation of research results in mass digitisation, with a focus on textual content. During the period 2013-2014, Succeed organised seven conferences, managed the integration of many tools in 13 European libraries, and produced reports, analyses and recommendations to improve the efficiency of digital content production, among other achievements.
Summary of project results
During the project life span (January 2013-December 2014), Succeed has produced the following outputs:
- Five improved releases of the Impact Interoperable Platform, which together integrated 20 new software tools. The platform (Figure 1) currently supports the online testing of 40 tools with simplified input/output operations, so users do not need to install software on their own computers (a task that often causes difficulties, for example owing to incompatibilities or dependencies with other installed software packages).
Workflow execution has been added to the demonstration platform, so that users can define multistage processes in which several tools operate on the input (including processes that take input from multiple sub-processes).
The latest releases offer improved usability: for example, drag-and-drop operations are supported and users can select the input file or folder in a very intuitive way.
The interoperable platform has been deployed in and integrated into the Impact Centre of Competence website (www.digitisation.eu) for more effective dissemination and stronger sustainability.
The current version of the platform provides access to the open data sets created by the Impact project (250,000 high-resolution images and 30,000 ground-truth files), and a dedicated viewer, added recently, allows the complete collection to be navigated.
- A survey of nearly 260 useful tools and resources for text digitisation has been compiled and published online. This online catalogue (integrated into the Impact Centre website, as shown in Figure 2) supports filtering of tools and resources by type and target language.
- Over 20 tools from this catalogue were selected for validation, with the active assistance of Succeed, in the production environments of 13 libraries:
- Wielkopolska Biblioteka Cyfrowa (Poland)
- Biblioteca Histórica General de Salamanca (Spain)
- Wroclaw University Library (Poland)
- University Library of Bratislava (Slovak Republic)
- National Library of Finland (Finland)
- Biblioteca de la Universidad de Granada (Spain)
- University Library of Leuven (Belgium)
- University Library of Antwerp (Belgium)
- University Library of Darmstadt (Germany)
- Biblioteca Virtual Miguel de Cervantes (Spain)
- British Library (United Kingdom)
- Bibliothèque nationale de France (France)
- Koninklijke Bibliotheek (Netherlands)
After a selection process based on technical requirements, such as the quality of the documentation and the level of support, a total of 36 validation experiences took place. The tools included in this take-up programme (some of them tested by more than one institution) supported a variety of workflows:
- image processing and enhancement (11 cases);
- image segmentation and layout analysis (2 cases);
- text recognition (4 cases);
- text processing operations, such as corpus creation, named entity recognition, semantic analysis, and output post-processing (14 cases);
- evaluation of the output; validation and management of descriptive metadata (5 cases).
By the end of the validation period, 11 tools had been successfully integrated into the production environments of libraries in 14 cases. In 4 further cases, additional development or adaptation was considered necessary for full integration of the tool. The libraries reported that all these tools either improved the quality of the output or made the production of digital content more cost-effective. In the remaining 18 cases, the take-up was not completed because the tool was considered unsuitable for the particular needs of the library.
This practical experience showed that the integration of tools is more effective when a library is either extending an existing workflow or has a roadmap for establishing one. It also showed that tools are often selected with unrealistic expectations, which calls for better documentation of the resources, complemented by specific training for the staff involved.
- Two sets of recommendations for digitisation projects: the first on metadata and data formats used in text digitisation, and the second on common licensing schemes for tools and resources. The first report builds on existing recommendations and guidelines for metadata and data formats, such as those from the Impact project, JISC Digital, NISO and the University of Virginia Library, as well as on ongoing initiatives (a questionnaire was distributed to 86 cultural heritage institutions for this purpose). The analysis covers long-term preservation and online delivery, as well as advanced and supporting technologies. The results are summarised below.
Table 1: Recommendations for long-term preservation.
| Application | Recommended format | Alternative |
|---|---|---|
| Master for still images | TIFF | JPEG2000 (JP2) |
| Master for textual documents | TEI, PDF/A | UTF-8 plain text |
| Descriptive metadata | DCMES, MODS | MARC21 |
| Structural metadata | METS | N/A |
| Administrative metadata | PREMIS, MIX, TextMD | N/A |
| OCR output | ALTO, PAGE | UTF-8 plain text |
Table 2: Recommendations for online delivery and for advanced and supporting technologies.
| Application | Recommended format | Alternative |
|---|---|---|
| File delivery | JPEG, PDF, JPEG2000 (JP2), ePUB, MOBI derived from ePUB | N/A |
| Descriptive metadata | DCMES, EDM | N/A |
| Object identifier | OAI Identifier, DOI | N/A |
| Linked Open Data | RDFa, SPARQL | N/A |
| Linguistic resources | TEI, CMDI, LMF | N/A |
| Tools packaging | At least MS Windows and Linux packages | N/A |
The second report compiles existing licensing schemes and analyses their current usage by consulting a number of organisations, the inventory of digitisation tools created by Succeed (containing more than 200 items), and popular software repositories such as SourceForge and GitHub.
The survey also provided additional insights, since it gathered the experiences of 37 organisations in this domain, including commercial companies, research centres, data centres, libraries, archives and museums, as well as institutions dealing with sound and vision. The main conclusions are summarised in the following table:
Table 3: Main licensing schemes for digitisation tools and resources.
| Object type | Recommended license |
|---|---|
| Content | Creative Commons framework |
| Data & Metadata | Open Data Commons Attribution License (ODC-BY); Creative Commons BY (version 4.0) |
| Software | GNU General Public License (GPL) v3.0 (preferred); Lesser General Public License (LGPL) v3.0 (for software integration); Apache License v2.0 (for wider uptake) |
- Two contests and two editions of the Succeed awards have been organised and presented. The two contests took place in the context of the International Conference on Document Analysis and Recognition (ICDAR), with the objective of evaluating state-of-the-art technology for particular stages of the digitisation workflow, such as segmentation and recognition. For this purpose, suitable sample and evaluation data sets were prepared, together with validation criteria. The data and the associated knowledge will remain useful benchmarks for tracking progress in this area. Furthermore, since many data owners do not grant unlimited access to their data, the gateway allows data sets to be disseminated under a wider range of licensing conditions.
The first Succeed awards recognised the best initiatives in the application of advanced technology to the digitisation of historical texts. After the deadline for nominations (February 15, 2014), the 18 submissions received were judged by Succeed’s Expert Advisory Board: Milagros del Corral (former BNE director), Jill Cousins (Europeana), Frank Frischmuth (German Digital Library), Michael Keller (Stanford University Library), Steven Krauwer (Utrecht University), and Andrew Prescott (King’s College London). The institutions that received awards were:
- Hill Museum and Manuscript Library.
- Centre d’Études Supérieures de la Renaissance.
- Tecnilógica (Commendation of Merit).
- London Metropolitan Archives – University College London (Commendation of Merit).
The second edition recognised the best performance in the validation and take-up of digitisation tools promoted by Succeed. Among the 9 external participants, the following libraries received awards:
- Library of KU Leuven.
- National Library of Finland (Commendation of Merit).
- University Library of Wroclaw (Commendation of Merit).
- The organisation of several events aimed at disseminating the latest technology for text digitisation:
- The Succeed Developers’ Workshops encouraged the community of programmers and developers to get involved in the further development of tools and services for digitisation. The First Developers’ Workshop took place at the Koninklijke Bibliotheek (The Hague) on September 19-20, 2013. The Second Developers’ Workshop took place at the University of Alicante on April 10-11, 2014.
- The Succeed tutorial “State-of-the-art Tools for Text Digitisation” was held in Valletta (Malta) on September 22, 2013, as part of the TPDL 2013 conference. The tutorial aimed to present the latest technology for image enhancement, OCR, post-correction, logical structure analysis, lexicon building, deployment and enrichment. It followed a hands-on approach and gathered attendees from 15 cultural heritage institutions.
- The Succeed Workshop on interoperability of digitisation platforms took place at the Koninklijke Bibliotheek in The Hague (October 2, 2014). Nineteen researchers, librarians, and computer scientists from several European countries participated in the workshop.
- The Digitisation Days were a major event reaching a wide audience, bringing together 182 participants from complementary communities: researchers, companies and practitioners. The event took place on May 19-20, 2014 in Madrid and consisted of:
- An exhibition where service providers (i2s, Contentra Technologies, libnova, Tecnilógica, DIGIBIS) showcased the latest tools for digitisation.
- The first DATeCH conference (Digital Access to Textual Cultural Heritage), where the challenges of digitising historical documents, the latest research trends, and the requirements and experiences of practitioners were shared. Of the 50 contributions submitted, 31 were selected for presentation by a programme committee of prestigious experts in this area. All accepted papers have been published by ACM (http://dl.acm.org/citation.cfm?id=2595188) and have had a significant impact (955 cumulative downloads as of 15 January 2015).
- The award ceremony of the first Succeed awards, recognising the best initiatives in the application of the most advanced technology to the digitisation of historical texts.
- The principal results of the project were presented at the final Succeed conference (November 28, 2014). The event, entitled “Succeed in digitisation. Spreading excellence”, took place at the Bibliothèque nationale de France and brought together nearly 70 experts in the field of digitisation and cultural heritage. The Succeed awards (second edition) were presented during the conference.
- A map of the European digitisation landscape, compiling over 380 active projects, centres of competence, companies, public institutions and networks. An online search platform developed by Succeed provides highly usable access to this information, and a registration gateway allows personalised access and lets the community update the information.
- The maintenance of an online service supporting the preparation of new proposals in the digitisation domain, the compilation of information related to funding opportunities, and the publication of practical guides.
- A blog for the coordination of European Centres of Competence in Digitisation, and the report “Best practices and management procedures for centres of competence”, which provides a preliminary description of the strengths and weaknesses of the existing Centres of Competence and describes three alternative scenarios for their future development.
- A Roadmap for sustainable centres of competence to support digital libraries, which provides insight into the capacity of centres of competence to support the evolution of European digital libraries. After investigating the main technical, capacity-related and economic challenges, the roadmap proposes actions to advance the state of the art in the mass digitisation of cultural heritage in Europe in three respects: capacity building, advancement of technology and sustainability.