This practical session began with the attendees introducing themselves and splitting into three groups, so that each group could work on a different set of tasks based on a case study.
Case Study:
A collection holder wants to reduce storage costs for her/his collections, which
are currently available as TIFF master files. She/he has heard that JPEG2000 is
a good candidate for storing digital master files, and she/he has heard about
the efficiency gained when using lossy image compression.
She/he knows that JPEG2000 compression can be “visually lossless”, meaning that
the loss is not visible to the eye even though the compression is not strictly
reversible, but she/he is still concerned about the
impact the JPEG2000 compression could have on OCR.
We suggest a Taverna workflow that creates an executable processing pipeline
for studying the results.
The workflow should take one TIFF image as input, together with a list of
increasing compression parameters, each of which is used to encode the image.
Each encoded image should then be decompressed before the OCR is applied.
Finally, the impact of the compression on the OCR should be measured by
comparing the OCR output of the original image to the OCR output of each
compressed image.
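The processing steps described in the case study can be sketched outside Taverna as a plain loop. This is only an illustrative outline, not the workshop's workflow: the function name `run_compression_study` and the four callables passed to it (`encode`, `decode`, `ocr`, `compare`) are hypothetical placeholders for whatever JPEG2000 tool, OCR engine and text comparison are actually used.

```python
from typing import Callable, Dict, List


def run_compression_study(
    tiff_path: str,
    rates: List[float],
    encode: Callable[[str, float], str],   # hypothetical: (TIFF path, rate) -> JP2 path
    decode: Callable[[str], str],          # hypothetical: JP2 path -> decoded image path
    ocr: Callable[[str], str],             # hypothetical: image path -> recognised text
    compare: Callable[[str, str], float],  # hypothetical: (reference, candidate) -> score
) -> Dict[float, float]:
    """Return one comparison score per compression parameter."""
    # OCR the uncompressed master once; it is the reference for every comparison.
    reference_text = ocr(tiff_path)

    results: Dict[float, float] = {}
    for rate in rates:
        jp2_path = encode(tiff_path, rate)   # compress with this parameter
        decoded_path = decode(jp2_path)      # decompress before running OCR
        results[rate] = compare(reference_text, ocr(decoded_path))
    return results
```

Passing the tools in as functions mirrors the way the three groups each supply one piece (tool access, workflow, text comparison) that is then wired together.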
The Three Groups:
Group 1
Use the toolwrapper for providing access to a JPEG2000 encoding/decoding tool:
- Toolwrapper
- Tools
Group 2
Use Taverna for creating the workflow:
- Endpoints
Group 3
Use a Taverna Beanshell to create the text comparison:
- commons-lang-2.4.jar (/home/<youruser>/.taverna-home/lib/commons-lang-2.4.jar)
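The report does not say which comparison the Beanshell performs. One plausible choice, and purely an assumption here, is an edit distance such as the one commons-lang offers through `StringUtils.getLevenshteinDistance`. The Python sketch below shows the same idea with a hand-written Levenshtein distance; the function names `levenshtein` and `character_error_rate` are illustrative and are not taken from the workshop material.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, candidate: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        return 0.0 if not candidate else 1.0
    return levenshtein(reference, candidate) / len(reference)
```

Normalising by the reference length makes scores comparable across pages of different sizes; whether the workshop used a normalised score is not stated.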
The selection of groups showed a definite preference for the more ‘user’-based tasks over the ‘developer’ tasks, with 12 attendees working in Group 1, 6 in Group 2 and only 3 in Group 3. However, quite a few attendees seemed happy to be involved in more than one group, or to work in one while supporting users in another.
The general feeling is that this bodes well for tomorrow, which has a more practical timetable.