[cs_content][cs_section parallax=”false” separator_top_type=”none” separator_top_height=”50px” separator_top_angle_point=”50″ separator_bottom_type=”none” separator_bottom_height=”50px” separator_bottom_angle_point=”50″ style=”margin: 0px;padding: 0px 0px 45px;”][cs_row inner_container=”true” marginless_columns=”false” style=”margin: 0px auto;padding: 0px;”][cs_column fade=”false” fade_animation=”in” fade_animation_offset=”45px” fade_duration=”750″ type=”2/3″ style=”padding: 0px;”][x_custom_headline level=”h1″ looks_like=”h2″ accent=”false”]IMPACT Polish GT Corpora[/x_custom_headline][cs_text]
Produced by: University of Warsaw
[/cs_text][/cs_column][cs_column fade=”false” fade_animation=”in” fade_animation_offset=”45px” fade_duration=”750″ type=”1/3″ style=”padding: 0px;”][x_gap size=”6em”][cs_block_grid type=”three-up”][cs_block_grid_item title=”Previous”][x_button shape=”square” size=”mini” float=”left” icon_only=”true” href=”/tools-resources/language-resources/historical-lexicon-of-latin/” info=”none” info_place=”top” info_trigger=”hover”][x_icon type=”long-arrow-left”] Latin[/x_button][/cs_block_grid_item][cs_block_grid_item title=”Index”][x_button shape=”square” size=”mini” float=”left” icon_only=”true” href=”/tools-resources/language-resources/” info=”none” info_place=”top” info_trigger=”hover”] [x_icon type=”list-ul”] Index[/x_button][/cs_block_grid_item][cs_block_grid_item title=”Next”][x_button shape=”square” size=”mini” float=”left” icon_only=”true” href=”/tools-resources/language-resources/historical-lexicon-of-czech/” info=”none” info_place=”top” info_trigger=”hover”][x_icon type=”long-arrow-right”]Czech[/x_button][/cs_block_grid_item][/cs_block_grid][/cs_column][/cs_row][/cs_section][cs_section bg_color=”hsl(0, 0%, 100%)” parallax=”false” separator_top_type=”none” separator_top_height=”50px” separator_top_angle_point=”50″ separator_bottom_type=”none” separator_bottom_height=”50px” separator_bottom_angle_point=”50″ style=”margin: 0px;padding: 0 0px 10px;”][cs_row inner_container=”true” marginless_columns=”false” style=”margin: 0 auto 0px;padding: 0px;”][cs_column fade=”false” fade_animation=”in” fade_animation_offset=”45px” fade_duration=”750″ type=”2/3″ style=”padding: 0px;”][cs_text][tabby title=”Abstract”]
The search engine, made available by the Formal Linguistics Department of the University of Warsaw, supports searching digitized texts in the DjVu format. The engine is a modification of the Poliqarp system (developed at the Institute of Computer Science of the Polish Academy of Sciences), which is used to support the National Corpus of Polish, so it has the same query syntax. The modification was implemented by Jakub Wilk, who also converted most of the texts to a suitable format. The idea of using Poliqarp for DjVu texts was developed by Janusz S. Bień. It was presented in a paper entitled “Facilitating access to digitalized dictionaries” and later in other publications, including “Efficient search in hidden text of large DjVu documents”.[/cs_text][cs_text][tabby title=”Publications”]
- Bień, Janusz S. (2012) Delivering the IMPACT project Polish Ground-Truth texts with Poliqarp for DjVu. Technical Report. Katedra Lingwistyki Formalnej UW. (Unpublished)
[tabbyending][/cs_text][/cs_column][cs_column fade=”false” fade_animation=”in” fade_animation_offset=”45px” fade_duration=”750″ type=”1/3″ style=”padding: 10px;border-style: solid;border-width: 0;border-color: hsl(0, 0%, 100%);”][x_custom_headline level=”h2″ looks_like=”h4″ accent=”false”]Availability[/x_custom_headline][cs_text]The search engine can be found at:
[/cs_text][/cs_column][/cs_row][/cs_section][cs_section bg_color=”hsl(0, 0%, 100%)” parallax=”false” separator_top_type=”none” separator_top_height=”50px” separator_top_angle_point=”50″ separator_bottom_type=”none” separator_bottom_height=”50px” separator_bottom_angle_point=”50″ style=”margin: 0px;padding: 25px 0px;”][cs_row inner_container=”true” marginless_columns=”false” style=”margin: 0px auto;padding: 0px;”][cs_column fade=”false” fade_animation=”in” fade_animation_offset=”45px” fade_duration=”750″ type=”1/1″ style=”padding: 0px;”][x_prompt type=”left” title=”Language resources” message=”The Impact Centre of Competence provide historical and <strong>named entities lexica</strong> for the following languages. In addition, we offer access to the different <strong>corpora</strong>.” button_text=”View more” button_icon=”info-circle” circle=”false” href=”/tools-resources/tools-for-text-digitisation/ocr-post-correction-and-enrichment/” href_title=”” target=””][/cs_column][/cs_row][/cs_section][/cs_content]