posted by organizer: Raly

GermEval Task 1 2019 : Shared Task on hierarchical classification of German Blurbs


Link: https://competitions.codalab.org/competitions/21226
 
When Feb 1, 2019 - Aug 31, 2019
Where Erlangen, Germany
Submission Deadline Jul 15, 2019
Final Version Due Jul 31, 2019
Categories hierarchical classification, multi-label classification, book blurbs, German texts
 

Call For Papers

GermEval 2019 Task 1 - Shared Task on hierarchical classification of German blurbs (short texts)

*Call for Participation*
We invite interested parties from academia and industry to participate in this shared task. Further information can be found here: https://competitions.codalab.org/competitions/21226.

Hierarchical multi-label classification (HMC) of blurbs is the task of assigning multiple labels to short descriptive texts of books, where each label is part of an underlying hierarchy of categories. The growing number of available digital documents and the need for more and finer-grained categories call for new, more robust and sophisticated text classification methods. Large datasets often incorporate a categorical hierarchy that can be used to organize the information in documents at different levels of specificity. Traditional multi-class text classification approaches are thoroughly researched, but they fail to generalize adequately as the amount of available data grows and more specific hierarchies become necessary.

With this task we aim to foster research within the HMC context. The task focuses on classifying German books into their respective hierarchically structured categories using short advertisement texts (blurbs). The data contains additional metadata such as author, page number, release date, etc.


*Tasks*
This shared task consists of two subtasks, described below. Participants are free to participate in either one of them or both.

- *Subtask A*: The task is to classify German books into *one or multiple most general categories*. It can thus be considered a non-hierarchical multi-label classification task. Eight classes can be assigned in total: 'Literatur & Unterhaltung', 'Ratgeber', 'Kinderbuch & Jugendbuch', 'Sachbuch', 'Ganzheitliches Bewusstsein', 'Glaube & Ethik', 'Künste', 'Architektur & Garten'.

- *Subtask B*: The second task targets hierarchical multi-label classification, where the full hierarchy of labels should be assigned to a book. In addition to the most general category (Subtask A), further categories of different specificity can be assigned to a book. In total, 343 different classes can be assigned in a hierarchical structure of at most 4 levels.
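The relationship between the two subtasks can be illustrated with a minimal Python sketch. It is not the organizers' data format or evaluation code. The child labels and the `PARENT` mapping are hypothetical and serve only to show the idea, and `TOP_LEVEL` lists just three of the most general categories named above.

```python
from typing import Dict, List, Set

# Hypothetical fragment of a label hierarchy (child -> parent).
# The child labels are invented for illustration and are not taken
# from the task data; only the top-level names appear in the task text.
PARENT: Dict[str, str] = {
    "Romane & Erzählungen": "Literatur & Unterhaltung",
    "Historische Romane": "Romane & Erzählungen",
    "Gesundheit & Ernährung": "Ratgeber",
}

# A subset of the most general categories, used as the Subtask A label space.
TOP_LEVEL: List[str] = ["Literatur & Unterhaltung", "Ratgeber", "Sachbuch"]


def with_ancestors(labels: Set[str]) -> Set[str]:
    """Subtask B view: add every ancestor of each assigned label."""
    expanded = set(labels)
    for label in labels:
        node = label
        # Walk up the hierarchy until a top-level category is reached.
        while node in PARENT:
            node = PARENT[node]
            expanded.add(node)
    return expanded


def top_level_vector(labels: Set[str]) -> List[int]:
    """Subtask A view: multi-hot vector over the most general categories."""
    full = with_ancestors(labels)
    return [1 if name in full else 0 for name in TOP_LEVEL]


book_labels = {"Historische Romane", "Gesundheit & Ernährung"}
print(sorted(with_ancestors(book_labels)))
print(top_level_vector(book_labels))
```

In this sketch, a book carrying two specific labels ends up with two active top-level categories, which is why Subtask A is a multi-label problem even without the hierarchy.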


*Data*
The entire dataset consists of 20,784 examples. Sample data is provided so that participants can familiarize themselves with the structure of the data. 14,548 training samples have been released and can be downloaded after registering for the shared task. A validation set (2,079 samples) has been published with its gold labels held back. Submissions for the validation set are accepted via the CodaLab page and published on a leaderboard until June 1st. On June 1st, the final evaluation phase of the task starts: the gold labels of the validation set will be released and can be used as additional training data. The test set samples will be provided at the same time, and submissions for them are accepted until July 15th. More information can be found on the task's webpage: https://competitions.codalab.org/competitions/21226
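The size of the test set is not stated here. As a rough check, the following sketch derives it from the figures given, assuming that the training, validation and test sets are disjoint and together make up the whole dataset.

```python
# Figures taken from the data description.
TOTAL = 20_784
TRAIN = 14_548
VALIDATION = 2_079

# Assumption: the test set is whatever remains after train and validation.
TEST = TOTAL - TRAIN - VALIDATION


def share(n: int) -> float:
    """Percentage of the whole dataset, rounded to one decimal place."""
    return round(100 * n / TOTAL, 1)


print(TEST)
print(share(TRAIN), share(VALIDATION), share(TEST))
```

Under that assumption, the split works out to roughly 70% training, 10% validation and 20% test data.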


*Important Dates*
- January 2019: Release of trial data
- February 01, 2019: Release of training data (train + validation)
- June 01, 2019: Release of gold labels for validation set + test data
- July 15, 2019: Final deadline for submissions of test results
- July 31, 2019: Submission of description papers
- August 20, 2019: Notification of acceptance
- September 15, 2019: Camera-ready deadline for system description papers
- October 08, 2019: Workshop in Erlangen, Germany

The shared task will be accompanied by a pre-conference workshop of the Conference on Natural Language Processing ("Konferenz zur Verarbeitung natürlicher Sprache", KONVENS) hosted on October 8, 2019 at FAU Erlangen-Nuremberg (http://2019.konvens.org/).


*Workshop Proceedings*
Description papers will appear in online workshop proceedings. Participants who submit a description paper will be asked to register for the workshop and present their system as a poster or in an oral presentation (depending on the number of submissions).


*Organizers*
The task is organized by Rami Aly, Steffen Remus and Chris Biemann, Language Technology, Department of Informatics, Universität Hamburg, https://lt.informatik.uni-hamburg.de


*GermEval*
GermEval is a series of shared task evaluation campaigns that focus on Natural Language Processing for the German language. GermEval has been conducted four times since 2014 in co-location with KONVENS/GSCL conferences. For an overview of the currently conducted tasks, please see http://2019.konvens.org/germeval. We highly encourage readers to also take note of task 2 (Identification of offensive language, https://projects.fzai.h-da.de/iggsa/) and task 3 (Lemmatization of German Web and Social Media Texts, https://fau-klue.github.io/empirist-lemmatization/).
