
WojoodNER 2023 : Call for participation - Arabic NER Shared Task 2023


Link: https://dlnlp.ai/st/wojood/
 
When: May 22, 2023 - Jul 30, 2023
Where: WANLP 2023 Conference
Abstract Registration Due: Jun 10, 2023
Submission Deadline: Aug 29, 2023
Notification Due: Oct 12, 2023
Final Version Due: Oct 30, 2023
Categories: ner, arabicnlp, nlproc
 

Call For Papers

Dear colleagues,


We are happy to invite you to join the Arabic NER Shared Task 2023, which will be organized as part of WANLP 2023. We will provide you with a large corpus and Google Colab notebooks to help you reproduce the baseline results.



An invitation to participate in the shared task on extracting named entities from Arabic text. We will provide participants with a corpus and software for obtaining baseline results that they can build on.

INTRODUCTION

Named Entity Recognition (NER) is integral to many NLP applications. It is the task of identifying named entity mentions in unstructured text and classifying them into predefined classes such as person, organization, location, or date. Due to the scarcity of Arabic resources, most research on Arabic NER focuses on flat entities and addresses a limited number of entity types (person, organization, and location). The goal of this shared task is to alleviate this bottleneck by providing Wojood, a large and rich Arabic NER corpus. Wojood consists of about 550K tokens (MSA and dialect, in multiple domains) that are manually annotated with 21 entity types.



REGISTRATION

Participants need to register via this form (https://forms.gle/UCCrVNZ2LaPviCZS6). Participating teams will be provided with common training and development datasets. No external manually labelled datasets are allowed. A blind test set will be used to evaluate the output of the participating teams. Each team is allowed a maximum of 3 submissions. All teams are required to report results on both the development and test sets (after results are announced) in their write-ups.

FAQ

For any questions related to this task, please check our Frequently Asked Questions.

IMPORTANT DATES

- March 03, 2023: Registration available
- May 25, 2023: Data-sharing and evaluation on development set available
- June 10, 2023: Registration deadline
- July 20, 2023: Test set made available
- July 30, 2023: Evaluation on test set (TEST) deadline
- August 29, 2023: Shared task system paper submissions due
- October 12, 2023: Notification of acceptance
- October 30, 2023: Camera-ready version
- TBA: WANLP 2023 Conference.

* All deadlines are 11:59 PM UTC-12:00 (Anywhere On Earth).
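
For teams working in other time zones, the Anywhere-on-Earth convention can be converted to UTC with a fixed UTC-12:00 offset. The snippet below is a minimal sketch using the July 30, 2023 test-set deadline listed above as an example; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth (AoE) is a fixed offset of UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")

# Example: the test-set evaluation deadline, 11:59 PM AoE on July 30, 2023.
deadline_aoe = datetime(2023, 7, 30, 23, 59, tzinfo=AOE)

# The same instant expressed in UTC.
deadline_utc = deadline_aoe.astimezone(timezone.utc)
print(deadline_utc.isoformat())  # 2023-07-31T11:59:00+00:00
```

In other words, an AoE deadline falls at 11:59 AM UTC on the following calendar day.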

CONTACT

For any questions related to this task, please contact the organizers directly at NERSharedtask2023@gmail.com or join the Google group: https://groups.google.com/g/ner_sharedtask2023.



SHARED TASK

As described, this shared task targets both flat and nested Arabic NER. The subtasks are:

Subtask 1: Flat NER
In this subtask, we provide the Wojood-Flat train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). The flat NER dataset uses the same train/dev/test split as the nested NER dataset, and each split contains the same content. The only difference is that in the flat dataset each token is assigned a single tag: the first high-level tag assigned to that token in the nested NER dataset.
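
The relationship between the two datasets can be sketched as follows. This is only an illustration and not the organizers' conversion script: it assumes a hypothetical in-memory representation in which each token carries its list of nested tags with the high-level tag first, and the tokens and tags in the example are made up.

```python
from typing import List, Tuple

# Hypothetical representation (an assumption, not the official file format):
# each token is paired with its nested tags, high-level (outermost) tag first.
NestedSentence = List[Tuple[str, List[str]]]
FlatSentence = List[Tuple[str, str]]


def nested_to_flat(sentence: NestedSentence) -> FlatSentence:
    """Keep only the first (high-level) tag of every token."""
    flat: FlatSentence = []
    for token, tags in sentence:
        # Fall back to "O" if a token has no tags at all.
        flat.append((token, tags[0] if tags else "O"))
    return flat


# Made-up example: an organization name that contains a place name.
nested_example: NestedSentence = [
    ("Jamiat", ["B-ORG"]),
    ("Birzeit", ["I-ORG", "B-GPE"]),
    ("opened", ["O"]),
]
flat_example = nested_to_flat(nested_example)
print(flat_example)
```

Here the inner place-name tag on the second token is dropped, so the flat version keeps only the organization span.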

Subtask 2: Nested NER
In this subtask, we provide the Wojood-Nested train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%).



METRICS

The evaluation metrics will include precision, recall, and F1-score. However, our official metric will be the micro F1-score.
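
As a rough illustration of micro-averaging, the sketch below pools true positives, false positives, and false negatives over all entity types before computing precision, recall, and F1. It assumes exact-match scoring over entity spans represented as (sentence index, start, end, type) tuples; that representation, the function name, and the example spans are assumptions, and the official scorer may differ in its details.

```python
from typing import Set, Tuple

# Assumed span representation: (sentence index, start token, end token, type).
Span = Tuple[int, int, int, str]


def micro_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 with exact span matching."""
    tp = len(gold & pred)   # predicted spans that exactly match a gold span
    fp = len(pred - gold)   # predicted spans with no gold match
    fn = len(gold - pred)   # gold spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up example: one of three predictions has the wrong entity type.
gold = {(0, 0, 1, "ORG"), (0, 1, 1, "GPE"), (1, 3, 3, "PERS")}
pred = {(0, 0, 1, "ORG"), (0, 1, 1, "LOC"), (1, 3, 3, "PERS")}
precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Because the counts are pooled before averaging, frequent entity types influence the micro F1-score more than rare ones.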

The evaluation of shared tasks will be hosted through CODALAB. Teams will be provided with a CODALAB link for each shared task.

- CODALAB link for NER Shared Task Subtask 1 (Flat NER)
- CODALAB link for NER Shared Task Subtask 2 (Nested NER)



BASELINES

Two baseline models trained on Wojood (flat and nested) are provided:

Nested NER baseline: presented in this article, with code available on GitHub. The model achieves a micro F1-score of 0.9059 (note that this baseline does not handle nested entities of the same type).

Flat NER baseline: the same code repository used for nested NER (GitHub) can also be used to train a flat NER model. Our flat NER baseline achieved a micro F1-score of 0.8785.



GOOGLE COLAB NOTEBOOKS

To allow you to experiment with the baselines, we authored four Google Colab notebooks that demonstrate how to train and evaluate our baseline models.
[1] Train Flat NER: This notebook can be used to train our ArabicNER model on the flat NER task using the sample Wojood data found in our repository.
[2] Evaluate Flat NER: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.
[3] Train Nested NER: This notebook can be used to train our ArabicNER model on the nested NER task using the sample Wojood data found in our repository.
[4] Evaluate Nested NER: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.



ORGANIZERS

- Mustafa Jarrar, Birzeit University

- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI

- Mohammed Khalilia, Birzeit University

- Bashar Talafha, University of British Columbia

- AbdelRahim Elmadany, University of British Columbia

- Nagham Hamad, Birzeit University

- Alaa Omer, Birzeit University
