WILDRE 2022 : Workshop on Indian Language Data: Resources and Evaluation @ LREC 2022



Link: http://sanskrit.jnu.ac.in/conf/wildre6

When	Jun 20, 2022 - Jun 20, 2022
Where	Palais du Pharo, Marseille (France)
Submission Deadline	Mar 31, 2022
Notification Due	May 3, 2022
Final Version Due	May 23, 2022

Call For Papers

6th Workshop on Indian Language Data: Resources and Evaluation (WILDRE)
Date: Monday, 20th June 2022 (afternoon session)

Venue: Palais du Pharo, Marseille, France (Organized under LREC 2022 (20-25 June 2022))

Website: http://sanskrit.jnu.ac.in/conf/wildre6

WILDRE-6, the 6th Workshop on Indian Language Data: Resources and Evaluation, will be organized in Marseille (France) on 20th June 2022 under the LREC platform. India has huge linguistic diversity and has seen concerted efforts from the Indian government and industry to develop language resources. The European Language Resources Association (ELRA) and its associate organizations have been very active and successful in addressing the challenges and opportunities related to language resource creation and evaluation. It is therefore a big opportunity for resource creators of Indian languages to showcase their work on this platform and also to interact with and learn from those involved in similar initiatives all over the world. The broader objectives of WILDRE are:

To map the status of Indian Language Resources

To investigate challenges related to creating and sharing various levels of language resources

To promote a dialogue between language resource developers and users

To provide an opportunity for researchers from India to collaborate with researchers from other parts of the world

Dates for Short/Long Papers, Posters and Demos

March 31, 2022: Paper submissions due

May 03, 2022: Paper notification of acceptance

May 23, 2022: Camera-ready papers due

June 20, 2022: Workshop

SUBMISSIONS

Papers must describe original, unpublished work, either completed or in progress. Each submission will be reviewed by three program committee members.

Accepted papers will be given up to 10 pages (for full papers) or 5 pages (for short papers and posters) in the workshop proceedings, and will be presented as oral presentations or posters.

Papers should be formatted according to the LREC style-sheet, which is provided on the LREC 2022 website (http://lrec2022.lrec-conf.org/en/submission/authors-kit/). Please submit papers in PDF format to the LREC website.

We are seeking submissions in the following categories:

Full papers (10 pages)

Short papers (work in progress: 5 pages)

Posters (innovative ideas/proposals, student research proposals)

Demo (of working online/standalone systems)

WILDRE-6 will have a special focus on Demos of Indian Language Technology. In the past few years, as more resources have been developed and made available, there has been increased activity in building usable technology with them. WILDRE-6 would like to encourage and widen the Demo track to allow the community to showcase their demos and have mutually beneficial interactions with each other as well as with resource developers.

WILDRE-6 is seeking full papers, short papers, posters and demos on the following topics related to Indian Language Resources:

Digital Humanities, heritage computing

Corpora - text, speech, multimodal, methodologies, annotation and tools

Lexicons and Machine-readable dictionaries

Ontologies, Grammars

Language resources for NLP/IR/Speech tasks, tools and infrastructure for language resources

Standards or specifications for language resources application

Licensing and copyright issues

Data mining

Text summarization

Both submission and review processes will be handled electronically. The review process will be double-blind. The workshop website will provide the submission guidelines and the link for the electronic submission.

When submitting a paper from the START page (https://www.softconf.com/lrec2022/WILDRE-6/), authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of your research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.), to enable their reuse, replicability of experiments, including evaluation ones, etc.

For further information on this initiative, please refer to http://lrec2022.lrec-conf.org/en/

Shared Task

The Sixth Workshop on Indian Language Data: Resources and Evaluation (WILDRE-6) at LREC-2022 will include two shared tasks on (a) Speech Technologies for Under-resourced Indian Languages (SpeechTech-IL) and (b) Universal Dependency based Morpho-Syntactic Parsing in Indian Languages (UDParse-IL).

(a) Speech Technologies for Under-resourced Indian Languages (SpeechTech-IL)

Neural or deep learning techniques are currently applied in state-of-the-art automated systems and report significant performance improvements, but they typically require a large amount of high-quality data. For advancing Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems for low-resource languages, the zero-shot/unsupervised approach is one notable development in neural learning: it builds ASR/TTS systems for languages where audio and/or transcribed speech data may be scarce or even non-existent. In this shared task, we solicit novel systems based on zero-shot (or similar) methods and/or linguistically encoded features for under-resourced Indian languages. The goal is to ascertain the effectiveness of the implemented methods for language pairs as well as for unseen similar languages. The languages are Hindi, Odia, Marathi and Bhojpuri. In the evaluation, participants will also receive 2-3 surprise test sets for closely related languages. The system(s) will be evaluated using word error rate (WER), precision, recall and F-score.
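As an illustration of the WER metric named above, the following is a minimal Python sketch, not the organizers' scoring script. It assumes whitespace tokenization and equal cost for substitutions, insertions and deletions; the function name word_error_rate is hypothetical, and the task's actual text normalization is not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and every edit
    (substitution, insertion, deletion) costs 1.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, word_error_rate("a b c d", "a x c") counts one substitution and one deletion against four reference words. Note that WER can exceed 1.0 when the hypothesis contains many extra words.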

Shared Task Organizers
Atul Kr. Ojha, NUI Galway, Ireland and Panlingua Language Processing LLP
Kalika Bali, Microsoft Research India
Vivek Sheshadri, MSR, India
Esha Banerjee, Google USA
Sourabrta Mukherjee, Panlingua Language Processing LLP & Charles University, Prague
Swapnil Fadte, Goa University, Goa
Manu Chopra, Karya Inc.

(b) Universal Dependency based Morpho-Syntactic Parsing in Indian Languages (UDParse-IL)

The primary objective of the UDParse-IL task is to find notable techniques for developing universal dependency parsers, especially when a language is low-resourced. In this task, the participants will be provided with training, development and testing datasets annotated with dependency relations in 10 Indian languages, including Bhojpuri, Hindi, Marathi, Sanskrit, Tamil, Telugu, and Urdu, and we solicit systems based on novel zero/few-shot (or other cross-lingual and multilingual) methods for these low-resource Indian languages. None of the languages included in this task, with the exception of Hindi and Urdu, has more than 1,350 annotated sentences. The data for the above-mentioned languages will be shared by UFAL, Charles University, from the Universal Dependencies (UD) repositories. We will provide test data and an evaluation platform to evaluate the participants' parsers. The parsers will be evaluated using labeled attachment score (LAS), unlabeled attachment score (UAS), precision, recall and F-score. One of the primary goals of the task is to ascertain the effectiveness of the implemented methods for unseen but closely related languages, in addition to the languages for which training data is provided. To this end, the test data will include some surprise languages: their names will be revealed only at test time, when a test set for these languages will be provided.
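As an illustration of the UAS and LAS metrics named above, the following is a minimal Python sketch, not the task's evaluation platform. It assumes gold and predicted parses share the same tokenization and that each token is given as a (head index, relation label) pair; the function name attachment_scores is hypothetical.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency relation label).
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], predicted: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) over tokens aligned one-to-one.

    UAS: fraction of tokens whose predicted head matches the gold head.
    LAS: fraction of tokens whose head and relation label both match.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")

    head_hits = sum(1 for (g_head, _), (p_head, _) in zip(gold, predicted)
                    if g_head == p_head)
    labeled_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    total = len(gold)
    return head_hits / total, labeled_hits / total
```

For a four-token sentence where the parser gets three heads right but only two of those with the correct label, attachment_scores returns a UAS of 0.75 and a LAS of 0.5. Since a correct label only counts when the head is also correct, LAS can never exceed UAS.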

Shared Task Organizers
Atul Kr. Ojha, NUI Galway, Ireland and Panlingua Language Processing LLP
Ritesh Kumar, Agra University
Akanksha Bansal, Panlingua Language Processing LLP
Aryaman Arora, Georgetown University
Girish Nath Jha, JNU, New Delhi, India
Sobha L., AU-KBC, India

Shared Task Dates

Jan 31, 2022: Registration

Feb 09, 2022: Train and Validation Data set Release

March 17, 2022: Test Set Release

March 24, 2022: System Submission Due

April 08, 2022: System Results

April 18, 2022: System Description Paper Due

May 03, 2022: Paper notification of acceptance

May 23, 2022: Camera-ready papers due

Contact

For questions related to shared tasks (a) and (b), please send an email to wildre-speechtechil@googlegroups.com and wildre-udparseil@googlegroups.com respectively.

Workshop Chairs

Girish Nath Jha, Jawaharlal Nehru University, India

Kalika Bali, Microsoft Research India Lab, Bangalore, India

Sobha L, AU-KBC, Anna University, Chennai, India

Workshop Organizing Committee

Girish Nath Jha, Jawaharlal Nehru University, India

Kalika Bali, Microsoft Research India Lab, Bangalore, India

Sobha L, AU-KBC, Anna University, Chennai, India

Atul Kr. Ojha, National University of Ireland, Galway, Ireland & Panlingua Language Processing LLP, India

Program Committee

Adil Amin Kak, Kashmir University

Anil Kumar Singh, IIT BHU, Benaras

Anupam Basu, Director, NIIT, Durgapur

Anoop Kunchukuttan, Microsoft AI and Research, India

Arul Mozhi, University of Hyderabad

Asif Iqbal, IIT Patna, Patna

Atul Kr. Ojha, National University of Ireland Galway, Ireland & Panlingua Language Processing LLP, India

Bharathi Raja Asoka Chakravarthi, National University of Ireland Galway, Ireland

Bogdan Babych, Heidelberg University, Germany

Chao-Hong Liu, Potamu Research Ltd., Ireland

Claudia Soria, CNR-ILC, Italy

Dafydd Gibbon, Universität Bielefeld, Germany

Daan van Esch, Google, USA

Dan Zeman, Charles University, Prague, Czech Republic

Delyth Prys, Bangor University, UK

Dipti Mishra Sharma, IIIT, Hyderabad

Diwakar Mishra, Amazon-Bangalore, India

Dorothee Beermann, Norwegian University of Science and Technology (NTNU)

Elizabeth Sherley, IIITM-Kerala, Trivandrum

Esha Banerjee, Google, USA

Eveline Wandl-Vogt, Austrian Academy of Sciences, Austria

Georg Rehm, DFKI, Germany

Girish Nath Jha, Jawaharlal Nehru University, New Delhi

Jan Odijk, Utrecht University, The Netherlands

John P. McCrae, National University of Ireland Galway, Ireland

Jolanta Bachan, Adam Mickiewicz University, Poland

Joseph Mariani, LIMSI-CNRS, France

Jyoti D. Pawar, Goa University

Kalika Bali, MSRI, Bangalore

Khalid Choukri, ELRA, France

Lars Hellan, NTNU, Norway

M J Warsi, Aligarh Muslim University, India

Malhar Kulkarni, IIT Mumbai

Manji Bhadra, Bankura University, West Bengal

Marko Tadic, Croatian Academy of Sciences and Arts, Croatia

Massimo Monaglia, University of Florence, Italy

Monojit Choudhary, MSRI Bangalore

Narayan Choudhary, CIIL, Mysore

Nicoletta Calzolari, ILC-CNR, Pisa, Italy

Niladri Shekhar Dash, ISI Kolkata

Panchanan Mohanty, GLA, Mathura

Pinky Nainwani, Cognizant Technology Solutions, Bangalore

Pushpak Bhattacharya, IIT Mumbai

Rajeev R R, ICFOSS, Trivandrum

Ritesh Kumar, Agra University

Shantipriya Parida, Idiap Research Institute, Switzerland

Vijay Kumar, TDIL, MEITY, Govt of India

S.S. Agrawal, KIIT, Gurgaon, India

Sachin Kumar, EZDI, Ahmedabad

Santanu Chaudhury, Director, IIT Jodhpur

Shivaji Bandhopadhyay, Director, NIT, Silchar

Sobha L, AU-KBC Research Centre, Anna University

Stelios Piperidis, ILSP, Greece

Subhash Chandra, Delhi University

Swaran Lata, TDIL, MCIT, Govt of India

Virach Sornlertlamvanich, Thammasat University, Bangkok, Thailand

Vishal Goyal, Punjabi University, Patiala

Zygmunt Vetulani, Adam Mickiewicz University, Poland

Workshop contact:

Atul Kr. Ojha, National University of Ireland, Galway, Ireland & Panlingua Language Processing LLP, India, atulkumar.ojha@insight-centre.org
