posted by organizer: dferres

TSAR Shared Task 2022 : Shared Task on Lexical Simplification for English, Portuguese and Spanish


When Jul 19, 2022 - Sep 15, 2022
Where online or hybrid event @ EMNLP2022
Submission Deadline Sep 15, 2022
Notification Due Sep 30, 2022
Final Version Due Oct 15, 2022
Categories    lexical simplification   english   portuguese   spanish

Call For Papers

TSAR-2022 Shared Task on Lexical Simplification for
English, Portuguese and Spanish

In conjunction with the TSAR-2022 Workshop @EMNLP2022


Lexical Simplification is the process of reducing the lexical
complexity of a text by replacing difficult words with easier-to-read
(or easier-to-understand) expressions while preserving the original
information and meaning.
Lexical Simplification (LS) aims to facilitate reading comprehension
for different target readerships, such as foreign language learners,
native speakers with low literacy levels, second language learners, or
people with different kinds of reading impairments. This new Lexical
Simplification Shared Task features three similar datasets in three
different languages: English, Brazilian Portuguese, and Spanish.

Definition of the task

Given a sentence containing a complex word, systems should return an
ordered list of “simpler” valid substitutes for the complex word in its
original context. The list of simpler words (up to a maximum of 10)
returned by the system should be ordered by the confidence the system
has in its prediction (best predictions first). The ordered list must
not contain ties.
An instance of the task for the English language is:
“That prompted the military to deploy its largest warship, the BRP
Gregorio del Pilar, which was recently acquired from the United States.”

Complex word: deploy

For this instance, a system may suggest the following ranked list of
substitutes: send, move, position, redeploy, employ, situate…

Systems should only produce simplifications that are good contextual
fits (semantically and syntactically).
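The constraints above (an ordered list, no ties, at most 10 candidates) can be enforced with a small normalization step before submission. This is a minimal sketch; the function name and the lowercasing convention are assumptions, and the official guidelines define the exact submission format:

```python
def finalize_substitutes(ranked, max_k=10):
    """Deduplicate candidates while preserving rank order and cap the
    list at max_k, so the output contains no ties or repeats (sketch)."""
    seen, out = set(), []
    for cand in ranked:
        c = cand.strip().lower()
        if c and c not in seen:
            seen.add(c)
            out.append(c)
        if len(out) == max_k:
            break
    return out
```

For example, a raw system output of ["Send", "send", "move"] would be normalized to ["send", "move"].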

Participating teams can register (details below) for three different
tracks, one per language.

- English monolingual (EN)
- Portuguese (Brazilian) monolingual (PT-BR)
- Spanish monolingual (ES)

It is possible to participate in one, two or all three tracks.
Participating teams will be allowed to submit up to 3 runs per track.


The three datasets (trial data with gold annotations and test data
without gold annotations) and the evaluation script will be available
through a GitHub repository. There is no training dataset. However, a
sample of 10 or 12 instances with gold standard annotations will be
provided. Note that participating teams are allowed to use any other
lexical simplification datasets or resources for developing their
systems. Test data with gold annotations will also be released via the
same GitHub repository at the end of the evaluation period.

Evaluation Metrics

The evaluation metrics to be applied in the TSAR-2022 Shared Task are
the following:

MAP@K (Mean Average Precision @ K): K={1,3,5,10}. The MAP@K metric is
commonly used to evaluate Information Retrieval models and Recommender
Systems. For this Lexical Simplification task, instead of a ranked list
of relevant and irrelevant documents, we evaluate a ranked list of
predicted substitutes, where a prediction is relevant if it matches the
set of gold-standard annotations and irrelevant otherwise. The
traditional Precision metric, in the context of Lexical Simplification,
measures how many of the predicted substitutes are relevant, but it
fails to capture the order in which the correctly predicted substitutes
appear. Mean Average Precision is designed for binary relevance:
candidates either match the list of gold annotations or do not. MAP@K
for Lexical Simplification therefore evaluates two aspects: 1) are the
predicted substitutes relevant?, and 2) are the predicted substitutes
at the top positions?
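Under the binary-relevance convention described above, MAP@K can be sketched as follows. This is an illustration of one common convention (the denominator choice varies across implementations); per Note 2 below, the official evaluation script defines the exact computation:

```python
def average_precision_at_k(predicted, gold, k):
    """AP@k with binary relevance: a predicted substitute is relevant
    iff it appears in the gold annotation set (sketch)."""
    gold = set(gold)
    hits, score = 0, 0.0
    for i, cand in enumerate(predicted[:k]):
        if cand in gold:
            hits += 1
            score += hits / (i + 1)  # precision at this rank
    denom = min(len(gold), k)
    return score / denom if denom else 0.0

def map_at_k(all_predicted, all_gold, k):
    """Mean of per-instance AP@k over the whole test set."""
    aps = [average_precision_at_k(p, g, k)
           for p, g in zip(all_predicted, all_gold)]
    return sum(aps) / len(aps)
```

Note how the running-precision sum rewards placing correct substitutes early: a match at rank 1 contributes more than the same match at rank 3.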

Potential@K: K={1,3,5,10}. The percentage of instances for which at
least one of the predicted substitutions is present in the set of
gold-standard annotations.
Accuracy@K@top1: K={1,2,3}. The ratio of instances where at least one
of the top-K predicted candidates matches the most frequently suggested
synonym(s) in the gold list of annotated candidates.
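Accuracy@K@top1 can be sketched as below, assuming the gold list keeps annotator repetitions so the most frequently suggested candidate(s) can be identified (an assumption about the data layout; see the official script for the exact computation):

```python
from collections import Counter

def accuracy_at_k_at_top1(all_predicted, all_gold, k):
    """Ratio of instances where a top-k prediction matches one of the
    most frequently suggested gold candidates (sketch). all_gold is a
    list of gold annotation lists WITH repetitions (assumption)."""
    correct = 0
    for preds, gold in zip(all_predicted, all_gold):
        counts = Counter(gold)
        best = max(counts.values())
        top1 = {w for w, c in counts.items() if c == best}  # may hold ties
        if any(c in top1 for c in preds[:k]):
            correct += 1
    return correct / len(all_predicted)
```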

Note 1: Potential@1/MAP@1/Precision@1 will give the same value.
Note 2: The exact computation of the metrics will be provided in the
official evaluation script.


Participating teams will be invited to submit system description papers
(four pages, plus unlimited pages for references), which will be
peer-reviewed by at least two reviewers (at least one member of each
participating team will be required to help with the review process).
Accepted papers will be published in the TSAR-2022 Workshop
proceedings. Submissions will be handled via SoftConf; details for
submission will be communicated to registered teams in due time.

Important dates

- Registration opens: July 19, 2022
- Release of sample/trial instances with gold annotations: July 20, 2022
- Release of evaluation metrics and code: July 22, 2022
- Registration deadline: September 7, 2022
- Test set release (without gold annotations): September 8, 2022
- Submission of systems' output due: September 15, 2022
- Official results announced: September 30, 2022
- Test set release (with gold annotations): September 30, 2022
- Submission of Shared Task papers deadline: October 15, 2022
- Shared Task paper reviews due: November 1, 2022
- Camera-ready deadline for Shared Task papers: November 10, 2022
- TSAR Workshop and Shared Task: December 8, 2022

Registering your team:

Please access this form to register for the TSAR-2022 Shared Task on
Lexical Simplification.

Website and Shared Task Guidelines

Please visit the TSAR-2022 Shared Task website to obtain further
information about the Guidelines, Datasets, and team registration.


Organizers

- Horacio Saggion, Chair in Computer Science and Artificial
Intelligence and Head of the LaSTUS Lab in the TALN-DTIC,
Universitat Pompeu Fabra
- Sanja Štajner, Senior Research Scientist and R&D Application
Manager at Symanto Research
- Matthew Shardlow, Senior Lecturer at Manchester Metropolitan
University
- Marcos Zampieri, Assistant Professor at the Rochester Institute of
Technology
- Daniel Ferrés, Post-Doctoral Research Assistant at LaSTUS Lab at
TALN-DTIC, Universitat Pompeu Fabra
- Kai North, PhD student at the Rochester Institute of Technology
- Kim Cheng Sheang, PhD student at LaSTUS Lab at TALN-DTIC,
Universitat Pompeu Fabra

Related Resources

#SMM4H 2022   7th Social Media Mining for Health Applications - Workshop & Shared Task at COLING 2022
FinCausal 2022   [Second Call for Participants] FinCausal-2022 Shared Task: “Financial Document Causality Detection”
MML-Shared Task 2022   Multilingual Multimodal Learning 2022 Shared Task
CRAC-corefud 2022   CRAC 2022 Shared Task on Multilingual Coreference Resolution
FinCausal 2022   FinCausal-2021 Shared Task: “Financial Document Causality Detection”
BUCC 2022   15th Workshop on Building and Using Comparable Corpora with Shared Task on Multilingual Terminology Extraction from Comparable Corpora
JOKER 2022   Shared Task JOKER@CLEF: Automatic Wordplay and Humour Translation Task
TSAR 2022   Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
WAMTA 2023   Workshop on Asynchronous Many-Task Systems and Applications
AMTE 2022   Asynchronous Many-Task systems for Exascale 2022