When Apr 25, 2022 - Apr 26, 2022
Where The Web Conference 2022
Submission Deadline Feb 22, 2022
Notification Due Feb 25, 2022
Final Version Due Mar 10, 2022

Call For Papers

FinSIM-4 2022 The 4th Shared Task on Learning Semantic Similarities for the Financial Domain - extended version to ESG insights


We invite you to participate in FinSIM-4, the 4th shared task on Learning Semantic Similarities for the Financial Domain, extended to ESG insights, held online in conjunction with The Web Conference 2022 on 25th-26th April 2022 as part of the FinWeb-2022 workshop.


The FinSim 2022 shared task aims to spark interest from communities in NLP, ML/AI, Knowledge Engineering and Financial document processing. Going beyond the mere representation of words is a key step towards industrial applications that make use of Natural Language Processing (NLP). This is typically addressed using one of three approaches: 1) unsupervised corpus-derived representations such as word embeddings, which are typically opaque to human understanding but very useful in NLP applications; 2) supervised approaches to learning semantic representations, which typically require a large volume of labeled data but have high coverage for the target domain; or 3) manually labeled resources such as corpora, lexica, taxonomies and ontologies, which typically have low coverage and contain inconsistencies but provide a deeper understanding of the target domain.

These approaches lie at different points of a spectrum, and a number of works have attempted to combine them, particularly in tasks that aim to expand the coverage of manual resources using automatic methods.

The SemEval community has organized several evaluation campaigns to stimulate the development of methods that extract semantic/lexical relations between concepts/words (Bordea et al. 2015, Bordea et al. 2016, Jurgens et al. 2016, Camacho-Collados et al. 2018).
A large number of datasets and challenges specifically look at how to automatically populate knowledge bases such as DBpedia or Wikidata (e.g. KBP challenges,
There are also a number of studies on the supervised and unsupervised approaches to the extraction of semantic relations between concepts and terms (Alfarone et al. 2015, Fauconnier et al. 2015, Shwartz et al. 2016, Sarkar et al. 2018, Martel et al. 2021).

Regarding ESG (Environmental, Social and Governance) issues in the financial domain: from the end of 2022, companies providing investment products that make sustainability or environmental claims will be required to disclose how their portfolios align with the EU taxonomy for sustainable activities, according to the European Commission. The objective is to build an ESG taxonomy, or representations of ESG-related concepts, and use it to analyze how well an economic activity complies with the taxonomy and, consequently, how well an investment product is aligned with it.

===Task Description===

The new edition, FinSim-4, proposes two sub-tasks:

Sub-task 1. We have created an in-house sustainable finance taxonomy called the Fortia ESG taxonomy. It is based on the taxonomies of different financial data providers as well as several sustainability and annual reports in which we looked for ESG-related criteria. Given a subset of the Fortia ESG taxonomy (your training set), participants will be asked to enrich this training set to cover the remaining terms of the original Fortia ESG taxonomy. For this purpose, participants will be given a set of annual reports and sustainability reports of financial companies, from which they can develop a model that induces terms semantically related to the concepts defined in the training set. For example, given a set of terms related to the concept Waste management (e.g. Hazardous Waste, Waste Reduction Initiatives), you need to find the missing ones by predicting the corresponding concept for each unlabeled term.
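To make the input and output of Sub-task 1 concrete, here is a minimal baseline sketch in Python. It is not the organizers' method: it simply assigns each unlabeled term to the concept whose labeled terms share the most word tokens with it (Jaccard overlap), whereas a competitive system would likely rely on embeddings learned from the provided reports. The concept "Board structure" and the term "Board Independence" are hypothetical examples added for illustration; only the Waste management terms come from the description above.

```python
from collections import defaultdict


def tokens(term):
    """Lowercase a term and split it into a set of word tokens."""
    return set(term.lower().split())


def build_index(train):
    """Map each concept to the union of tokens of its labeled terms."""
    index = defaultdict(set)
    for term, concept in train:
        index[concept] |= tokens(term)
    return index


def predict_concept(term, index):
    """Return the concept whose token set has the highest Jaccard overlap."""
    term_tokens = tokens(term)

    def jaccard(concept):
        concept_tokens = index[concept]
        union = term_tokens | concept_tokens
        return len(term_tokens & concept_tokens) / len(union) if union else 0.0

    # Sorting the concepts makes tie-breaking deterministic.
    return max(sorted(index), key=jaccard)


# Toy training set: (term, concept) pairs.
train = [
    ("Hazardous Waste", "Waste management"),
    ("Waste Reduction Initiatives", "Waste management"),
    ("Board Independence", "Board structure"),  # hypothetical concept and term
]
index = build_index(train)
```

Calling `predict_concept("Waste Recycling", index)` on this toy data selects "Waste management", because the two share the token "waste" and no other concept overlaps with the term.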

Sub-task 2. Participants will be asked to design a system that automatically classifies sentences as sustainable or unsustainable, making use of the enriched taxonomy if helpful. For this purpose, participants will be given a list of carefully selected labeled sentences from sustainability reports and other documents. In this shared task, we consider a sentence sustainable if it mentions Environmental, Social or Governance related factors as defined in our ESG taxonomy.
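As an illustration of how the taxonomy could feed into Sub-task 2, the following Python sketch flags a sentence as sustainable when it contains any taxonomy term as a substring. This is only a naive rule-based baseline under our own assumptions, not the expected solution; the function name and the example sentences are hypothetical, and a real system would more plausibly train a classifier on the labeled sentences.

```python
def is_sustainable(sentence, taxonomy_terms):
    """Return True if the sentence mentions any taxonomy term (case-insensitive)."""
    text = sentence.lower()
    return any(term.lower() in text for term in taxonomy_terms)


# Terms taken from the Waste management example in Sub-task 1.
taxonomy_terms = ["Hazardous Waste", "Waste Reduction Initiatives"]

# Hypothetical sentences, for illustration only.
sentences = [
    "The company cut hazardous waste by ten percent.",
    "Revenue grew in the third quarter.",
]
labels = [is_sustainable(s, taxonomy_terms) for s in sentences]
```

Exact substring matching misses paraphrases and inflected forms, which is one reason an enriched taxonomy or contextual embeddings may help.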

Performance will be measured according to the accuracy with which labels are assigned, and according to recall (based on the total number of predictions).
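For reference, accuracy can be computed as the fraction of predictions that match the gold labels. The Python sketch below shows only this standard definition; the exact recall computation is not specified here, so it is left to the official scoring scripts. The function name and the toy labels are our own assumptions.

```python
def accuracy(gold, predicted):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy labels, for illustration only.
gold = ["sustainable", "unsustainable", "sustainable", "unsustainable"]
predicted = ["sustainable", "unsustainable", "unsustainable", "unsustainable"]
score = accuracy(gold, predicted)
```

Here three of the four predictions match the gold labels, so `score` is 0.75.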

This year, we provide a subset of our in-house ESG taxonomy and a dataset composed of financial and non-financial reports. We are interested in systems that make use of contextual word embeddings such as BERT (Devlin et al. 2018), as well as systems that make use of resources related to ESG (Environmental, Social and Governance) and/or sustainability, including the EU taxonomy.


A USD $500 prize will be awarded to the best-performing team.

===Important Dates===

Submission paper:

December 22, 2021: First announcement of the shared task and beginning of registration

January 14, 2022: Release of training set and scoring scripts.

February 16, 2022: Release of test set.

February 22, 2022: System output submission deadline.

February 25, 2022: Release of results.

February 25, 2022: Shared task title and abstract due

March 01, 2022: Shared task paper submissions due

March 03, 2022: Registration deadline.

March 10, 2022: Camera-ready version of shared task paper due

April 25-26, 2022: FinWeb workshop @WWW Conference 2022


