posted by organizer: dherrmannova

SDP 2020 : 1st Workshop on Scholarly Document Processing and Shared Tasks (SDP 2020) @ EMNLP 2020


When Nov 12, 2020 - Nov 12, 2020
Where Punta Cana, Dominican Republic
Submission Deadline Jul 15, 2020
Notification Due Aug 17, 2020
Final Version Due Aug 31, 2020
Categories    scholarly document processing   natural language processing   summarization   information retrieval

Call For Papers

You are invited to participate in the 1st Workshop on Scholarly Document Processing (SDP 2020) to be held in conjunction with the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020) on November 12 in Punta Cana, Dominican Republic.

The workshop will consist of a research track and a shared task track.

The shared task track includes the 6th edition of the CL-SciSumm shared task and two new summarization tasks, CL-LaySumm and LongSumm, geared towards easier access to scientific methods and results.

The tentative submission deadline is July 15, 2020.

Given the current COVID-19 situation, EMNLP 2020 *might be* an online / virtual conference. In such an event, SDP 2020 will follow suit. This would give people who might otherwise be unable to attend for lack of travel funds an opportunity to submit. We will keep updating our website with relevant information.

SDP is led by the organizers of the BIRNDL and WOSP workshop series.

Workshop Date and Venue: November 12, Punta Cana, Dominican Republic

** Introduction **

In addition to the long-standing challenge of keeping up with the growing literature in their own and related fields, scholars must now compete with malign pseudo-science and disinformation in informing public policy and behavior. This has stimulated workshops and research focused on enhancing search, retrieval, summarization, and analysis of scholarly documents. However, the general research community on scholarly document processing remains fragmented, and efforts towards natural language understanding of scholarly text, which is central to vastly improving all of these downstream applications, are not widespread.

To address these gaps, we, the organizers of the BIRNDL and WOSP workshops, propose the first Workshop on Scholarly Document Processing. We seek to reach out to the broader NLP and AI/ML community to pool the distributed efforts to improve scholarly document understanding and enable intelligent access to published research. The goal of SDP is two-fold: to increase collaboration between communities interested in leveraging knowledge stored in scientific literature and data, and to establish SDP as the single-focused primary venue for the field.

We seek to appeal to the mainstream NLP and ML community working on SDP tasks – which are NLP tasks – to publish at SDP as we seek to establish SDP as the integrated premier venue. We have established a steering committee to help us turn SDP into a conference in the forthcoming years.

** Topics of Interest **

We invite submissions from all communities interested in natural language processing, information retrieval, and data mining problems in scientific documents; and in processing scientific documents for easier access to various audiences. The topics of interest include, but are not limited to:

* Information extraction, text mining and parsing of scholarly literature
* Reproducibility and peer review
* Lay Summarization (i.e., summaries created for non-experts) of individual scholarly documents and of collections of scholarly documents
* Discourse modeling and argument mining
* Summarization and question-answering for scholarly documents
* Semantic and network-based indexing, search and navigation in structured text
* Graph analysis/mining including citation and co-authorship networks
* Analysing and mining of citation contexts for document understanding and retrieval
* New scholarly language resources and evaluation
* Connecting and interlinking publications, data, tweets, blogs or their parts
* Disambiguation, metadata extraction, enrichment, and data quality assurance for scholarly documents
* Bibliometrics, scientometrics, and altmetrics approaches and applications
* Other aspects of scientific workflows including open access/science, and research assessment
* Infrastructures for accessing scientific publications and/or research data

** The 6th Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2020) ** (Organisers: Muthu Kumar Chandrasekaran)

CL-SciSumm is the first medium-scale shared task on scientific document summarization, with over 500 annotated documents. Last year's CL-SciSumm shared task introduced large-scale training datasets, both annotated from ScisummNet and auto-annotated. For the task, systems were provided with a Reference Paper (RP) and 10 or more Citing Papers (CPs) that all contain citations to the RP, which they used to summarise the RP. The resulting summaries were evaluated against the abstract and human-written summaries using ROUGE.

The task is defined as follows:

* Given: A topic consisting of a Reference Paper (RP) and Citing Papers (CPs) that all contain citations to the RP. In each CP, the text spans (i.e., citances) have been identified that pertain to a particular citation to the RP.
* Task 1A: For each citance, identify the spans of text (cited text spans) in the RP that most accurately reflect the citance. These are of the granularity of a sentence fragment, a full sentence, or several consecutive sentences (no more than 5).
* Task 1B: For each cited text span, identify what facet of the paper it belongs to, from a predefined set of facets.
* Task 2 (optional bonus task): Finally, generate a structured summary of the RP from the cited text spans of the RP. The length of the summary should not exceed 250 words.
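
To make Task 1A concrete, the following is a minimal sketch of a naive lexical-overlap baseline. It is not the organizers' method or an official baseline. The function names, the Jaccard similarity measure, and the assumption that the RP is already split into sentences are all illustrative choices. The only constraint taken from the task definition is that a cited text span covers at most 5 consecutive sentences.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_cited_span(citance, rp_sentences, max_span=5):
    """Find the window of 1..max_span consecutive RP sentences that
    overlaps most with the citance.

    Returns ((start, end), score) with inclusive sentence indices,
    or (None, -1.0) if the RP has no sentences.
    """
    citance_tokens = tokenize(citance)
    best, best_score = None, -1.0
    for start in range(len(rp_sentences)):
        for length in range(1, max_span + 1):
            end = start + length
            if end > len(rp_sentences):
                break
            span_tokens = tokenize(" ".join(rp_sentences[start:end]))
            score = jaccard(citance_tokens, span_tokens)
            # Strict comparison keeps the earliest, shortest span on ties.
            if score > best_score:
                best, best_score = (start, end - 1), score
    return best, best_score


# Hypothetical toy data, for illustration only.
rp = [
    "We propose a new parser for dependency trees.",
    "The corpus contains ten thousand sentences.",
    "Results improve over the baseline by two points.",
]
span, score = best_cited_span("Smith et al. propose a new dependency parser.", rp)
```

A real system would likely replace the Jaccard score with a learned or embedding-based similarity; the windowing over consecutive sentences is the part that mirrors the task constraint.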

This year, CL-SciSumm '20 will have two new tracks: LaySumm and LongSumm.

** CL-LaySumm 2020: The 1st Computational Linguistics Lay Summary Challenge Shared Task ** (Organisers: Anita de Waard, Ed Hovy)

To ensure and increase the relevance of science for all of society and not just a small group of niche practitioners, researchers have been increasingly tasked by funders and publishers to outline the scope of their research for the general public by writing a summary for a lay audience, or lay summary. The LaySumm summarization task considers automating this responsibility, by enabling systems to automatically generate lay summaries. A lay summary explains, succinctly and without using technical jargon, what the overall scope, goal and potential impact of a scientific paper is.

The corpus for this task will comprise full-text papers with lay summaries, in a variety of domains, and from a number of journals. Elsevier will make available a collection of Lay Summaries from a multidisciplinary collection of journals, as well as the abstracts and full text of these journals.

The task is defined as follows:

* Given: A full-text paper, its Abstract, and a Lay Summary of that paper
* Task: For each paper, generate a Lay Summary of the specified length


The Lay Summary Task will be scored by using several ROUGE metrics to compare the system output and the gold standard Lay Summary. As a follow-up to the intrinsic evaluation, we will crowdsource a number of automatically generated lay summaries to a panel of judges and a lay audience. Details of the crowdsourcing evaluation will be announced with the sharing of the final test corpus on July 1st.
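
As a rough illustration of what a ROUGE comparison measures, the sketch below computes a simplified ROUGE-1 (unigram overlap) between a system summary and a gold summary. The exact ROUGE variants and toolkit settings used for official scoring are not specified here, so the tokenization and the function name `rouge_1` are assumptions, and the numbers it produces will not necessarily match the official evaluation.

```python
import re
from collections import Counter


def rouge_1(candidate, reference):
    """Simplified ROUGE-1: unigram precision, recall and F1.

    Tokens are lowercased alphanumeric runs; no stemming or stopword
    removal is applied (an assumption of this sketch).
    """
    cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    # Multiset intersection: each word counts at most as often as it
    # appears in both summaries.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical summaries, for illustration only.
scores = rouge_1(
    "the study shows exercise improves sleep",
    "the study shows that exercise improves sleep quality",
)
```

In this toy example every candidate word appears in the reference, so precision is 1.0, while recall is 0.75 because two of the eight reference words are missing from the candidate.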

All nominated entries will be invited to publish a paper in Open Access (Author-Payment Charges will be waived) in a selected Elsevier publication. Authors will be asked to provide an automatically generated lay summary of their paper, together with their contribution.

** LongSumm 2020: Shared Task on Generating Long Summaries for Scientific Documents ** (Organisers: Michal Shmueli-Scheuer, Guy Feigenblat)

Most of the work on scientific document summarization focuses on generating relatively short summaries (250 words or less). While such a length constraint can be sufficient for summarizing news articles, it is far from sufficient for summarizing scientific work. In fact, such a short summary resembles an abstract more than a summary that aims to cover all the salient information conveyed in a given text. Writing such summaries requires expertise and a deep understanding of a scientific domain, as can be found in some researchers’ blogs.

The LongSumm task leverages blogs created by researchers in the NLP and machine learning communities, using their summaries as the reference summaries against which submissions are compared.

The corpus for this task includes a training set that consists of 1705 extractive summaries and around 700 abstractive summaries of NLP and Machine Learning scientific papers. These are drawn from papers based on video talks from associated conferences (Lev et al. 2019 TalkSumm) and from blogs created by NLP and ML researchers. In addition, we create a test set of abstractive summaries. Each submission is judged against one reference summary (gold summary) on ROUGE and should not exceed 600 words.
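
Participants may want to check the 600-word limit before submitting. The helper below is a small sketch of such a check. How words are counted officially is not stated here, so counting whitespace-separated tokens is an assumption, as are the function names.

```python
def word_count(summary):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(summary.split())


def within_limit(summary, max_words=600):
    """Return True if the summary does not exceed max_words."""
    return word_count(summary) <= max_words


def truncate_to_limit(summary, max_words=600):
    """Keep only the first max_words whitespace-separated tokens."""
    return " ".join(summary.split()[:max_words])


# Hypothetical over-long summary, for illustration only.
draft = "word " * 700
trimmed = truncate_to_limit(draft)
```

Hard truncation can cut a summary mid-sentence, so a system would usually be better off generating within the budget than trimming afterwards.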

** Submission Information **

Authors are invited to submit full and short papers with unpublished, original work. Submissions will be subject to a double-blind peer-review process. Accepted papers will be presented by the authors at the workshop either as a talk or a poster. All accepted papers will be published in the workshop proceedings.

Submission Website: Submission is electronic, using the Softconf START conference management system.

Submissions should be in PDF format and anonymized for review. All submissions must be written in English and follow the EMNLP 2020 formatting requirements.

Long paper submissions: up to 8 pages of content, plus unlimited references.

Short paper submissions: up to 4 pages of content, plus unlimited references.

Final versions of accepted papers will be allowed 1 additional page of content so that reviewer comments can be taken into account.

Shared Task registration: Participants of all shared tasks need to register before March 31st, 2020.

** Important Dates **

Research track:

Submission deadline – July 15, 2020
Notification of Acceptance – August 17, 2020
Camera-ready submission due – August 31, 2020
Workshop – November 11 or 12, 2020

Shared task track:

Training set release – Feb 15, 2020
Deadline for registration – March 31, 2020
Test set release (Blind) – July 1, 2020
System runs due – August 1, 2020
Preliminary system reports due – August 16, 2020
Camera-ready submission due – August 31, 2020
Workshop – November 11 or 12, 2020

The dates are at this stage indicative only and can change.

** Keynote Speakers **

Kuansan Wang, Managing Director, Microsoft Research Outreach Academic Services
The second keynote speaker will be announced shortly

** Journal Extension **

In the past, the accepted authors were invited to submit an extended version of their work to a special issue of a selected journal. The organizers are currently in the process of identifying appropriate journals to host a similar special issue this year. Relevant updates including topics and requirements for this special issue will be shared on the workshop website in due time.

** Organizing Committee **

Muthu Kumar Chandrasekaran, Amazon, Seattle, USA
Anita de Waard, Elsevier, USA
Guy Feigenblat, IBM Research AI, Haifa Research Lab, Israel
Dayne Freitag, SRI International, San Diego, USA
Tirthankar Ghosal, Indian Institute of Technology Patna, India
Drahomira Herrmannova, Oak Ridge National Laboratory, USA
Eduard Hovy, Research Professor, LTI, Carnegie Mellon University, USA
Petr Knoth, Open University, UK
David Konopnicki, IBM Research AI, Haifa Research Lab, Israel
Philipp Mayr, GESIS – Leibniz Institute for the Social Sciences, Germany
Robert M. Patton, Oak Ridge National Laboratory, USA
Michal Shmueli-Scheuer, IBM Research AI, Haifa Research Lab, Israel
Dominika Tkaczyk, Crossref, UK

** Steering Committee **

Edward Fox, Professor, Department of Computer Science and Director, Digital Library Research Laboratory, Virginia Tech
C. Lee Giles, David Reese Professor, College of Information Sciences and Technology, Pennsylvania State University
Min-Yen Kan, Associate Professor, School of Computing, National University of Singapore
Dragomir Radev, A. Bartlett Giamatti Professor of Computer Science, Yale University
Jie Tang, Professor and Associate Chair of the Department of Computer Science and Technology, Tsinghua University
Alex Wade, Group Technical Program Manager, Chan Zuckerberg Initiative
Kuansan Wang, Managing Director, Microsoft Research Outreach Academic Services
Bonnie Webber, Professor, School of Informatics, University of Edinburgh

** Programme Committee **

Please visit our website for the complete list of PC members.

More details are available on the workshop website.

With kind regards,

SDP 2020 organizing committee
