LREC 2022 Program - Day 2 Oral & Poster Sessions


         Day 2
Wednesday, 22 June, 2022
09:30 - 10:50    Session O13: Statistical Methods and Machine Learning (1) - Auditorium
Chair: Sawaf, Hassan
Co-Chair: Snæbjarnarson, Vésteinn
09:30 - 09:50    The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild
Taja Kuzman, Peter Rupnik, Nikola Ljubešić
Jožef Stefan Institute
09:50 - 10:10    The Spoken Language Understanding MEDIA Benchmark Dataset in the Era of Deep Learning: data updates, training and evaluation tools
Gaëlle Laperrière1, Valentin Pelloin2, Antoine Caubrière3, Salima Mdhaffar4, Nathalie Camelin2, Sahar Ghannay5, Bassam Jabaian6, Yannick Estève7
1Avignon University LIA, 2LIUM - University of Le Mans, 3LIA, Avignon University, 4LIA - University of Avignon, 5CNRS, LISN, 6CERI-LIA, University of Avignon, 7LIA - Avignon University
10:10 - 10:30    BasqueGLUE: A Natural Language Understanding Benchmark for Basque
Gorka Urbizu1, Iñaki San Vicente1, Xabier Saralegi1, Rodrigo Agerri2, Aitor Soroa3
1Elhuyar Foundation, 2HiTZ Center - Ixa, University of the Basque Country UPV/EHU, 3Ixa group, HiTZ center University of the Basque Country (UPV/EHU)
10:30 - 10:50    Resources and Experiments on Sentiment Classification for Georgian
Nicolas Stefanovitch1, Jakub Piskorski2, Sopho Kharazi3
1Joint Research Centre, 2Polish Academy of Sciences, 3Piksel SRL
09:30 - 10:50    Session O14: Natural Language Generation and Summarisation - La Major
Chair: Skadina, Inguna
Co-Chair: Han, Kelvin
09:30 - 09:50    CoFiF Plus: A French Financial Narrative Summarisation Corpus
Nadhem ZMANDAR1, Tobias Daudert2, Sina Ahmadi3, Mahmoud El-Haj1, Paul Rayson1
1Lancaster University, 2Insight Centre for Data Analytics, National University of Ireland Galway, 3National University of Ireland Galway
09:50 - 10:10    Generating Extended and Multilingual Summaries with Pre-trained Transformers
Rémi Calizzano1, Malte Ostendorff2, Qian Ruan1, Georg Rehm1
1DFKI, 2German Research Center for Artificial Intelligence
10:10 - 10:30    MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases
Louis Martin1, Angela Fan2, Éric de la Clergerie3, Antoine Bordes4, Benoît Sagot3
1Facebook AI Research / Inria, 2Facebook AI Research, 3INRIA, 4Facebook
10:30 - 10:50    Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation
Samhita Honnavalli1, Aesha Parekh2, Lily Ou3, Sophie Groenwold4, Sharon Levy5, Vicente Ordonez6, William Yang Wang7
1University of California at Santa Barbara, 2University of California Santa Barbara, 3University of California-Santa Barbara, 4University of California, Santa Barbara, 5UC Santa Barbara, 6University of Virginia, 7University of California, Santa Barbara
09:30 - 10:50    Session O15: Semantics - Salle 120
Chair: Pedersen, Bolette
Co-Chair: Alam, Mehwish
09:30 - 09:50    Combining ELECTRA and Adaptive Graph Encoding for Frame Identification
Fabio Tamburini
FICLIT - University of Bologna
09:50 - 10:10    Polysemy in Spoken Conversations and Written Texts
Aina Garí Soler1, Matthieu Labeau2, Chloé Clavel3
1LTCI, Télécom-Paris, Institut Polytechnique de Paris, 2Telecom Paris, 3LTCI, Telecom-Paris, Institut Polytechnique de Paris
10:10 - 10:30    Cross-Level Semantic Similarity for Serbian Newswire Texts
Vuk Batanović1 and Maja Miličević Petrović2
1Innovation Center of the School of Electrical Engineering, University of Belgrade, 2University of Bologna
10:30 - 10:50    Universal Proposition Bank 2.0
Ishan Jindal1, Alexandre Rademaker2, Michał Ulewicz3, Ha Linh4, Huyen Nguyen4, Khoi-Nguyen Tran5, Huaiyu Zhu6, Yunyao Li6
1IBM Research, 2IBM Research and EMAp/FGV, 3Kyndryl, 4VNU University of Science, Hanoi, 5IBM, 6IBM Research - Almaden
09:30 - 10:50    Session O16: Language Resources and Evaluation for Psycho-linguistics and Cognitive Linguistics - Salle 92
Chair: Paggio, Patrizia
Co-Chair: Tayyar Madabushi, Harish
09:30 - 09:50    The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts
Nora Hollenstein1, Maria Barrett2, Marina Björnsdóttir1
1University of Copenhagen, 2IT University of Copenhagen
09:50 - 10:10    The Brooklyn Multi-Interaction Corpus for Analyzing Variation in Entrainment Behavior
Andreas Weise1, Matthew McNeill2, Rivka Levitan3
1Graduate Center CUNY, 2CUNY Graduate Center, 3Brooklyn College CUNY
10:10 - 10:30    Pro-TEXT: an Annotated Corpus of Keystroke Logs
Aleksandra Miletic1, Christophe Benzitoun2, Georgeta Cislaru3, Santiago Herrera-Yanez4
1Clesthia & Paris 3 - Sorbonne Nouvelle University, 2ATILF (CNRS & Lorraine University), 3MoDyCo (CNRS & Paris Nanterre University), 4Paris 3 - Sorbonne Nouvelle University
10:30 - 10:50    Work Hard, Play Hard: Collecting Acceptability Annotations through a 3D Game
Federico Bonetti1, Elisa Leonardelli2, Daniela Trotta3, Raffaele Guarasci4, Sara Tonelli5
1University of Trento, 2Foundation Bruno Kessler, 3University of Salerno, 4ICAR-CNR, 5FBK
09:30 - 10:50    Session: P14 - Corpora and Annotation (2) - Poster Area 2
Chair: Ogrodniczuk, Maciej
   DiHuTra: a Parallel Corpus to Analyse Differences between Human Translations
Ekaterina Lapshinova-Koltunski1, Maja Popović2, Maarit Koponen3
1Universität des Saarlandes, 2ADAPT, Dublin City University, 3University of Eastern Finland
   Data Expansion Using WordNet-based Semantic Expansion and Word Disambiguation for Cyberbullying Detection
Md Saroar Jahan1, Djamila Romaissa Beddiar1, Mourad Oussalah1, Muhidin Mohamed2
1University of Oulu, 2University of Aston, Computer Science
   ALIGNMEET: A Comprehensive Tool for Meeting Annotation, Alignment, and Evaluation
Peter Polák1, Muskaan Singh2, Anna Nedoluzhko3, Ondřej Bojar1
1Charles University, MFF UFAL, 2UFAL, Charles University, 3Charles University in Prague
   KSoF: The Kassel State of Fluency Dataset – A Therapy Centered Dataset of Stuttering
Sebastian Bayerl1, Alexander Wolff von Gudenberg2, Florian Hönig3, Elmar Noeth4, Korbinian Riedhammer5
1TH-Nürnberg, 2Institut der Kasseler Stottertherapie, 3Pattern Recognition Lab, Friedrich-Alexander University of Erlangen-Nuremberg, Germany, 4Friedrich-Alexander-University Erlangen-Nuremberg, 5Technische Hochschule Nürnberg Georg Simon Ohm
   EZCAT: an Easy Conversation Annotation Tool
Gaël Guibon1, Luce Lefeuvre2, Matthieu Labeau3, Chloé Clavel4
1Télécom Paris, Direction Innovation & Recherche SNCF, 2Direction Innovation & Recherche SNCF, 3Telecom Paris, 4LTCI, Telecom-Paris, Institut Polytechnique de Paris
   Spoken Language Treebanks in Universal Dependencies: an Overview
Kaja Dobrovoljc
University of Ljubljana
   LeConTra: A Learner Corpus of English-to-Dutch News Translation
Bram Vanroy and Lieve Macken
Ghent University
   Annotating Attribution in Czech News Server Articles
Barbora Hladka1, Jiří Mírovský2, Matyáš Kopp3, Václav Moravec1
1Charles University, 2Charles University in Prague, 3Charles University, Czech Republic
   Xposition: An Online Multilingual Database of Adpositional Semantics
Luke Gessler, Nathan Schneider, Joseph Ledford, Austin Blodgett
Georgetown University
   A Study in Contradiction: Data and Annotation for AIDA Focusing on Informational Conflict in Russia-Ukraine Relations
Jennifer Tracey1, Ann Bies2, Jeremy Getman2, Kira Griffitt1, Stephanie Strassel2
1Linguistic Data Consortium, 2Linguistic Data Consortium, University of Pennsylvania
   Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure
Najet Hadj Mohamed1, Cherifa Ben Khelil2, Agata Savary3, Iskandar Keskes4, Jean-Yves Antoine5, Lamia Hadrich-Belguith6
1University of Tours, LIFAT, ICVL; University of Sfax, MIRACL, 2University of Tours, LIFAT, ICVL, 3Paris-Saclay University, 4Associate professor, Gafsa university, 5Tours U., LIFAT Lab, 6ANLP Research Group, MIRACL Lab, FSEGS, Sfax University
   Annotation of Communicative Functions of Short Feedback Tokens in Switchboard
Carol Figueroa1, Adaeze Adigwe2, Magalie Ochs3, Gabriel Skantze4
1Furhat Robotics AB, 2ReadSpeaker, 3Aix Marseille, 4KTH Royal Institute of Technology
   A Dataset of Offensive Language in Kosovo Social Media
Adem Ajvazi and Christian Hardmeier
IT University of Copenhagen
   The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses
Bashar Alhafni1, Nizar Habash2, Houda Bouamor3
1New York University, 2New York University Abu Dhabi, 3Carnegie Mellon University in Qatar
   The Engage Corpus: A Social Media Dataset for Text-Based Recommender Systems
Daniel Cheng, Kyle Yan, Phillip Keung, Noah A. Smith
University of Washington
   Annotating Arguments in a Corpus of Opinion Articles
Gil Rocha1, Luís Trigo1, Henrique Lopes Cardoso2, Rui Sousa-Silva3, Paula Carvalho4, Bruno Martins5, Miguel Won4
1LIACC, Faculty of Engineering, University of Porto, 2University of Porto, 3University of Porto - Faculty of Arts, 4INESC-ID, 5IST and INESC-ID
   German Parliamentary Corpus (GerParCor)
Giuseppe Abrami1, Mevlüt Bagci1, Leon Hammerla1, Alexander Mehler2
1Goethe University Frankfurt, 2Goethe-University Frankfurt am Main
   NerKor+Cars-OntoNotes++
Attila Novák1 and Borbála Novák2
1MTA-PPKE Hungarian Language Technology Research Group, Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, 2MTA-PPKE Hungarian Language Technology Research Group, Faculty of Information Technology and Bionics, Pázmány Péter Catholic University
09:30 - 10:50    Session: P15 - Speech Resources and Processing (2) - Poster Area 2
Chair: Kitaoka, Norihide
   A Comparative Cross Language View On Acted Databases Portraying Basic Emotions Utilising Machine Learning
Felix Burkhardt1, Anabell Hacker2, Uwe Reichel3, Hagen Wierstorf3, Florian Eyben3, Björn Schuller4
1audEERING, 2Technische Universität Berlin, 3audEERING GmbH, 4University of Augsburg / Imperial College London
   Nkululeko: A Tool For Rapid Speaker Characteristics Detection
Felix Burkhardt1, Johannes Wagner1, Hagen Wierstorf2, Florian Eyben2, Björn Schuller3
1audEERING, 2audEERING GmbH, 3University of Augsburg / Imperial College London
   Speech Aerodynamics Database, Tools and Visualisation
Shi YU1, Clara Ponchard1, Roland Trouville2, Sergio Hassid3, Didier Demolin2
1Laboratoire de Phonétique et Phonologie, Université Sorbonne Nouvelle, Paris 3, 2Laboratoire de Phonétique et Phonologie, Université de Sorbonne Nouvelle, Paris 3, 3Hôpital Erasme, Université Libre de Bruxelles
   PATATRA and PATAFreq: two French databases for the documentation of within-speaker variability in speech
Cécile Fougeron1, Nicolas Audibert1, Cedric Gendrot2, Estelle Chardenon3, Louise Wohmann1
1Laboratoire de Phonétique et Phonologie, UMR7018 CNRS/Sorbonne-Nouvelle, Paris, 2Laboratory of Phonetics and Phonology, 3Laboratoire Parole et Langage, UMR 7309, CNRS/AMU Univ.
   The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition
Jonathan Mukiibi1, Andrew Katumba1, Joyce Nakatumba-Nabende1, Ali Hussein2, Joshua Meyer3
1Makerere University, 2Ronin Institute, 3Coqui
   Far-Field Speaker Recognition Benchmark Derived From The DiPCo Corpus
Mickael Rouvier1 and Mohammad Mohammadamini2
1LIA - Avignon University, 2PhD Student
   Evaluating Sampling-based Filler Insertion with Spontaneous TTS
Siyang Wang1, Joakim Gustafson2, Éva Székely1
1Division of Speech, Music and Hearing, KTH Royal Institute of Technology, 2KTH
   BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
Peter Mihajlik1, Andras Balog2, Tekla Graczi2, Anna Kohari2, Balázs Tarján3, Katalin Mady2
1Budapest University of Technology and Economics, 2Hungarian Research Centre for Linguistics, 3Budapest University of Technology and Economics (BME)
   SNuC: The Sheffield Numbers Spoken Language Corpus
Emma Barker, Jon Barker, Robert Gaizauskas, Ning Ma, Monica Paramita
University of Sheffield
   The ManDi Corpus: A Spoken Corpus of Mandarin Regional Dialects
Liang Zhao and Eleanor Chodroff
University of York
   The Speed-Vel Project: a Corpus of Acoustic and Aerodynamic Data to Measure Droplets Emission During Speech Interaction
Francesca Carbone1, Gilles Bouchet2, Alain Ghio3, Thierry Legou4, Carine André1, Muriel Lalain5, Sabrina Kadri1, Caterina Petrone1, Federica Procino6, Antoine Giovanni1
1Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France, 2Aix Marseille Univ, CNRS, IUSTI, Marseille, France, 3Aix-Marseille Univ, CNRS, LPL, 4CNRS / Laboratoire Parole et Langage, 5Aix Marseille Université, CNRS, Laboratoire Parole et Langage, 6Università degli Studi di Napoli Federico II, Napoli, Italy
09:30 - 10:50    Session: P16 - Opinion Mining, Sentiment and Emotion (2) - Poster Area 2
Chair: Hendrickx, Iris
   Towards Speech-only Opinion-level Sentiment Analysis
Annalena Aicher1, Alisa Gazizullina2, Aleksei Gusev3, Yuri Matveev4, Wolfgang Minker1
1Ulm University, 2Student, 3STC-innovations/ITMO, 4ITMO University, Speech Technology Center
   At the Intersection of NLP and Sustainable Development: Exploring the Impact of Demographic-Aware Text Representations in Modeling Value on a Corpus of Interviews
Goya van Boven1, Stephanie Hirmer2, Costanza Conforti3
1Utrecht University, 2University of Oxford, 3University of Cambridge
   A Study on the Ambiguity in Human Annotation of German Oral History Interviews for Perceived Emotion Recognition and Sentiment Analysis
Michael Gref1, Nike Matthiesen2, Sreenivasa Hikkal Venugopala3, Shalaka Satheesh3, Aswinkumar Vijayananth3, Duc Bach Ha1, Sven Behnke4, Joachim Köhler5
1Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), 2Haus der Geschichte der Bundesrepublik Deutschland Foundation, 3Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) & University of Applied Sciences Bonn-Rhein-Sieg, 4University of Bonn, 5Fraunhofer IAIS
   Detecting Optimism in Tweets using Knowledge Distillation and Linguistic Analysis of Optimism
Ștefan Cobeli1, Ioan-Bogdan Iordache2, Shweta Yadav1, Cornelia Caragea1, Liviu P. Dinu2, Dragoș Iliescu2
1University of Illinois at Chicago, 2University of Bucharest
   Dataset and Baseline for Automatic Student Feedback Analysis
Missaka Herath1, Kushan Chamindu2, Hashan Maduwantha1, Surangika Ranathunga1
1University of Moratuwa, 2University of Moratuwa Sri Lanka
09:30 - 10:50    Session: P17 - Less-Resourced Languages (1) - Poster Area 2
Chair: Todirascu, Amalia
   EENLP: Cross-lingual Eastern European NLP Index
Alexey Tikhonov1, Alex Malkhasov2, Andrey Manoshin3, George-Andrei Dima4, Réka Cserháti5, Md. Sadek Hossain Asif6, Matt Sárdi7
1Independent researcher, 2Financial University of Russia, 3MEPhI, 4Politehnica University of Bucharest, 5University of Szeged, 6Notre Dame College, Dhaka, 7Mozaik Education
   Slovene SuperGLUE Benchmark: Translation and Evaluation
Aleš Žagar and Marko Robnik-Šikonja
University of Ljubljana, Faculty of Computer and Information Science
   Speech Resources in the Tamasheq Language
Marcely Zanon Boito1, Fethi Bougares2, Florentin Barbier3, Souhir Gahbiche3, Loïc Barrault4, Mickael Rouvier5, Yannick Estève5
1Avignon Université, 2LIUM - Le Mans Université, 3Airbus, 4Le Mans Université, 5LIA - Avignon University
   Aesop's fable "The North Wind and the Sun" Used as a Rosetta Stone to Extract and Map Spoken Words in Under-resourced Languages
Elena Knyazeva1, Philippe Boula de Mareüil2, Frédéric Vernier3
1LISN CNRS, 2LISN, 3LISN-CNRS
   Multilingual Open Text Release 1: Public Domain News in 44 Languages
Chester Palen-Michel, June Kim, Constantine Lignos
Brandeis University
   TweetTaglish: A Dataset for Investigating Tagalog-English Code-Switching
Megan Herrera, Ankit Aich, Natalie Parde
University of Illinois at Chicago
   Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking
Luis Chiruzzo1, Santiago Góngora1, Aldo Alvarez2, Gustavo Giménez-Lugo3, Marvin Agüero-Torales4, Yliana Rodríguez1
1Universidad de la República, 2Universidad Nacional de Itapúa, 3Universidade Tecnológica Federal do Paraná, 4Universidad de Granada
09:30 - 10:50    Industry Track - Mucem
Chairs: Bente Maegaard and Khalid Choukri
10:50 - 11:10    Coffee Break
11:10 - 12:30    Industry Track - Mucem
Chairs: Bente Maegaard and Khalid Choukri
11:10 - 12:30    Session O17: Evaluation and Validation Methodologies (1) - La Major
Chair: van Zaanen, Menno
Co-Chair: Eskevich, Maria
11:10 - 11:30    Assessing Multilinguality of Publicly Accessible Websites
Rinalds Vīksna1, Inguna Skadiņa2, Raivis Skadiņš3, Andrejs Vasiļjevs4, Roberts Rozis4
1University of Latvia, 2Tilde/ Institute of Mathematics and Computer Science, University of Latvia, 3Tilde; University of Latvia, 4Tilde
11:30 - 11:50    A Methodology for Building a Diachronic Dataset of Semantic Shifts and its Application to QC-FR-Diac-V1.0, a Free Reference for French
David Kletz1, Philippe Langlais2, François Lareau3, Patrick Drouin3
1UdeM, 2Université de Montréal, Department of Computer Science and Operational Research (DIRO), 3Université de Montréal, Department of Linguistics and Translation
11:50 - 12:10    CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models
Jörg Frohberg1 and Frank Binder2
1apergo.ai, 2Institute for Applied Informatics at Leipzig University (InfAI)
12:10 - 12:30    Evaluating Gender Bias in Speech Translation
Marta R. Costa-jussà1, Christine Basta2, Gerard I. Gállego2
1Meta AI, 2Universitat Politècnica de Catalunya
11:10 - 12:30    Session O18: Applications involving LRs and Evaluation (2) - Auditorium
Chair: Hayashi, Yoshihiko
Co-Chair: Barrett, Maria
11:10 - 11:30    Design Choices in Crowdsourcing Discourse Relation Annotations: The Effect of Worker Selection and Training
Merel Scholman1, Valentina Pyatkin2, Frances Yung1, Ido Dagan2, Reut Tsarfaty2, Vera Demberg1
1Saarland University, 2Bar-Ilan University
11:30 - 11:50    TBD3: A Thresholding-Based Dynamic Depression Detection from Social Media for Low-Resource Users
Hrishikesh Kulkarni1, Sean MacAvaney2, Nazli Goharian1, Ophir Frieder1
1Georgetown University, 2University of Glasgow
11:50 - 12:10    SpecNFS: A Challenge Dataset Towards Extracting Formal Models from Natural Language Specifications
Sayontan Ghosh, Amanpreet Singh, Alex Merenstein, Wei Su, Scott Smolka, Erez Zadok, Niranjan Balasubramanian
Stony Brook University
12:10 - 12:30    Argument Similarity Assessment in German for Intelligent Tutoring: Crowdsourced Dataset and First Experiments
Xiaoyu Bai and Manfred Stede
University of Potsdam
11:10 - 12:30    Session O19: Information Extraction and Neural Networks - Salle 92
Chair: Hajič, Jan
Co-Chair: Ojha, Atul Kumar
11:10 - 11:30    Leveraging Pre-trained Language Models for Gender Debiasing
Nishtha Jain1, Declan Groves2, Lucia Specia3, Maja Popović4
1ADAPT Centre, Dublin City University, 2Microsoft, Dublin, 3Imperial College London, 4ADAPT, Dublin City University
11:30 - 11:50    Unsupervised Embeddings with Graph Auto-Encoders for Multi-domain and Multilingual Hate Speech Detection
Gretel Liz De la Peña Sarracén1 and Paolo Rosso2
1Universidad Politécnica de Valencia, 2Universitat Politècnica de València
11:50 - 12:10    FQuAD2.0: French Question Answering and Learning When You Don't Know
Quentin Heinrich, Gautier Viaud, Wacim Belblidia
Illuin Technology
12:10 - 12:30    Large-Scale Hate Speech Detection with Cross-Domain Transfer
Cagri Toraman1, Furkan Şahinuç2, Eyup Yilmaz1
1Aselsan Research Center, 2Bilkent University
11:10 - 12:30    Session O20: Dialogue (2) - Salle 120
Chair: Cucchiarini, Catia
Co-Chair: Hutin, Mathilde
11:10 - 11:30    GLoHBCD: A Naturalistic German Dataset for Language of Health Behaviour Change on Online Support Forums
Selina Meyer and David Elsweiler
University of Regensburg
11:30 - 11:50    Creating a Data Set of Abstractive Summaries of Turn-labeled Spoken Human-Computer Conversations
Iris Hendrickx
Centre for Language Studies, Radboud University Nijmegen
11:50 - 12:10    OpenEL: An Annotated Corpus for Entity Linking and Discourse in Open Domain Dialogue
Wen Cui1, Leanne Rolston2, Marilyn Walker1, Beth Ann Hockey3
1University of California Santa Cruz, 2LivePerson Inc., 3LivePerson
12:10 - 12:30    Collecting Visually-Grounded Dialogue with A Game Of Sorts
Bram Willemsen1, Dmytro Kalpakchi2, Gabriel Skantze3
1KTH, 2KTH Royal Institute of Technology, 3KTH Speech Music and Hearing
11:10 - 12:30    Session: P18 - Corpora and Annotation (3) - Poster Area 1
Chair: Montemagni, Simonetta
   CoRoSeOf - An Annotated Corpus of Romanian Sexist and Offensive Tweets
Diana Constantina Hoefels1, Çağrı Çöltekin2, Irina Mădroane3
1University of Tuebingen, 2University of Tübingen, 3West University of Timişoara
   ArMIS - The Arabic Misogyny and Sexism Corpus with Annotator Subjective Disagreements
Dina Almanea and Massimo Poesio
Queen Mary University of London
   Annotating Interruption in Dyadic Human Interaction
Liu YANG1, Catherine ACHARD2, Catherine PELACHAUD1
1ISIR, Sorbonne University, 2ISIR, Sorbonne University
   The Causal News Corpus: Annotating Causal Relations in Event Sentences from News
Fiona Anting Tan1, Ali Hürriyetoğlu2, Tommaso Caselli3, Nelleke Oostdijk4, Tadashi Nomoto5, Hansi Hettiarachchi6, Iqra Ameer7, Onur Uca8, Farhana Ferdousi Liza9, Tiancheng Hu10
1Institute of Data Science, National University of Singapore, 2KNAW, 3Rijksuniversiteit Groningen, 4Radboud University, 5National Institute of Japanese Literature, 6Birmingham City University, 7Centro de Investigación en Computación, Instituto Politécnico Nacional, 8Department of Sociology, Mersin University, 9Lecturer, University of East Anglia, 10ETH Zurich
   Samrómur: Crowd-sourcing large amounts of data
Staffan Hedström, David Erik Mollberg, Ragnheiður Þórhallsdóttir, Jón Guðnason
Reykjavik University
   An Annotated Corpus of Textual Explanations for Clinical Decision Support
Roland Roller1, Aljoscha Burchardt2, Nils Feldhus2, Laura Seiffe2, Klemens Budde3, Simon Ronicke3, Bilgin Osmanodja3
1DFKI LT Lab, 2DFKI, 3Charité
   LARD: Large-scale Artificial Disfluency Generation
Tatiana Passali1, Thanassis Mavropoulos2, Grigorios Tsoumakas1, Georgios Meditskos1, Stefanos Vrochidis3
1Aristotle University of Thessaloniki, 2Information Technologies Institute, Centre for Research and Technology Hellas, 3ITI-CERTH
   The CRECIL Corpus: a New Dataset for Extraction of Relations between Characters in Chinese Multi-party Dialogues
Yuru Jiang1, Yang Xu1, Yuhang Zhan1, Weikai He1, Yilin Wang2, Zixuan Xi2, Meiyun Wang1, Xinyu Li1, Yu Li1, Yanchao Yu3
1Beijing Information Science and Technology University, 2Beijing Information Science and Technology University, 3Edinburgh Napier University
   The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic
Dana Abdulrahim1, Go Inoue2, Latifa Shamsan1, Salam Khalifa3, Nizar Habash2
1University of Bahrain, 2New York University Abu Dhabi, 3Stony Brook University
   A Universal Dependencies Treebank of Ancient Hebrew
Daniel Swanson and Francis Tyers
Indiana University
   Hate Speech Dynamics Against African descent, Roma and LGBTQI Communities in Portugal
Paula Carvalho1, Bernardo Cunha2, Raquel Santos2, Fernando Batista3, Ricardo Ribeiro3
1INESC-ID, 2Instituto Superior Técnico, Universidade de Lisboa, 3INESC-ID Lisboa, ISCTE
   Evolving Large Text Corpora: Four Versions of the Icelandic Gigaword Corpus
Starkaður Barkarson1, Steinþór Steingrímsson2, Hildur Hafsteinsdóttir1
1The Árni Magnússon Institute for Icelandic Studies, 2Reykjavik University
11:10 - 12:30    Session: P19 - Discourse and Pragmatics - Poster Area 1
Chair: Chiarcos, Christian
   A Pragmatics-Centered Evaluation Framework for Natural Language Understanding
Damien Sileo1, Philippe Muller2, Tim Van de Cruys3, Camille Pradel4
1KU Leuven, 2IRIT, University of Toulouse, 3University of Leuven, 4Synapse Developpement
   Conversational Analysis of Daily Dialog Data using Polite Emotional Dialogue Acts
Chandrakant Bothe and Stefan Wermter
University of Hamburg
   Inducing Discourse Marker Inventories from Lexical Knowledge Graphs
Christian Chiarcos
Goethe-Universität Frankfurt am Main
   Story Trees: Representing Documents using Topological Persistence
Pantea Haghighatkhah1, Antske Fokkens2, Pia Sommerauer3, Bettina Speckmann1, Kevin Verbeek1
1Eindhoven University of Technology, 2VU Amsterdam, 3Vrije Universiteit Amsterdam
   Extracting and Analysing Metaphors in Migration Media Discourse: towards a Metaphor Annotation Scheme
Ana Zwitter Vitez1, Mojca Brglez1, Marko Robnik Šikonja1, Tadej Škvorc1, Andreja Vezovnik1, Senja Pollak2
1University of Ljubljana, 2"Jožef Stefan" Institute
   DDisCo: A Discourse Coherence Dataset for Danish
Linea Flansmose Mikkelsen1, Oliver Kinch2, Anders Jess Pedersen2, Ophélie Lacroix2
1Aarhus University, 2Alexandra Institute
   LPAttack: A Feasible Annotation Scheme for Capturing Logic Pattern of Attacks in Arguments
Farjana Sultana Mim1, Naoya Inoue2, Shoichi Naito1, Keshav Singh1, Kentaro Inui3
1Tohoku University, 2Japan Advanced Institute of Science and Technology, 3Tohoku University / Riken
   BeSt: The Belief and Sentiment Corpus
Jennifer Tracey1, Owen Rambow2, Claire Cardie3, Adam Dalton4, Hoa Trang Dang5, Mona Diab6, Bonnie Dorr7, Louise Guthrie8, Magdalena Markowska2, Smaranda Muresan9, Vinodkumar Prabhakaran10, Samira Shaikh11, Tomek Strzalkowski12
1LDC, 2Stony Brook University, 3Cornell University, 4IHMC, 5National Institute of Standards and Technology, 6Facebook AI, 7University of Florida, 8University of Sheffield, 9Columbia University, 10Google, 11University of North Carolina at Charlotte, 12RPI
11:10 - 12:30    Session: P20 - Multimodality and Cross-modality (2) - Poster Area 1
Chair: Isard, Amy
   MOTIF: Contextualized Images for Complex Words to Improve Human Reading
Xintong Wang1, Florian Schneider1, Özge Alacam2, Prateek Chaudhury3, Chris Biemann4
1University of Hamburg, 2University of Bielefeld, 3Indian Institute of Technology Delhi, 4Universität Hamburg
   Challenges with Sign Language Datasets for Sign Language Recognition and Translation
Mirella De Sisto1, Vincent Vandeghinste2, Santiago Egea Gómez3, Mathieu De Coster4, Dimitar Shterionov1, Horacio Saggion3
1Tilburg University, 2Instituut voor de Nederlandse Taal, 3Universitat Pompeu Fabra, 4IDLab-AIRO -- Ghent University -- imec
   A Low-Cost Motion Capture Corpus in French Sign Language for Interpreting Iconicity and Spatial Referencing Mechanisms
Clémence Mertz1, Vincent BARREAUD2, Thibaut Le Naour3, Damien Lolive4, Sylvie Gibet2
1Université Rennes1, IRISA, 2IRISA, 3Motion-Up, 4Univ Rennes, CNRS, IRISA
   The CLAMS Platform at Work: Processing Audiovisual Data from the American Archive of Public Broadcasting
Marc Verhagen1, Kelley Lynch1, Kyeongmin Rim2, James Pustejovsky1
1Brandeis University, 2Department of Computer Science, Brandeis University
   BU-NEmo: an Affective Dataset of Gun Violence News
Carley Reardon, Sejin Paik, Ge Gao, Meet Parekh, Yanling Zhao, Lei Guo, Margrit Betke, Derry Tanti Wijaya
Boston University
   RoomReader: A Multimodal Corpus of Online Multiparty Conversational Interactions
Justine Reverdy1, Sam O'Connor Russell1, Louise Duquenne1, Diego Garaialde2, Benjamin Cowan2, Naomi Harte1
1Trinity College Dublin, 2University College Dublin
   Quevedo: Annotation and Processing of Graphical Languages
Antonio F. G. Sevilla, Alberto Díaz Esteban, José María Lahoz-Bengoechea
Universidad Complutense de Madrid
   Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel’s Weekly Video Podcasts
Debjoy Saha1, Shravan Nayak2, Timo Baumann3
1Indian Institute of Technology, Kharagpur, 2IIT (BHU) Varanasi, 3Ostbayerische Technische Hochschule Regensburg
   Crowdsourcing Kazakh-Russian Sign Language: FluentSigners-50
Medet Mukushev1, Aigerim Kydyrbekova1, Alfarabi Imashev1, Vadim Kimmelman2, Anara Sandygulova1
1Nazarbayev University, 2Bergen University
11:10 - 12:30    Session: P21 - Semantics - Poster Area 1
Chair: Gaizauskas, Robert
   Connecting a French Dictionary from the Beginning of the 20th Century to Wikidata
Pierre Nugues
Lund University
   Metaphor annotation for German
Markus Egg and Valia Kordoni
Humboldt-Universität zu Berlin
   NorDiaChange: Diachronic Semantic Change Dataset for Norwegian
Andrey Kutuzov1, Samia Touileb2, Petter Mæhlum1, Tita Enstad1, Alexandra Wittemann1
1University of Oslo, 2University of Bergen
   Exploring Transformers for Ranking Portuguese Semantic Relations
Hugo Gonçalo Oliveira
CISUC, DEI, University of Coimbra
   Building Static Embeddings from Contextual Ones: Is It Useful for Building Distributional Thesauri?
Olivier Ferret
CEA List
   Sentence Selection Strategies for Distilling Word Embeddings from BERT
Yixiao Wang1, Zied Bouraoui2, Luis Espinosa Anke1, Steven Schockaert1
1Cardiff University, 2CRIL-CNRS & University of Artois
   DiaWUG: A Dataset for Diatopic Lexical Semantic Variation in Spanish
Gioia Baldissin1, Dominik Schlechtweg2, Sabine Schulte im Walde2
1Institute for Natural Language Processing, Universität Stuttgart, 2University of Stuttgart
   My Case, For an Adposition: Lexical Polysemy of Adpositions and Case Markers in Finnish and Latin
Daniel Chen and Mans Hulden
University of Colorado
   WiC-TSV-de: German Word-in-Context Target-Sense-Verification Dataset and Cross-Lingual Transfer Analysis
Anna Breit, Artem Revenko, Narayani Blaschke
Semantic Web Company
   Re-train or Train from Scratch? Comparing Pre-training Strategies of BERT in the Medical Domain
Hicham El Boukkouri1, Olivier Ferret2, Thomas Lavergne3, Pierre Zweigenbaum4
1LIMSI, CNRS, Université Paris-Saclay, 2CEA List, 3LISN/CNRS & Université Paris Saclay, 4LISN, CNRS, Université Paris-Saclay
   Universal Semantic Annotator: the First Unified API for WSD, SRL and Semantic Parsing
Riccardo Orlando1, Simone Conia1, Stefano Faralli2, Roberto Navigli1
1Sapienza University of Rome, 2University of Rome Sapienza
12:30 - 13:00    Invited Local Talk - Philippe Boula de Mareüil - Auditorium
Chair: Blache, Philippe
13:00 - 14:30    Lunch Break
14:30 - 15:10    Keynote Speaker - Emmanuel Dupoux - Auditorium
Chair: Mariani, Joseph
Co-Chair: Piperidis, Stelios
15:10 - 15:15    Short Break (5 min)
15:15 - 16:35    Session O21: Corpora and Annotation (2) - Salle 120
Chair: Adda, Gilles
Co-Chair: Quochi, Valeria
15:15 - 15:35    D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research
Jan Philip Wahle1, Terry Ruas1, Saif Mohammad2, Bela Gipp1
1University of Wuppertal, 2NRC
15:35 - 15:55    SciPar: A Collection of Parallel Corpora from Scientific Abstracts
Dimitrios Roussis1, Vassilis Papavassiliou1, Prokopis Prokopidis1, Stelios Piperidis2, Vassilis Katsouros3
1ILSP/Athena RC, 2Athena RC/ILSP, 3Athena Research Center
15:55 - 16:15    CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic Terms
Martha Gavidia, Patrick Lee, Anna Feldman, Jing Peng
Montclair State University
16:15 - 16:35    Camel Treebank: An Open Multi-genre Arabic Dependency Treebank
Nizar Habash1, Muhammed AbuOdeh1, Dima Taji2, Reem Faraj3, Jamila El Gizuli4, Omar Kallas1
1New York University Abu Dhabi, 2Birzeit University, 3Columbia University, 4Georgia Institute of Technology
15:15 - 16:35    Session O22: Summarization - Salle 92
Chair: Tamburini, Fabio
Co-Chair: Monsen, Julius
15:15 - 15:35    MentSum: A Resource for Exploring Summarization of Mental Health Online Posts
Sajad Sotudeh, Nazli Goharian, Zachary Young
Georgetown University
15:35 - 15:55    Klexikon: A German Dataset for Joint Summarization and Simplification
Dennis Aumiller and Michael Gertz
Heidelberg University
15:55 - 16:15    Applying Automatic Text Summarization for Fake News Detection
Philipp Hartl and Udo Kruschwitz
University of Regensburg
15:15 - 16:35    Session O23: Language Resource Infrastructures and Standards - Auditorium
Chair: De Jong, Franciska
Co-Chair: Dobrovoljc, Kaja
15:15 - 15:35    Increasing CMDI’s Semantic Interoperability with schema.org
Nino Meisinger, Thorsten Trippel, Claus Zinn
University of Tübingen
15:35 - 15:55    RefCo and its Checker: Improving Language Documentation Corpora's Reusability Through a Semi-Automatic Review Process
Herbert Lange1 and Jocelyn Aznar2
1Universität Hamburg, 2ZAS
15:55 - 16:15    Identification and Analysis of Personification in Hungarian: The PerSECorp project
Gábor Simon
Eötvös Loránd University
16:15 - 16:35    ISO-based Annotated Multilingual Parallel Corpus for Discourse Markers
Purificação Silvano1, Mariana Damova2, Giedrė Oleškevičienė3, Chaya Liebeskind4, Christian Chiarcos5, Dimitar Trajanov6, Ciprian-Octavian Truică7, Elena-Simona Apostol7, Anna Baczkowska8
1University of Porto/ Centre of Linguistics of the University of Porto, 2Mozaika, Ltd., 3Mykolas Romeris University, 4Jerusalem College of Technology, Lev Academic Center, 5Goethe-Universität Frankfurt am Main, 6Department for Information Systems and Network Technologies, Faculty of Computer Science and Engineering Ss. Cyril and Methodius University, 7Uppsala University, 8UMK
15:15 - 16:35    Session O24: Multimodality and Cross-modality - La Major
Chair: Hajicova, Eva
Co-Chair: Gari Soler, Aina
15:15 - 15:35    LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild
David Gimeno-Gómez and Carlos-D. Martínez-Hinarejos
PRHLT Research Center - Universitat Politècnica de València
15:35 - 15:55    Modality Alignment between Deep Representations for Effective Video-and-Language Learning
Hyeongu Yun, Yongil Kim, Kyomin Jung
Seoul National University
15:55 - 16:15    Mutual Gaze and Linguistic Repetition in a Multimodal Corpus
Anais Murat, Maria Koutsombogera, Carl Vogel
Trinity College Dublin
16:15 - 16:35    Multidimensional Coding of Multimodal Languaging in Multi-Party Settings
Christophe Parisse1, Marion Blondel2, Stéphanie Caët3, Claire Danet4, Coralie Vincent2, Aliyah Morgenstern5
1Modyco, 2SFL, 3STL & Lille University, 4Dylis, 5PRISMES
15:15 - 16:35    Session: P22 - Lexicons (2) - Poster Area 2
Chair: Yildiz, Olcay Taner
   Constructing a Lexical Resource of Russian Derivational Morphology
Lukáš Kyjánek1, Olga Lyashevskaya2, Anna Nedoluzhko3, Daniil Vodolazsky4, Zdeněk Žabokrtský5
1Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, 2National Research University Higher School of Economics, 3Charles University in Prague, 4Sber, 5Charles University
   Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship
Temuulen Khishigsuren1, Gábor Bella2, Khuyagbaatar Batsuren1, Abed Alhakim Freihat2, Nandu Chandran Nair2, Amarsanaa Ganbold1, Hadi Khalilia2, Yamini Chandrashekar2, Fausto Giunchiglia3
1National University of Mongolia, 2University of Trento, 3University of Trento
   Towards Latvian WordNet
Peteris Paikens1, Mikus Grasmanis1, Agute Klints1, Ilze Lokmane1, Lauma Pretkalniņa2, Laura Rituma2, Madara Stāde1, Laine Strankale1
1University of Latvia, IMCS, 2Institute of Mathematics and Computer Science, University of Latvia
   Building Sentiment Lexicons for Mainland Scandinavian Languages Using Machine Translation and Sentence Embeddings
Peng Liu, Cristina Marco, Jon Atle Gulla
Norwegian University of Science and Technology
   A Thesaurus-based Sentiment Lexicon for Danish: The Danish Sentiment Lexicon
Sanni Nimb1, Sussi Olsen2, Bolette Pedersen3, Thomas Troelsgård4
1Society for Danish Language and Literature (DSL), 2UCPH, Centre for Language Technology, 3University of Copenhagen, 4Society for Danish Language and Literature
   IndoUKC: A Concept-Centered Indian Multilingual Lexical Resource
Nandu Chandran Nair1, Rajendran Velayuthan2, Yamini Chandrashekar1, Gábor Bella1, Fausto Giunchiglia3
1University of Trento, 2Amrita Vishwa Vidyapeetham, 3University of Trento
   Korean Language Modeling via Syntactic Guide
Hyeondey Kim1, Seonhoon Kim2, INHO KANG3, Nojun Kwak4, Pascale Fung5
1HKUST, 2Naver Search, 3NAVER Clova, 4Seoul National University, 5Hong Kong University of Science and Technology
   A Whole-Person Function Dictionary for the Mobility, Self-Care and Domestic Life Domains: a Seedset Expansion Approach
Ayah Zirikly1, Bart Desmet2, Julia Porcino2, Jonathan Camacho Maldonado2, Pei-Shu Ho2, Rafael Jimenez Silva2, Maryanne Sacco2
1Johns Hopkins University, 2National Institutes of Health
15:15 - 16:35    Session: P23 - Digital Humanities (1) - Poster Area 2
Chair: Wynne, Martin
   Placing multi-modal, and multi-lingual Data in the Humanities Domain on the Map: the Mythotopia Geo-tagged Corpus
Voula Giouli1, Anna Vacalopoulou1, Nikolaos Sidiropoulos1, Christina Flouda2, Athanasios Doupas1, Giorgos Giannopoulos3, Nikos Bikakis3, Vassilis Kaffes3, Gregory Stainhaouer1
1ATHENA Research & Innovation Centre, Institute for Language & Speech Processing, 2ATHENA Research & Innovation Centre, Institute for Language & Speech Processing, 3ATHENA Research & Innovation Centre, Information Management Systems Institute
   An Architecture of resolving a multiple link path in a standoff-style data format to enhance the mobility of language resources
Kazushi Ohya
Tsurumi University
   A Corpus of German Citizen Contributions in Mobility Planning: Supporting Evaluation Through Multidimensional Classification
Julia Romberg, Laura Mark, Tobias Escher
Department of Social Sciences, Heinrich Heine University Düsseldorf
   Overlooked Data in Typological Databases: What Grambank Teaches Us About Gaps in Grammars
Jakob Lesage1, Hannah Haynie2, Hedvig Skirgård3, Tobias Weber4, Alena Witzlack-Makarevich5
1Humboldt University Berlin, 2Department of Linguistics, University of Colorado Boulder, 3Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, 4Institute for Scandinavian Studies, Frisian and General Linguistics, Department of General Linguistics, Christian-Albrechts-Universität zu Kiel, 5Hebrew University of Jerusalem
   Hong Kong: Longitudinal and Synchronic Characterisations of Protest News between 1998 and 2020
Arya D. McCarthy1 and Giovanna Maria Dora Dore2
1Johns Hopkins University, 2JHU
   Nunc profana tractemus. Detecting Code-Switching in a Large Corpus of 16th Century Letters
Martin Volk, Lukas Fischer, Patricia Scheurer, Bernard Schroffenegger, Raphael Schwitter, Phillip Ströbel, Benjamin Suter
University of Zurich
15:15 - 16:35    Session: P24 - Evaluation and Validation Methodologies (2) - Poster Area 2
Chair: Zeldes, Amir
   Quality and Efficiency of Manual Annotation: Pre-annotation Bias
Marie Mikulová1, Milan Straka2, Jan Štěpánek1, Barbora Štěpánková1, Jan Hajic1
1Charles University, 2Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
   A Comprehensive Evaluation and Correction of the TimeBank Corpus
Mustafa Ocal1, Antonela Radas2, Jared Hummer2, Karine Megerdoomian3, Mark Finlayson2
1Florida International University, 2FIU, 3The MITRE Corporation
   Evaluating Multilingual Sentence Representation Models in a Real Case Scenario
Rocco Tripodi1, Rexhina Blloshmi2, Simon Levis Sullam3
1Alma Mater Studiorum - Università di Bologna, 2Amazon Alexa, 3Ca' Foscari University of Venice
   Validity, Agreement, Consensuality and Annotated Data Quality
Anaëlle Baledent, Yann Mathet, Antoine Widlöcher, Christophe Couronne, Jean-Luc Manguin
Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen, FRANCE
   Impact Analysis of the Use of Speech and Language Models Pretrained by Self-Supervision for Spoken Language Understanding
Salima Mdhaffar1, Valentin Pelloin2, Antoine Caubrière3, Gaëlle Laperrière1, Sahar Ghannay4, Bassam Jabaian5, Nathalie Camelin6, Yannick Estève7
1LIA - University of Avignon, 2LIUM - Le Mans university, 3LIA, Avignon University, 4CNRS, LISN, 5CERI-LIA, University of Avignon, 6LIUM - University of Le Mans, 7LIA - Avignon University
   JGLUE: Japanese General Language Understanding Evaluation
Kentaro Kurihara1, Daisuke Kawahara1, Tomohide Shibata2
1Waseda University, 2Yahoo Japan Corporation
   Using the LARA Little Prince to compare human and TTS audio quality
Elham Akhlaghi1, Ingibjörg Auðunardóttir2, Anna Bączkowska3, Branislav Bédi4, Hakeem Beedar5, Harald Berthelsen6, Cathy Chua7, Catia Cucchiarini8, Hanieh Habibi9, Ivana Horváthová10, Junta Ikeda7, Christèle Maizonniaux11, Neasa Ní Chiaráin6, Chadi Raheb12, Manny Rayner13, John Sloan9, Nikos Tsourakis9, Chunlin Yao14
1Ferdowsi University of Mashhad, 2University of Iceland, 3University of Gdansk, Gdansk, 4The Árni Magnússon Institute for Icelandic Studies, 5University of Adelaide, 6Trinity College, Dublin, 7Independent scholar, 8Centre for Language and Speech Technology (CLST), Radboud University Nijmegen, 9FTI/TIM, University of Geneva, 10Constantine the Philosopher University, 11Flinders University, Adelaide, 12University of Guilan, Rasht, 13Geneva University, 14Tianjin Chengjian University
   Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations
Chris Emmery1, Ákos Kádár2, Grzegorz Chrupała3, Walter Daelemans4
1Tilburg University & University of Antwerp, 2Explosion, 3Tilburg University, 4University of Antwerp, CLiPS
   Constructing Distributions of Variation in Referring Expression Type from Corpora for Model Evaluation
T. Mark Ellison and Fahime Same
University of Cologne
   Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis
Aleksandr Perevalov1, Xi Yan2, Liubov Kovriguina3, Longquan Jiang2, Andreas Both1, Ricardo Usbeck2
1Anhalt University of Applied Sciences, 2Hamburg University, 3Fraunhofer IAIS
15:15 - 16:35    Session: P25 - Multilinguality and Machine Translation (2) - Poster Area 2
Chair: Kalamkar, Prathamesh
   Multi-Task Learning for Cross-Lingual Abstractive Summarization
Sho Takase and Naoaki Okazaki
Tokyo Institute of Technology
   How Much Context Span is Enough? Examining Context-Related Issues for Document-level MT
Sheila Castilho
Dublin City University
   TANDO: A Corpus for Document-level Machine Translation
Harritxu Gete1, Thierry Etchegoyhen1, David Ponce1, Gorka Labaka2, Nora Aranberri3, Ander Corral4, Xabier Saralegi4, Igor Ellakuria5, Maite Martin6
1Vicomtech, 2HiTZ Center - Ixa, University of the Basque Country (UPV/EHU), 3University of the Basque Country (UPV/EHU), 4Elhuyar Foundation, 5Isea, 6Ametzagaiña
   Unsupervised Machine Translation in Real-World Scenarios
Ona de Gibert Bonet1, Iakes Goenaga2, Jordi Armengol-Estapé1, Olatz Perez-de-Viñaspre3, Carla Parra Escartín4, Marina Sanchez5, Mārcis Pinnis6, Gorka Labaka7, Maite Melero1
1Barcelona Supercomputing Center, 2UPV/EHU, 3HiTZ Center - Ixa, University of the Basque Country UPV/EHU, 4RWS Language Weaver, 5Unbabel, 6Tilde, 7HiTZ Center - Ixa, University of the Basque Country (UPV/EHU)
   COVID-19 Mythbusters in World Languages
Mana Ashida1, Jin-Dong Kim2, Lee Seunghun3
1Yahoo Japan Corporation, 2Database Center for Life Science, 3International Christian University
   On the Multilingual Capabilities of Very Large-Scale English Language Models
Jordi Armengol-Estapé, Ona de Gibert Bonet, Maite Melero
Barcelona Supercomputing Center
   Evaluating Subtitle Segmentation for End-to-end Generation Systems
Alina Karakanta1, François Buet2, Mauro Cettolo3, François Yvon4
1Fondazione Bruno Kessler (FBK), University of Trento, 2Laboratoire Interdisciplinaire des Sciences du Numérique, 3Fondazione Bruno Kessler, 4LISN CNRS & Univ. Paris Saclay
   Using Semantic Role Labeling to Improve Neural Machine Translation
Reinhard Rapp
Athena R.C.
   A Deep Transfer Learning Method for Cross-Lingual Natural Language Inference
Dibyanayan Bandyopadhyay1, Arkadipta De2, Baban Gain1, Tanik Saikh3, Asif Ekbal4
1Indian Institute of Technology, Patna, 2Indian Institute of Technology Hyderabad, 3Indian Institute of Technology Patna, 4IIT Patna
   Simple TICO-19: A Dataset for Joint Translation and Simplification of COVID-19 Texts
Matthew Shardlow1 and Fernando Alva-Manchego2
1Manchester Metropolitan University, 2Cardiff University
   Building Comparable Corpora for Assessing Multi-Word Term Alignment
Omar Adjali1, Emmanuel Morin2, Pierre Zweigenbaum3
1Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, 2LS2N UMR CNRS 6004, 3LISN, CNRS, Université Paris-Saclay
   Mean Machine Translations: On Gender Bias in Icelandic Machine Translations
Agnes Sólmundsdóttir, Dagbjört Guðmundsdóttir, Lilja Stefánsdóttir, Anton Ingason
University of Iceland
15:15 - 16:35    Session: P26 - Dialogue and Conversational Systems (2) - Poster Area 2
Chair: Hartholt, Arno
   An Analysis of Dialogue Act Sequence Similarity Across Multiple Domains
Ayesha Enayet and Gita Sukthankar
University of Central Florida
   Constructing a Culinary Interview Dialogue Corpus with Video Conferencing Tool
Taro Okahisa, Ribeka Tanaka, Takashi Kodama, Yin Jou Huang, Sadao Kurohashi
Kyoto University
   UgChDial: A Uyghur Chat-based Dialogue Corpus for Response Space Classification
Zulipiye Yusupujiang1 and Jonathan Ginzburg2
1Université Paris Cité, 2Université de Paris
   A Speculative and Tentative Common Ground Handling for Efficient Composition of Uncertain Dialogue
Saki Sudo1, Kyoshiro Asano1, Koh Mitsuda2, Ryuichiro Higashinaka3, Yugo Takeuchi1
1Shizuoka University, 2NTT, 3Nagoya University/NTT
   BaSCo: An Annotated Basque-Spanish Code-Switching Corpus for Natural Language Understanding
Maia Aguirre, Laura García-Sardiña, Manex Serras, Ariane Méndez, Jacobo López
Vicomtech
   ProDial -- An Annotated Proactive Dialogue Act Corpus for Conversational Assistants using Crowdsourcing
Matthias Kraus1, Nicolas Wagner2, Wolfgang Minker2
1University of Ulm, 2Ulm University
   ELITR Minuting Corpus: A Novel Dataset for Automatic Minuting from Multi-Party Meetings in English and Czech
Anna Nedoluzhko1, Muskaan Singh2, Marie Hledíková3, Tirthankar Ghosal4, Ondřej Bojar3
1Charles University in Prague, 2UFAL,Charles University, 3Charles University, MFF UFAL, 4Institute of Formal and Applied Linguistics, Charles University
16:35 - 16:55    Coffee Break
16:55 - 18:35    Session O25: Social Media Processing - Auditorium
Chair: Rayson, Paul
Co-Chair: Aldabe, Itziar
16:55 - 17:15    Extracting Age-Related Stereotypes from Social Media Texts
Kathleen C. Fraser, Svetlana Kiritchenko, Isar Nejadgholi
National Research Council Canada
17:15 - 17:35    Borrowing or Codeswitching? Annotating for Finer-Grained Distinctions in Language Mixing
Elena Alvarez-Mellado1 and Constantine Lignos2
1UNED School of Computer Science, 2Brandeis University
17:35 - 17:55    Multi-Aspect Transfer Learning for Detecting Low Resource Mental Disorders on Social Media
Ana Sabina Uban1, Berta Chulvi2, Paolo Rosso2
1Universitat Politecnica de Valencia, University of Bucharest, 2Universitat Politècnica de València
17:55 - 18:15    ArCovidVac: Analyzing Arabic Tweets About COVID-19 Vaccination
Hamdy Mubarak1, Sabit Hassan2, Shammur Absar Chowdhury1, Firoj Alam3
1Qatar Computing Research Institute, 2University of Pittsburgh, 3Qatar Computing Research Institute, HBKU
18:15 - 18:35    FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias
Flora Sakketou1, Joan Plepi1, Riccardo Cervero2, Henri Geiss3, Paolo Rosso2, Lucie Flek1
1Philipps-Marburg University, 2Universitat Politècnica de València, 3Technical University of Darmstadt
16:55 - 18:35    Session O26: Speech Resources and Processing - La Major
Chair: Lindén, Krister
Co-Chair: Mazzocconi, Chiara
16:55 - 17:15    Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition
Julia Pritzen1, Michael Gref2, Dietlind Zühlke3, Christoph Schmidt2
1Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) & TH Köln - University of Applied Sciences, 2Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), 3TH Köln - Cologne University of Applied Sciences
17:15 - 17:35    SDS-200: A Swiss German Speech to Standard German Text Corpus
Michel Plüss1, Manuela Hürlimann2, Marc Cuny3, Alla Stöckli3, Nikolaos Kapotis4, Julia Hartmann5, Malgorzata Anna Ulasik6, Christian Scheller7, Yanick Schraner7, Amit Jain4, Jan Deriu8, Mark Cieliebak8, Manfred Vogel7
1University of Applied Sciences and Arts Northwestern Switzerland, 2Zurich University of Applied Sciences (ZHAW), 3SpinningBytes AG, 4-, 5FHNW, 6ZHAW, 7University of Applied Sciences Northwestern Switzerland, 8Zurich University of Applied Sciences
17:35 - 17:55    Extracting Linguistic Knowledge from Speech: A Study of Stop Realization in 5 Romance Languages
Yaru WU1, Mathilde Hutin2, Ioana Vasilescu3, Lori Lamel4, Martine Adda-Decker5
1CRISCO/EA4255, Université de Caen Normandie, 14000 Caen, France; Laboratoire de Phonétique et Phonologie (UMR7018, CNRS-Sorbonne Nouvelle), France, 2Université Paris-Saclay, CNRS, LIMSI, 3LIMSI-CNRS, 4CNRS/LIMSI, 5LPP (Lab. Phonétique & Phonologie) / LIMSI-CNRS
17:55 - 18:15    Overlaps and Gender Analysis in the Context of Broadcast Media
Martin Lebourdais1, Marie Tahon2, Antoine LAURENT3, Sylvain Meignier1, Anthony Larcher4
1LIUM, 2LIUM / Le Mans University, 3LIUM - Laboratoire Informatique Université du Mans, 4Université du Mans - LIUM
18:15 - 18:35    A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification.
Rémi Uro1, David Doukhan1, Albert Rilliard2, Laetitia Larcher1, Anissa-Claire Adgharouamane1, Marie Tahon3, Antoine Laurent3
1Institut National de l'Audiovisuel, 2Université Paris Saclay, CNRS, LISN, 3LIUM, Le Mans Université
16:55 - 18:35    Session O27: Discourse - Salle 120
Chair: Cabrio, Elena
Co-Chair: Abercrombie, Gavin
16:55 - 17:15    DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations
Merel Scholman, Tianai Dong, Frances Yung, Vera Demberg
Saarland University
17:15 - 17:35    QT30: A Corpus of Argument and Conflict in Broadcast Debate
Annette Hautli-Janisz1, Zlata Kikteva1, Wassiliki Siskou2, Kamila Gorska3, Ray Becker3, Chris Reed3
1University of Passau, 2University of Konstanz, 3University of Dundee
17:35 - 17:55    Scaling up Discourse Quality Annotation for Political Science
Neele Falk1 and Gabriella Lapesa2
1University of Stuttgart, 2Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung
17:55 - 18:15    Clarifying Implicit and Underspecified Phrases in Instructional Text
Talita Anthonio, Anna Sauer, Michael Roth
University of Stuttgart
18:15 - 18:35    Multilingual Pragmaticon: Database of Discourse Formulae
Anton Buzanov, Polina Bychkova, Arina Molchanova, Anna Postnikova, Daria Ryzhova
Higher School of Economics
16:55 - 18:35    Session O28: Digital Humanities and Cultural Heritage - Salle 92
Chair: Witt, Andreas
Co-Chair: Frenda, Simona
16:55 - 17:15    Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection
Ranka Stanković1, Cvetana Krstev2, Branislava Šandrih Todorović3, Dusko Vitas4, Mihailo Skoric5, Milica Ikonić Nešić2
1University of Belgrade - Faculty of Mining and Geology, 2University of Belgrade, Faculty of Philology, 3University of Belgrade, Faculty of Philology, Serbia, 4Professor, 5University of Belgrade Faculty of Mining and Geology
17:15 - 17:35    Exploring Text Recombination for Automatic Narrative Level Detection
Nils Reiter1, Judith Sieker2, Svenja Guhr3, Evelyn Gius3, Sina Zarrieß2
1University of Cologne, 2University of Bielefeld, 3Technical University of Darmstadt
17:35 - 17:55    Automatic Normalisation of Early Modern French
Rachel Bawden1, Jonathan Poinhos2, Eleni Kogkitsidou2, Philippe Gambette3, Benoît Sagot1, Simon Gabay4
1Inria, 2LIGM (UMR 8049), Université Gustave Eiffel, CNRS, 3LIGM, Université Gustave Eiffel, CNRS, 4Université de Genève
17:55 - 18:15    From FreEM to D’AlemBERT: a Large Corpus and a Language Model for Early Modern French
Simon Gabay1, Pedro Ortiz Suarez2, Alexandre BARTZ3, Alix Chagué4, Rachel Bawden5, Philippe Gambette6, Benoît Sagot5
1Université de Genève, 2Data and Web Science Group, University of Mannheim, 3Sorbonne Université, 4Inria/Université de Montréal, 5Inria, 6LIGM, Université Gustave Eiffel, CNRS
18:15 - 18:35    Detecting Multiple Transitions in Literary Texts
Nuette Heyns1 and Menno van Zaanen2
1North West University, 2South African Centre for Digital Language Resources
16:55 - 18:35    Session: P27 - Corpora and Annotation (4) - Poster Area 1
Chair: Pęzik, Piotr
   BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions
Nayla Escribano1, Jon Ander Gonzalez1, Julen Orbegozo-Terradillos2, Ainara Larrondo-Ureta2, Simón Peña-Fernández2, Olatz Perez-de-Viñaspre1, Rodrigo Agerri1
1HiTZ Center - Ixa, University of the Basque Country UPV/EHU, 2Gureiker, University of the Basque Country UPV/EHU
   GerEO: A Large-Scale Resource on the Syntactic Distribution of German Experiencer-Object Verbs
Johanna M. Poppek, Simon Masloch, Tibor Kiss
Linguistic Data Science Lab, Ruhr-Universitaet Bochum
   ACT2: A multi-disciplinary semi-structured dataset for importance and purpose classification of citations
Suchetha Nambanoor Kunnath1, Valentin Stauber2, Ronin Wu2, David Pride3, Viktor Botev2, Petr Knoth3
1Open University, 2Iris.ai, 3The Open University
   Quantification Annotation in ISO 24617-12, Second Draft
Harry Bunt1, Maxime Amblard2, Johan Bos3, Karën Fort4, Bruno Guillaume5, Philippe de Groote6, Chuyuan Li7, Pierre Ludmann2, Michel Musiol8, Siyana Pavlova9, Guy Perrier10, Sylvain Pogodalla11
1Tilburg University, 2Université de Lorraine, CNRS, Inria, LORIA, 3University of Groningen, 4Sorbonne Université and LORIA, 5LORIA / Inria Nancy Grand-Est, 6Inria, 7LORIA, 8INRIA Sémagrame & CNRS Atilf UMR 7118, 9Université de Lorraine, 10LORIA - University of Lorraine, 11LORIA/INRIA Nancy-Grand Est
   The LTRC Hindi-Telugu Parallel Corpus
Vandan Mujadia1 and Dipti Sharma2
1student, 2IIIT, Hyderabad
   MHE: Code-Mixed Corpora for Similar Language Identification
Priya Rani1, John P. McCrae2, Theodorus Fransen3
1Data Science Institute, National University of Ireland, 2Insight Center for Data Analytics, National University of Ireland Galway, 3Data Science Institute, Insight Centre for Data Analytics, National University of Ireland, Galway
   Bazinga! A Dataset for Multi-Party Dialogues Structuring
Paul Lerner1, Juliette Bergoënd1, Camille Guinaudeau2, Hervé Bredin3, Benjamin Maurice1, Sharleyne Lefevre1, Martin Bouteiller1, Aman Berhe1, Léo Galmant1, Ruiqing Yin1, Claude Barras4
1Université Paris-Saclay, CNRS, LISN, 2University Paris Saclay / LISN - CNRS, 3CNRS, 4Vocapia Research
   The Ellogon Web Annotation Tool: Annotating Moral Values and Arguments
Alexandros Ntogramatzis1, Anna Gradou2, Georgios Petasis2, Marko Kokol3
1Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 2NCSR "Demokritos", 3Semantika Research
   WeCanTalk: A New Multi-language, Multi-modal Resource for Speaker Recognition
Karen Jones1, Kevin Walker2, Christopher Caruso2, Jonathan Wright3, Stephanie Strassel4
1Linguistic Data Consortium, 2Linguistic Data Consortium/University of Pennsylvania, 3University of Pennsylvania, 4Linguistic Data Consortium, University of Pennsylvania
   Using Wiktionary to Create Specialized Lexical Resources and Datasets
Lenka Bajčetić1 and Thierry Declerck2
1Austrian Centre for Digital Humanities and Cultural Heritage, Austrian Academy of Sciences, 2DFKI GmbH
   STAPI: An Automatic Scraper for Extracting Iterative Title-Text Structure from Web Documents
Nan Zhang1, Shomir Wilson2, Prasenjit Mitra1
1The Pennsylvania State University, 2Pennsylvania State University
   ELTE Poetry Corpus: A Machine Annotated Database of Canonical Hungarian Poetry
Péter Horváth1, Péter Kundráth2, Balázs Indig1, Zsófia Fellegi3, Eszter Szlávich1, Tímea Bajzát3, Zsófia Sárközi-Lindner1, Bence Vida1, Aslihan Karabulut1, Mária Timári1, Gábor Palkó1
1Eötvös Loránd University, 2no institution, 3Research Centre for the Humanities
   HAWP: a Dataset for Hindi Arithmetic Word Problem Solving
Harshita Sharma1, Pruthwik Mishra2, Dipti Sharma2
1IIIT Hyderabad, 2IIIT, Hyderabad
   The Bulgarian Event Corpus: Overview and Initial NER Experiments
Petya Osenova1, Kiril Simov2, Iva Marinova3, Melania Berbatova4
1Sofia University "St. Kl. Ohridski" and IICT-BAS, 2Artificial Intelligence and Language Technologies Department, IICT, Bulgarian Academy of Sciences, 3Identrics, 4IICT-BAS
   A Corpus for Commonsense Inference in Story Cloze Test
Bingsheng Yao, Ethan Joseph, Julian Lioanag, Mei Si
Rensselaer Polytechnic Institute
16:55 - 18:35    Session: P28 - Natural Language Generation (including Summarization) (2) - Poster Area 1
Chair: Paroubek, Patrick
   Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish
Ariel Ekgren1, Amaru Cuba Gyllensten2, Evangelia Gogoulou2, Alice Heiman1, Severine Verlinden1, Joey Öhman1, Fredrik Carlsson3, Magnus Sahlgren1
1AI Sweden, 2RISE, 3Research Institute of Sweden
   Constrained Language Models for Interactive Poem Generation
Andrei Popescu-Belis1, Àlex Atrio2, Valentin Minder1, Aris Xanthos3, Gabriel Luthier1, Simon Mattei1, Antonio Rodriguez3
1HEIG-VD / HES-SO, 2HEIG-VD / HES-SO & EPFL, 3University of Lausanne
   ELF22: A Context-based Counter Trolling Dataset to Combat Internet Trolls
Huije Lee1, Young Ju NA2, Hoyun Song3, Jisu Shin3, Jong Park3
1Korea Advanced Institute of Science and Technology (KAIST), 2Université Sorbonne Nouvelle, 3KAIST
   Generating Textual Explanations for Machine Learning Models Performance: A Table-to-Text Task
Isaac Ampomah1, James Burton1, Amir Enshaei2, Noura Al Moubayed1
1Durham University, 2Caspian Learning Ltd, Newcastle University
   Barch: an English Dataset of Bar Chart Summaries
Iza Škrjanec1, Muhammad Salman Edhi2, Vera Demberg3
1University of Saarland, 2Universität des Saarlandes, 3Saarland University
   Effectiveness of Data Augmentation and Pretraining for Improving Neural Headline Generation in Low-Resource Settings
Matej Martinc1, Syrielle Montariol2, Lidia Pivovarova3, Elaine Zosa3
1Jožef Stefan Institute, 2INRIA, 3University of Helsinki
   Effectiveness of French Language Models on Abstractive Dialogue Summarization Task
Yongxin Zhou1, François Portet2, Fabien Ringeval3
1Université Grenoble Alpes, LIG, 2Univ Grenoble Alpes, Laboratoire d'Informatique de Grenoble, 3University of Grenoble Alpes
   ALEXSIS: A Dataset for Lexical Simplification in Spanish
Daniel Ferrés and Horacio Saggion
Universitat Pompeu Fabra
16:55 - 18:35    Session: P29 - Information Extraction (2) - Poster Area 1
Chair: Névéol, Aurélie
   The IARPA BETTER Program Abstract Task: Four New Semantically Annotated Corpora from IARPA’s BETTER Program
Timothy Mckinnon and Carl Rubino
IARPA
   A Named Entity Recognition Corpus for Vietnamese Biomedical Texts to Support Tuberculosis Treatment
Uyen Phan1, Phuong Nguyen2, Nhung Nguyen3
1VNUHCM-University of Science, 2Pham Ngoc Thach University of Medicine, 3The University of Manchester
   RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour
Erick Mendez Guzman1, Viktor Schlegel2, Riza Batista-Navarro3
1The University of Manchester, 2University of Manchester, 3Department of Computer Science, The University of Manchester
   Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT
Mustafa Jarrar1, Mohammed Khalilia2, Sana Ghanem1
1Birzeit University, 2Amazon
   Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective
Lisa Raithel1, Philippe Thomas2, Roland Roller3, Oliver Sapina3, Sebastian Möller4, Pierre Zweigenbaum5
1LISN, CNRS, Université Paris Saclay / DFKI Berlin, Technische Universität Berlin, 2German Research Center for Artificial Intelligence, 3DFKI LT Lab, 4Quality and Usability Lab, TU Berlin, 5LISN, CNRS, Université Paris-Saclay
   GGPONC 2.0 - The German Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline NER Taggers
Florian Borchert1, Christina Lohr2, Luise Modersohn3, Jonas Witt1, Thomas Langer4, Markus Follmann4, Matthias Gietzelt5, Bert Arnrich1, Udo Hahn2, Matthieu-P. Schapranow1
1Digital Health Center, Hasso Plattner Institute, 2Friedrich-Schiller-Universität Jena, 3Friedrich Schiller University Jena, 4German Guideline Program in Oncology, German Cancer Society, 5Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School
   ClinIDMap: Towards a Clinical IDs Mapping for Data Interoperability
Elena Zotova1, Montse Cuadros1, German Rigau2
1Vicomtech, 2UPV/EHU
   Identifying Draft Bills Impacting Existing Legislation: a Case Study on Romanian
Corina Ceausu1 and Sergiu Nisioi2
1University of Bucharest, 2Human Language Technologies Research Center, University of Bucharest
   MuLD: The Multitask Long Document Benchmark
George Hudson and Noura Al Moubayed
Durham University
   A Cross-document Coreference Dataset for Longitudinal Tracking across Radiology Reports
Surabhi Datta, Hio Lam, Atieh Pajouhi, Sunitha Mogalla, Kirk Roberts
University of Texas Health Science Center at Houston
   How's Business Going Worldwide? A Multilingual Annotated Corpus for Business Relation Extraction
Hadjer Khaldi1, Farah Benamara2, Camille Pradel3, Grégoire Sigel3, Nathalie Aussenac-Gilles4
1IRIT - University of Paul Sabatier/ Geotrend, 2University of Toulouse, 3Geotrend, 4CNRS - IRIT
   Do Transformer Networks Improve the Discovery of Rules from Text?
Mahdi Rahimi and Mihai Surdeanu
University of Arizona
   Offensive language detection in Hebrew: can other languages help?
Marina Litvak1, Natalia Vanetik1, Chaya Liebeskind2, Omar Hmdia1, Rizek Madeghem1
1Shamoon College of Engineering, 2Jerusalem College of Technology, Lev Academic Center
   JaMIE: A Pipeline Japanese Medical Information Extraction System with Novel Relation Annotation
Fei Cheng1, Shuntaro Yada2, Ribeka Tanaka3, Eiji ARAMAKI4, Sadao Kurohashi1
1Kyoto University, 2Nara Institute of Science and Technology, 3Ochanomizu University, 4NAIST, Japan
   Enhanced Entity Annotations for Multilingual Corpora
Michael Strobl1, Amine Trabelsi2, Osmar Zaïane1
1University of Alberta, 2Lakehead University
   Enriching Epidemiological Thematic Features For Disease Surveillance Corpora Classification
Edmond Menya1, Mathieu Roche2, Roberto Interdonato2, Dickson Owuor1
1Strathmore University, 2CIRAD
   Spanish Datasets for Sensitive Entity Detection in the Legal Domain
Ona de Gibert Bonet1, Aitor García Pablos2, Montse Cuadros2, Maite Melero1
1Barcelona Supercomputing Center, 2Vicomtech
   ConvTextTM: An Explainable Convolutional Tsetlin Machine Framework for Text Classification
Bimal Bhattarai1, Ole-Christoffer Granmo2, Lei Jiao1
1University of Agder, 2Centre for Artificial Intelligence Research
   Elvis vs. M. Jackson: Who has More Albums? Classification and Identification of Elements in Comparative Questions
Meriem Beloucif1, Seid Muhie Yimam2, Steffen Stahlhacke2, Chris Biemann2
1University of Hamburg, 2Universität Hamburg
   Decorate the Examples: A Simple Method of Prompt Design for Biomedical Relation Extraction
Hui-Syuan Yeh1, Thomas Lavergne1, Pierre Zweigenbaum2
1LISN/CNRS & Université Paris Saclay, 2LISN, CNRS, Université Paris-Saclay
   Comparing Annotated Datasets for Named Entity Recognition in English Literature
Rositsa Ivanova1, Marieke van Erp2, Sabrina Kirrane1
1Vienna University for Economics and Business, 2KNAW Humanities Cluster
                         End of Day 2