The Industry Track took place on Wednesday 22 June, in the morning, from 9:30 to 12:30.
All presentations are available in the programme below.
Follow the presentations on Zoom: https://us02web.zoom.us/j/87144570196
Start time | End time | Organization | Presenter | Title | Description |
09:30 | 09:40 | ELRA | Bente Maegaard and Khalid Choukri | Introduction to the Industry Track | |
09:40 | 10:00 | European Commission | Philippe Gelin | Language Data Space | |
10:00 | 10:20 | Google | Daan van Esch | Language Technology Inclusivity at Google and Beyond | Language technology has historically only been available in a small number of the world's languages, and while this remains the case today, exciting research advances are already starting to change the picture significantly: for example, Google Translate recently launched 24 new languages without any parallel data whatsoever, distilled from a 1000-language massively multilingual machine translation model. This talk will touch upon some of the most promising recent trends, and outline what the next couple of years may look like in terms of making language technology more inclusive. |
10:20 | 10:30 | CJK | Jack Halpern | ArabLEX: Arabic Full-Form Lexicon with 530 million entries | The most comprehensive Arabic computational full form lexicon ever created, covering over 530 million inflected, conjugated, declined, and cliticized wordforms. Ideal for NLP applications like MT, NER and morphological analysis and especially for speech technology, such as training ASR and TTS models. |
10:30 | 10:40 | Vicomtech | Thierry Etchegoyhen | From Under-resourced to Large-scale Industrial Deployment: Machine Translation of Basque | The technological support for the Basque language can still be described as weak or fragmentary overall. In recent years, significant efforts have been made to provide high-quality machine translation for this language. We describe the main steps that have led to large-scale industrial deployments of machine translation services that are having a significant impact on the digital presence of Basque. |
10:40 | 10:50 | ChapsVision | Sophie Ulrich | ||
10:50 | 11:10 | Coffee Break | |||
11:10 | 11:30 | Amazon | Jimmy Kunzman | LREC 2022 Marseille Select On-device Spoken Language Understanding Topics | Applied research on compute-constrained, on-device spoken language understanding systems attracts a lot of interest for enabling an Alexa experience when there is no internet connectivity or when serving the request locally improves the Alexa experience for our customers in their homes or cars. We will briefly touch on some topics like dynamic adaptation and personalization, tightly integrated speech understanding and small footprint ASR systems in the context of rapidly evolving neural speech processing technologies. |
11:30 | 11:40 | CEA | Dalila Guessoum | Application of NLP to cosmetics | Introduction to the natural language processing technologies of CEA List, and domain adaptation for extracting toxicological information on cosmetic ingredients from scientific documentation. |
11:40 | 11:50 | Emvista | Cédric Lopez | Dealing with meaning representations | Emvista is a software publisher. Created in Montpellier in 2018 and backed by its R&D team in Natural Language Processing, the company offers innovative products that are based on state-of-the-art technologies. Its flagship product, an email management assistant, allows its users to manage their emails more efficiently by identifying all the relevant information in the mass of information received (for example, requests for information and actions), and offers many other features such as automatic email forwarding or classification. Emvista offers many other text analysis services, such as opinion and emotion analysis, the anonymization of sensitive texts, and the extraction of entities. All of this is based on a technology that structures text content using a meaning representation, which will be presented during this talk. |
11:50 | 12:00 | Vocapia | Claude Barras | Towards adaptive, multi-domain speech transcription systems | Recent advances in DNN models allow for more flexible solutions in automatic speech transcription, and we will share some feedback from Vocapia Research on the topic. Cross-domain capacities, covering telephone conversations, broadcast speech, podcasts or noisier situations, can now be reached through a global model. Another area of progress is the growing capacity for users with little linguistic expertise to adapt pre-trained models to their specific needs and applications. Finally, the challenge of addressing new languages and dialects, often low-resourced and impacted by code-switching, can be handled thanks to universal multilingual phonetic models and adaptation that requires less annotated data. Nevertheless, relevant linguistic data remains, as before, the key to any successful project in the field. |
12:00 | 12:10 | Cerence | Rainer Gruhn | AI for a World in Motion | Cerence provides conversational AI for automotive and mobility industries. After a brief introduction of the company, this presentation will discuss the challenges of creating motorcycle driver databases to enable speech control of car navigation systems by two-wheeler drivers. We will look at Bluetooth headset and helmet-induced distortions. |
12:10 | 12:20 | Orange | Lina Rojas | Research in NLP at Orange |