The European initiative COVID-19 MLIA improves Multilingual Information Access
Information access during the COVID-19 pandemic is hampered by the amount and the reliability of information, as well as by the many languages in which the information is provided. Language technologies can help.
The COVID-19 MLIA initiative, endorsed by the European Commission's DG CNECT and coordinated by the University of Padua and ELRA/ELDA, was launched in June 2020 to improve Multilingual Information Access in this specific context.
The 1st round of evaluation was completed in early 2021. The initiative has attracted considerable interest: 14 teams from 10 countries submitted runs for the 3 tasks. Many more teams had registered and are expected to join for rounds 2 and 3.
Within the Data Acquisition task, the collection was done in 2 parts.
For Machine Translation, the parallel data was built from well-known web sources in the domain of Health and Medicine, and enriched with identified COVID-19 datasets. The size of the resulting corpora ranges from 810K to 1.1M sentence pairs depending on the language pair (English to German, French, Spanish, Italian, Modern Greek and Swedish). The processed language resources have been cleared and will be progressively made available as an evaluation package from the ELRC-Share repository.
For Information Extraction and Multilingual Semantic Search, the Europe Media Monitor (EMM) system developed by the European Commission Joint Research Centre (JRC) was used and tuned to collect metadata automatically extracted from news articles related to COVID-19. This set of metadata is available as the 2020 Medisys COVID-19 Dataset on the Open Data Portal.
The second round will run in March and April 2021. For this round, the initiative is also looking into adding new topics, improving the language coverage by extending the number of less-resourced EU languages, and fostering cross-fertilization between the tasks.
Finally, similar initiatives tackling the information access issue are being conducted throughout the world. One is the CORD-19 dataset, a collection of health-related literature drawn from biomedical data sources with the support of the WHO (World Health Organization). Another is the TREC-COVID evaluation of search systems, run by NIST (National Institute of Standards and Technology) and using the CORD-19 documents.
COVID-19 MLIA is supported by:
More information is available at COVID-MLIA and on the COVID-19 MLIA YouTube Channel.
All the resources produced during the evaluation rounds are available on the git repositories of the initiative, under the CC-BY-SA 4.0 license.
(This article was originally published in the 19th ELRC newsletter in February 2021.)