
The EVALDA project was financed by the French Ministry of Research in the context of its Technolangue programme. The aim of the project was to establish a permanent evaluation infrastructure for the language engineering sector in France and for the French language.

The project set out to bring together reusable components such as organisation, logistics, language resources, evaluation protocols, methodologies and metrics, as well as the major actors in the field (scientific advisory boards, panels of experts, partners etc). This made it possible to capitalise on the results of previous experiments, and also to encourage collaborative research and the setting up of new and improved evaluation campaigns. It was imperative that the evaluations envisaged in this project could be reproduced by third parties, using the resources assembled over the course of the project, in order to enable a genuine comparison of system performance and benchmarking of the state of the art in language engineering. All evaluation resources were made available in the ELRA catalogue at the end of the project in the form of evaluation packages.

A second aim of the project was to set up evaluation campaigns involving several linguistic technologies including both written and spoken media. Industrial and academic partners took part in the project. The campaigns were largely based around black box evaluation protocols and quantitative methods, drawing and expanding upon previous evaluation campaigns, such as ARC-AUPELF, GRACE, TREC etc.

Each evaluation campaign was largely independent; however, a certain amount of synergy between the campaigns was envisaged. This involved the sharing of know-how, resources or even personnel.

The linguistic technologies to evaluate were chosen from those that appeared to be the most crucial in the field. Details on the selected projects are provided in the sections below, along with a link to the corresponding Evaluation Package in the catalogue.

Action de Recherche Concertée sur l’Alignement de Documents et son Evaluation

Evaluation of bilingual text and vocabulary alignment systems. Following the success of ARCADE I, this follow-up campaign aims to evaluate alignments between more distant or 'exotic' languages, i.e. Greek, Russian, Japanese, Chinese.

ARCADE Evaluation Package


The ARCADE project, which started in 1995 and was completed in 1999, was designed to provide standard methods for the evaluation and comparison of French-English parallel text alignment systems. ARCADE II aims to explore the techniques of multilingual text alignment through a fine-grained evaluation of existing techniques and the development of new alignment methods.

ARCADE II consists of two tracks devoted to the evaluation of alignment at sentence level and at word level respectively. It differs from the previous ARCADE campaign in its multilingual scope and its investigation of lexical alignment. The languages concerned include 5 European languages (English, French, German, Italian and Spanish) and 6 languages with different writing systems (Arabic, Russian, Chinese, Japanese, Greek and Persian). Multilingual reference corpora have been made available for the evaluation exercise.

Contact: Khalid Choukri

For more information (in French), please visit

Méthodologie d’Evaluation automatique de la compréhension hors et en contexte du DIAlogue

Evaluation of man-machine dialogue systems. The task envisaged is hotel room reservation (including some local tourist information).

MEDIA Evaluation Package & MEDIA Speech Database for French


The aim of the MEDIA evaluation campaign is to test an automatic evaluation methodology for man-machine dialogue systems. The evaluation methodology is based on a paradigm that uses test sets taken from a corpus of real-world dialogues, a semantic representation of dialogue and common evaluation metrics. This protocol is designed to test the understanding capacity of dialogue systems both with and without taking the context of the dialogue into account.

In order to validate the evaluation protocol and the semantic representations, an evaluation campaign will take place in which each partner in the project tests their system. The task chosen is hotel room reservation, with tourist information as an additional point of entry into the dialogue.

The final Media Workshop took place at the Sainte-Marthe University in Avignon, France, on July 6-7 2006.

Contact: Khalid Choukri

For more information (in French), please visit

Campagne d’Evaluation de Systèmes de Traduction Automatique

Evaluation of machine translation systems. French is to be the pivot language; however, translation from and into French is envisaged for several languages (English, Spanish, German, Arabic), according to the capabilities of the participants' systems.

CESTA Evaluation Package


Final CESTA Report
(in French, PDF, 99 pages, 1028 KB)

The CESTA campaign proposes a series of evaluations of machine translation systems for various language pairs into French. The statistical metrics BLEU/NIST (IBM) are used for the evaluations and adapted to French as a target language, along with other automatic metrics based on grammatical and semantic scores (X-Score and D-Score). The Weighted N-gram Model (WNM), word error rate (WER) and position-independent error rate (PER) are also used. A further aim of CESTA is to conduct a meta-evaluation, comparing the automatic results with human judgments.
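To illustrate two of the metrics named above, here is a minimal Python sketch of WER and PER computed against a single reference translation. It is not the CESTA scoring tool: the function names are hypothetical, tokenisation is a plain whitespace split, and the PER formula shown is one common formulation (unmatched words, ignoring order, divided by the reference length), which is an assumption rather than the campaign's documented definition.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed part of
    # the reference and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def position_independent_error_rate(reference: str, hypothesis: str) -> float:
    """Like WER, but word order is ignored (bag-of-words comparison)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Multiset intersection counts words shared by both sentences.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    errors = max(len(ref), len(hyp)) - matches
    return errors / len(ref)


if __name__ == "__main__":
    # Illustrative sentences, not campaign data.
    ref = "le chat dort sur le tapis"
    hyp = "le chat sur le tapis dort"
    print(word_error_rate(ref, hyp))
    print(position_independent_error_rate(ref, hyp))
```

In this example the hypothesis contains the right words in a different order, so WER penalises it while PER does not. That difference is why the two scores are often reported together.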




-  Université de Lille 3, IDIS/CESARTES
-  Ecole Polytechnique Fédérale de Lausanne, LIA
-  Université de Leeds
-  Temis S.A.
-  Systran S.A.
-  Softissimo S.A.
-  Université de Grenoble, IMAG
-  Université de Montréal, Dept. Linguistique et Traduction
-  Université de Montréal, RALI
-  Université de Genève, ISSCO
-  University of Aachen, RWTH
-  Universitat Politècnica de Catalunya (UPC)
-  SDL International
-  Comprendium S.L.

Contact: Khalid Choukri

For more information (in French), please visit

CESART - Evaluation de Systèmes d’Acquisition de Ressources Terminologiques 

Evaluation of terminology extraction tools, including tools for extracting ontologies and semantic relations. Evaluation is to take place with reference to a predetermined list of terms/relations.

CESART Evaluation Package


The CESART project deals with the user-oriented evaluation of tools for acquiring terminological resources. This kind of user-oriented evaluation relies on the support of experts in information management who are capable of assessing terminological data and confirming usage. The aim is to propose and validate an evaluation protocol that allows different systems to be evaluated and compared objectively for terminology applications such as terminological resource creation and semantic relation extraction. The project also aims to create quality-controlled resources such as domain-specific corpora and an automatic scoring tool.

CESART consists of two tracks devoted to the evaluation of term extraction and term structuring. Five French-language terminology acquisition tools took part in the CESART evaluation exercise. As these tools are based on different models and designed for different applications, two evaluation tasks have been defined, term extraction and semantic relation extraction (synonymy), in order to reflect the contexts in which these tools are used.
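As a rough illustration of scoring the term extraction task against a predetermined reference list, the Python sketch below computes precision, recall and F1 over normalised term sets. This is an assumption about how such a comparison could be automated, not the CESART protocol itself, which relies on expert assessment; the function name, the normalisation rule and the sample terms are all hypothetical.

```python
def score_terms(extracted, reference):
    """Compare extracted terms with a reference list.

    Returns (precision, recall, f1). Terms are lower-cased and
    whitespace-normalised before comparison (an illustrative choice).
    """
    def normalise(term):
        return " ".join(term.lower().split())

    extracted_set = {normalise(t) for t in extracted}
    reference_set = {normalise(t) for t in reference}
    correct = extracted_set & reference_set

    precision = len(correct) / len(extracted_set) if extracted_set else 0.0
    recall = len(correct) / len(reference_set) if reference_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    reference = [
        "infarctus du myocarde",
        "pression artérielle",
        "insuffisance cardiaque",
        "tension",
    ]
    extracted = ["Infarctus du myocarde", "pression  artérielle", "patient"]
    print(score_terms(extracted, reference))
```

Exact matching after normalisation is the simplest possible criterion; a real campaign would also need a policy for partial matches and term variants, which this sketch leaves out.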

Contact: Khalid Choukri

For more information, please visit (in French).


Evaluation des Analyseurs Syntaxiques du français

An evaluation campaign designed to test syntactic parsers. A side effect of the campaign is the creation of a syntactically parsed reference text composed of several genres of text (newspapers, literary texts, electronic texts etc).

EASY Evaluation Package


The EASY project is dedicated to the evaluation of syntactic analysers for the French language. The project is financed by the French Ministry of Research in the context of the Technolangue programme.

The aim of the EASy campaign is to design and test an evaluation methodology for comparing syntactic analysers for French, and to produce a large validated linguistic resource by automatically combining the annotated corpora produced. The corpora consist of texts taken from various domains (literature, medicine, technical, general, ...) and of different types: newspapers, questions, websites, oral transcriptions, ...

The project will last 24 months. The evaluation campaign is currently running and will last until 15th December 2004.


-  Khalid Choukri
-  Olivier Hamon
-  Patrick Paroubek (LIMSI)
-  Anne Vilnat (LIMSI)
-  Isabelle Robba (LIMSI)



Corpora providers
-  LLF


-  FT R&D
-  LIC2M
-  LPL

Contact: Khalid Choukri

For more information (in French), please visit

Evaluation en Question-Réponse

Evaluation of question answering systems. Three reference corpora are envisaged: a large general corpus (newspapers, general texts), a web corpus and a corpus made up of medical texts.

EQUER Evaluation Package


The EQueR Evaluation Campaign provides an evaluation framework for question answering systems for the French language. It aims to support this research area by establishing the state of the art, especially in France.

EQueR includes two automatic answer retrieval tasks: a generic task over a heterogeneous collection of texts, mainly newspaper articles, and a specialised task over a corpus of medical texts.



-  ELDA / ELRA, Organiser
-  CISMEF Centre Hospitalier de Rouen
-  Systal / Pertimm S.A.S.
-  France Telecom R&D, DMI/GRI
-  iSmart S.A.R.L.
-  CNRS/Université d’Avignon, Laboratoire d’Informatique d’Avignon (LIA)
-  CEA, Laboratoire d’ingénierie de la connaissance multimédia multilingue (LIC2M)
-  CNRS, Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur (LIMSI)
-  Université de Neuchâtel, Laboratoire Interfacultaire d’Informatique
-  Sinequa S.A.S.
-  Assistance Publique / Hôpitaux de Paris, Sciences et Technologies de l’Information Médicale (STIM)
-  Synapse S.A.

Scientific committee

-  Brigitte Grau, LIMSI - Coordinator
-  Patrice Bellot, LIA
-  Michel Benoit, iSmart
-  Malek Boualem, France Telecom R&D
-  Mohand Boughanem, IRIT
-  Patrick Constant, Systal
-  Olivier Ferret, CEA
-  Martine Hurault-Plantet, LIMSI
-  Dominique Laurent, Synapse
-  Claude de Loupy, Sinequa
-  Jacques Savoy, Université de Neuchâtel
-  Pierre Zweigenbaum, STIM

Contact: Khalid Choukri

For more information (in French), please visit

Evaluation des Systèmes de Transcription Enrichie d’émissions Radiophoniques

Evaluation of automatic broadcast news transcription systems. This campaign includes the evaluation of segmentation tasks and identification of named entities.

ESTER Evaluation Package & ESTER Corpus


The purpose of the ESTER campaign is to evaluate the performance of broadcast news transcription systems.

Contact: Khalid Choukri

For more information, please visit (in French).

Evaluation des Synthétiseurs de parole en français

Evaluation of speech synthesis systems. This campaign is to feature a novel method for the evaluation of prosody in synthesised speech.

EVASY Evaluation Package


The EVASY project is dedicated to the evaluation of speech synthesis systems for the French language. The project is financed by the French Ministry of Research in the context of the Technolangue programme.
This evaluation campaign is intended to expand upon the ARC-AUPELF (now AUF) campaign of 1996-1999, the only previous evaluation campaign for text-to-speech systems for the French language. The EvaSy campaign is subdivided into three components:

  • evaluation of the grapheme-to-phoneme module,
  • evaluation of prosody and expressivity,
  • global evaluation of the quality of the synthesised speech.

For more information about the project and the related work-in-progress report, please contact:


Khalid Choukri

Christophe d’Alessandro (LIMSI)



Consortium Partners

-  Bell Labs - Lucent Technologies
-  Elan Speech
-  ICP
-  LIA

For more information (in French), please visit