Focus on... Claire Wolfarth and the Scoledit project

May 17, 2021

Claire Wolfarth is a young researcher with a PhD at the LIDILEM laboratory in Grenoble. Since her master's thesis, she has been interested in what natural language processing (NLP) can contribute to learning to write. She works in particular on the Scoledit project, which studies pupils' writing. The project has grown over time and is now international.

In 2014, a national project enabled the collection of dictations and written texts from pupils in CP and CE1 classes. The LIDILEM laboratory, interested in this corpus, decided to continue collecting data by following the children up to CM2. Catherine Brissaud, Claude Ponton, Corinne Totereau and Claire Wolfarth, then a doctoral student, founded the Scoledit project to study how primary school pupils learn through these writings.

Evolution of pupils' writing from 6 to 11 years old: 5,000 texts available online

The researchers selected 40 schools in the south of France, chosen to be representative of socio-cultural diversity. They administered a dictation and a writing exercise to the same pupils every year. Except for the first year, the instructions remained unchanged, so the texts allowed them to follow the children's progress from 6 to 11 years old.
Thanks to a grant from the Grenoble Alpes Data Institute, the researchers were able to travel to the schools to collect the data. The collection ended in June 2018 and yielded 5,000 handwritten texts. The grant also allowed them to hire temporary staff to digitize and transcribe the texts. The researchers insisted that the corpus (facsimiles and transcriptions) be available online so that the scientific community and the teaching community could use it for further research.

Example of a text from a 6-year-old pupil and its transcription

Creating algorithms to highlight pupils' persistent difficulties and make recommendations for teachers

The researchers have created algorithms to process the data and highlight, in particular, the verb tenses used, spelling, syntax and punctuation. Automatic processing was not easy because of the nature of the texts: NLP tools are usually applied to error-free adult writing. The collaboration between linguists, didacticians and psycholinguists made it possible to identify the difficulties that may persist for pupils and to make recommendations for teachers.
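To give a rough idea of what this kind of processing can involve, here is a minimal Python sketch. It is not the Scoledit team's method: it assumes that each pupil transcription is paired with a normalized (corrected) version, and it uses the standard-library `difflib` module to align the two word by word and list the places where they differ. The function name `flag_divergent_words` and the example sentence are invented for illustration and do not come from the corpus.

```python
from difflib import SequenceMatcher


def flag_divergent_words(transcription: str, normalized: str) -> list[tuple[str, str]]:
    """Return (pupil_form, normalized_form) pairs where the two texts differ.

    Hypothetical helper: aligns the two word sequences and keeps every
    stretch that is not identical (substitution, omission or addition).
    """
    pupil_words = transcription.split()
    target_words = normalized.split()
    matcher = SequenceMatcher(a=pupil_words, b=target_words, autojunk=False)

    divergences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            divergences.append(
                (" ".join(pupil_words[i1:i2]), " ".join(target_words[j1:j2]))
            )
    return divergences


# Invented example sentence (not taken from the Scoledit corpus).
for pupil_form, expected_form in flag_divergent_words(
    "le cha mange la souri", "le chat mange la souris"
):
    print(f"{pupil_form!r} -> {expected_form!r}")
```

A word-level alignment like this only locates the differences. Deciding whether a given difference concerns spelling, verb tense or punctuation would require further linguistic analysis, which is where the difficulty described above comes in.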

More information


Published on May 12, 2021

Practical information



The research team obtained ANR funding, allowing them to work in partnership with laboratories in Paris and Toulouse on a collection that extends up to university level:
E-Calm project

They also collaborate with Italian and Spanish universities to carry out the same data collection in those countries:
Scolinter project