On May 23, 2019

In collaboration with the Literatures and Digital Arts team (ELAN, UMR Litt&Arts, UGA), work package 3 of the Grenoble Alpes Data Institute, Démarre SHS!, has developed a contributive transcription platform. TACT supports the transcription and modeling of digitized documents. Its official launch will take place on June 4 during the Global Challenges Science Week.

TACT is a collaborative production and annotation platform for social sciences and humanities (SSH) data. Arnaud Bey, Anne Garcia-Fernandez, Elisabeth Greslou (ELAN), Patrick Guillou, Sylvain Hatier, Célia Marion (Démarre SHS!) and Thomas Lebarbé (Litt&Arts/Démarre SHS!) took part in its development within the framework of the Grenoble Alpes Data Institute.

This platform follows on from the Stendhal manuscripts online project. Twelve years ago, digital tools were developed to transcribe the manuscripts, since Stendhal's handwriting could not be recognized automatically. Over 10 years, about 85,000 people have visited the site that gives access to the manuscripts.

The success of this project led to the idea of a contributive transcription platform. Its aim is to allow researchers to upload their manuscript images online, while providing transcription tools and the possibility of opening the project to contributors. TACT helps produce structured data in a collaborative way.

The second step of Démarre SHS! is to bring together researchers from the Grenoble site to analyze the platform's data jointly. This exploitation phase involves developing query tools whose results also draw on the corpora of other hosted projects. The platform could thus prompt encounters between researchers.

These stages of production and exploitation are part of a broader process that also includes exposure and preservation of the data. Exposure means making the data available to the scientific community and accessible to the general public, from an open research or citizen science perspective. This component raises issues of digital publishing and digital museography. The technical goal is to produce a tool that automatically generates websites from corpora.

Preservation refers to the problem of data loss in science, as with the recordings of children that the linguist Claire Blanche-Benveniste used to study language acquisition. It also covers a more complex issue: the use of XML to describe data. One possible research direction for Démarre SHS! is the automation of format evolution for encoded texts.

Three projects on the TACT platform illustrate the diversity of its applications. The Concordance project is based on the transcription of a corpus of daily newspaper articles written in Tahitian. The goal is to create a dictionary of this endangered language to feed an optical character recognition (OCR) system that will allow the automatic transcription of texts in Tahitian.

The writer Jean-Philippe Toussaint has entrusted his drafts of the novel La Réticence to the TACT team so that others can use the unpublished material. The author typed, annotated by hand and retyped each of the 93 paragraphs of the novel. There are up to twenty versions of the same paragraph, and their writing order is not indicated. OCR tools do not recognize the handwritten annotations. TACT allows followers of Jean-Philippe Toussaint to contribute to the transcription of these drafts. Researchers are also working on an automated solution to determine the writing order of the different versions.

Benoîte Groult gave her notes to the university library of Angers. The feminist author wrote her autobiography Mon Evasion by rewriting Histoire d'une évasion, a book based on a series of interviews with the journalist Josyane Savigneau. In addition to the initial and final texts, the researchers are confronted with a rewriting process made up of montages, folds, erasures, color codes, additions and deletions. Beyond simple transcription, the platform also allows the data to be enriched by drawing on the knowledge of the feminist community. TACT adds an extra layer of contextualization and detail. Its goal is to offer as much to understand as to see and read.

This expertise, developed in the scientific field of digital humanities, also finds applications in the economic sector. For example, Pauline Soutrenon (Litt&Arts), a PhD student in natural language processing, works with the company Comongo. Voted start-up of the year 2018 by Présences, the magazine of CCI Grenoble, Comongo offers to evaluate a brand's image using automated processes on very small samples. The goal of the collaboration is to produce an interface for analyzing opinions that can be used by those who have not mastered the encoding.


Published on June 18, 2019