Introduction

The aim of the RefLex project is to test a set of fundamental hypotheses about the structure and evolution of African languages that are often referred to in the literature, but whose validity has never been demonstrated in practice. These include, among others, the alleged existence of phonological, morphosyntactic and lexical phenomena that are distinctive to Africa, the hypothesis that the morphosyntax of Niger-Congo languages is strongly influenced by prosodic constraints on noun and verb stems, and various hypotheses about the genetic classification of African languages. All these hypotheses have something in common: they can be tested quantitatively, but this in turn presupposes fairly complete documentation. At present, however, only a minority of African languages have been subjected to a thorough descriptive study. RefLex stems from the observation that lexical data exist for about two-thirds of African languages, but that this wealth of data, because it is scattered and often difficult to access, is largely under-exploited.

The project therefore set out to create a comprehensive corpus of lexical data on the languages of Africa and a toolkit to exploit them. The creation of the corpus is a truly collaborative effort in which researchers contribute lexical data from the languages in which they specialize. In return, they have access to standardized and reliable lexical data, which they can manipulate and exploit for specific scientific purposes. All African language specialists are welcome to provide tools and lexical data for RefLex and, of course, to exploit this resource for their own research. The lexical corpus is designed as a true reference lexicon (hence the name RefLex).

Thanks to its innovative approach, RefLex solves many of the methodological problems facing other comparable projects. For one thing, all the original sources that make up the RefLex database are accessible to registered users in digital form (e.g. PDF), so that the reliability of each data entry can be verified, errors can be reported and, above all, experimental measurements can be reproduced from reliable data. In addition, data standardization is achieved through the adoption of strict transcription rules (see the data entry manual), which smooth out variations in notation due to the diversity of the source materials and facilitate direct comparison between very disparate documents. Finally, handling and exploitation of the data are optimized through the development and availability of a variety of tools. The pooling of technical specifications allows each participant to develop their own tools for the benefit of the entire community. Thus, apart from the corpus itself, the RefLex website offers a true library of general and specific tools. The RefLex corpus is remarkable for its unprecedented size. In principle, there is no limit to the number of documents that can be integrated into it.