The aim of the RefLex project is to test a set of fundamental hypotheses about the structure and evolution of African languages that are often invoked in the literature but have never been tested in practice. These include, among others, the alleged existence of phonological, morphosyntactic and lexical phenomena distinctive to Africa; the hypothesis that the morphosyntax of Niger-Congo languages is strongly shaped by prosodic constraints on noun and verb stems; and various hypotheses about the genetic classification of African languages. All these hypotheses have one thing in common: they can be tested quantitatively, but only if fairly complete documentation is available. At present, however, only a minority of African languages have been the subject of a thorough descriptive study. RefLex stems from the observation that lexical data exist for about two-thirds of African languages, but that this wealth of data, being scattered and often difficult to access, remains largely under-exploited.
The aim was to create a comprehensive corpus
of lexical data on the languages of Africa and a toolkit to
exploit them. The creation of the corpus is a truly collaborative effort
in which researchers contribute lexical data from the languages they
specialize in. In return, they gain access to standardized and reliable
lexical data, which they can manipulate and exploit for specific
scientific purposes. All African language specialists are welcome to
provide tools and lexical data for RefLex and, of course, to exploit this
resource for their own research. The lexical corpus is designed as a true
reference lexicon (hence the name RefLex).
Through its approach, RefLex addresses several of the methodological
problems facing other comparable projects. First, all the original sources that make up the
RefLex database are accessible to registered users in a digital form (e.g.
PDF), so that the reliability of each data entry can be verified, errors
reported and, above all, experimental results reproduced from
reliable data. In addition, data standardization is achieved through the
adoption of strict transcription rules (see the data entry manual)
which smooth out variations in notation due to the diversity of the source
materials, and facilitate direct comparison between very disparate
documents. Finally, handling and exploitation of the data are optimized
through the development and availability of a variety of tools. The
pooling of technical specifications allows each participant to develop
their own tools for the benefit of the entire community. Thus, in addition
to the corpus itself, the RefLex website
offers a library of general and specific tools. The RefLex corpus is
remarkable for its unprecedented size. In principle, there is no limit to
the number of documents that can be integrated into it.