Theoretical background

[Figure: A recording session with the members of the Banyara community in Nakasongola]

The dictionary is the linguistic genre most familiar to the general public and one of the
most appreciated products of language documentation, especially for an endangered language
community (see Haviland 2006:129). A dictionary should serve the needs and interests of
both the respective speech community, including educationists, and the academic community
of linguists, anthropologists, and other researchers (cf. Mosel 2004). A dictionary of an
endangered language is important to indigenous communities mainly in two ways. Because
of the diverse target user groups, the dictionary produced during the project will be made
available in several formats: whereas an electronic database (e.g. in the form of an online
publication) seems to be the best medium for academic purposes, speech community
members without access to modern technology will benefit more from a printed version of
the dictionary. The structure of the dictionary is likewise dictated by its dual-purpose nature: in
contrast to regular bilingual dictionaries of major languages, which primarily serve as a tool
for translation or foreign language acquisition, a dictionary of an endangered language such
as Luruuli/Lunyara is also a resource for research and a repository of the language for the
speech community (cf. Mosel 2004). Thus, in addition to lexical information, it will contain
detailed grammatical information, as well as elaborated discussions of individual semantic
fields, such as kinship terms, animal and plant names and terms relating to the material
culture and the social structure. Some culture-specific encyclopaedic information will also be
incorporated, such as information related to particular places and figures prominent in
Baruuli/Banyara culture.


The interlanguage constituent largely relates to the internal structure of the language. Such
structures include the phonology, morphology, and syntax of the language. The internal
structure of Luruuli/Lunyara will be derived from the grammar sketch that will be written as
part of this project and will constitute part of the appendix of the dictionary. The grammar
sketch will mainly highlight the grammatical information that is necessary for writing the
dictionary. This will include brief descriptions of phonological, morphological, and
syntactic structures. In particular, it will highlight a number of grammatical topics such as the
alphabet, phoneme inventory, tone, noun system, pluralisation, verb agreement, negation,
tense, aspect, mood, adjectives, pronouns, demonstratives, basic sentence types, and other
grammatical phenomena that the research team may find necessary for a general user
dictionary of an endangered language.


Methodological approach

In the 21st century, all good dictionaries take corpus data as their starting point (cf. McEnery
et al. 2006, Atkins and Rundell 2008). English corpora designed for use in lexicography have
been around since the beginning of the 1980s. A short while later, corpora of African
languages compiled for lexicographic purposes emerged (see e.g. Chabata 2000 on a corpus
of Shona). Anyone embarking on the creation of a lexicographic corpus can therefore draw
on a set of guiding principles and a body of good practice which have evolved during the last
30 years. A corpus is understood as a collection of language text in electronic form, selected
according to a number of criteria to represent, as far as possible, a language or language
variety as a source of data for linguistic research, e.g. for lexicographic work (cf. for instance
Sinclair 2005). The main advantages of using corpora in lexicographic work stem from the
fact that they provide objective evidence of language in use – a fundamental prerequisite for a
reliable dictionary. In practice, two major types of corpus data play a central role in
lexicographic work: frequency data on individual lexemes, word forms and usages, as well as
authentic examples. Further advantages come from the machine-readable nature of corpora:
with the help of dedicated tools, such as concordancers, one can search for relevant contexts
in a large body of text in just a few seconds (McEnery et al. 2006).
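
The two kinds of corpus evidence named above, frequency data and authentic examples in context, can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the function names (`tokenize`, `frequency_list`, `kwic`) are hypothetical, the tokeniser is deliberately naive, and the sample sentences are English placeholders standing in for transcribed recordings.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters (naive tokeniser, an assumption)."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def frequency_list(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def kwic(texts, keyword, width=3):
    """Keyword-in-context search: return (left, keyword, right) for every hit."""
    hits = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


# Placeholder English sentences, NOT project data.
sample_texts = [
    "The elders explain how the bark is beaten into cloth.",
    "The cloth is dried before the elders fold it.",
]

freq = frequency_list(sample_texts)
hits = kwic(sample_texts, "cloth", width=2)

for left, word, right in hits:
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>20} [{word}] {right}")
```

A real concordancer would work on word forms of the target language and would need a tokeniser that respects its orthography; the sketch only shows how both frequency counts and example contexts fall out of the same machine-readable text.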


At the data-collection stage, the project members will depend to a large extent on the native
speakers, both in terms of the topics of the texts provided and in terms of
differences among the speakers (age, gender, educational background, etc.).

An envisaged corpus of 100,000 words is a reasonable compromise, as it can be
compiled within a few months of fieldwork and will provide sufficient frequency information
and authentic examples for the proposed dictionary of 10,000 words.

The corpus will be elicited from people of different ages, dialects, professions and religious beliefs.
This will allow the collection of a balanced corpus representing the entire community of
Luruuli/Lunyara speakers. Several techniques will be used to ensure the contextual
diversity of the collected texts and to obtain dedicated vocabulary which is only used
for specialised communication, such as barkcloth making, child initiation, marriage, coronation
of cultural leaders, and fishing (among others). Staged communication events will be arranged
in which the participants, mainly elders, will demonstrate and explain how such functions are
conducted (cf. Bowern 2008). These events will be recorded and transcribed. When meanings are
defined, the project will also use focus group discussions to elicit further lexical
meanings not encountered in the corpus and to get a deeper understanding of the grammatical
elements. The discussions will be attended by speakers of different dialects. This will help the
research team to obtain accurate data to feed into the grammar and the dictionary.
Finally, translations of English words into Luruuli/Lunyara will also be used to supplement
the wordlist. This will be done by native speakers who are also fluent in English and/or
Luganda. Since the majority of Luruuli/Lunyara speakers also speak Luganda as their second
language, Luganda will also be used as an intermediary language to get accurate translations
in the target language.

The processing of the dictionary entries will be done using Toolbox. Toolbox is the most
widely used software for making dictionaries of previously under-researched languages (see
Coward and Grimes 2000).
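
Toolbox stores each dictionary entry as a plain-text record whose fields begin with backslash markers. The sketch below shows how such records could be read for further processing. It assumes MDF-style markers (`\lx` for the headword, `\ps` for part of speech, `\ge` for the English gloss, `\xv` for a vernacular example); the marker set the project will actually use is not specified here, the function name `parse_sfm` is hypothetical, and the sample record contains placeholders only.

```python
def parse_sfm(text, record_marker="lx"):
    """Split backslash-coded text into records, each a list of (marker, value) pairs.

    A new record starts at every line beginning with the record marker.
    Lines without a leading backslash are treated as continuations of the
    previous field. Fields before the first record (e.g. a file header)
    are skipped.
    """
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            if marker == record_marker:
                current = []
                records.append(current)
            if current is None:
                continue
            current.append([marker, value.strip()])
        elif current:
            # Continuation line: append to the value of the last field.
            current[-1][1] = (current[-1][1] + " " + line).strip()
    return [[(marker, value) for marker, value in record] for record in records]


# Placeholder records, NOT real Luruuli/Lunyara data.
sample = r"""
\lx HEADWORD-1
\ps n
\ge placeholder gloss one
\xv placeholder example sentence
   continued on a second line

\lx HEADWORD-2
\ps v
\ge placeholder gloss two
"""

records = parse_sfm(sample)
for record in records:
    fields = dict(record)
    print(fields["lx"], fields["ps"], fields["ge"])
```

Keeping each record as an ordered list of pairs, rather than a dictionary, preserves repeated fields such as multiple glosses or examples within one entry.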

In line with the common practice in dictionary making, a style manual (also known as a style
guide) will be produced to ensure consistency of definers' and editors' work. Essentially, a
style manual is a book of instructions for lexicographers (cf. Atkins and Rundell 2008,
Landau 2001).

Another method, which will be applied to make the work of the definers more consistent and
efficient, is the production of template entries (cf. Atkins and Rundell 2008: 123–128). A
template entry will facilitate writing entries for words that belong to lexical sets, i.e. any
group of words that share a common element of meaning, such as the days of the week or
months of the year, or dishes, plants, and metals. Such lexical sets will also be instrumental in
distributing work packages among the definers, as ideally the person who writes the entry for
dog is also responsible for cat, cow, and mouse.
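
The idea of template entries and set-based work packages can be sketched as follows. This is an illustration under stated assumptions, not the project's style manual: the class `TemplateEntry`, its field names, the definition pattern, and the definer labels are all hypothetical, and English words stand in for headwords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateEntry:
    """Shared skeleton for every headword in one lexical set (hypothetical fields)."""
    lexical_set: str
    part_of_speech: str
    definition_pattern: str      # str.format pattern with named slots
    required_fields: tuple = ()  # fields each definer still has to fill in

    def draft(self, headword, **slots):
        """Return a draft entry with the shared parts already filled in."""
        entry = {
            "headword": headword,
            "lexical_set": self.lexical_set,
            "part_of_speech": self.part_of_speech,
            "definition": self.definition_pattern.format(**slots),
        }
        # Fields that still need the definer's attention start out empty.
        entry.update({name: None for name in self.required_fields})
        return entry


def assign_work_packages(lexical_sets, definers):
    """Give each whole lexical set to a single definer, in round-robin order."""
    packages = {definer: [] for definer in definers}
    for i, set_name in enumerate(sorted(lexical_sets)):
        packages[definers[i % len(definers)]].extend(lexical_sets[set_name])
    return packages


animal_template = TemplateEntry(
    lexical_set="animals",
    part_of_speech="n",
    definition_pattern="a kind of animal ({english})",
    required_fields=("noun_class", "plural", "example"),
)

# English placeholders standing in for headwords.
lexical_sets = {
    "animals": ["dog", "cat", "cow", "mouse"],
    "weekdays": ["Monday", "Tuesday"],
}

drafts = [animal_template.draft(word, english=word) for word in lexical_sets["animals"]]
packages = assign_work_packages(lexical_sets, ["definer_A", "definer_B"])
```

Because a lexical set is never split between definers, the entries for dog, cat, cow, and mouse all start from the same skeleton and pass through the same hands.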