Title page for ETD etd-07012008-133107

Document Type Doctoral Thesis
Author Otlogetswe, Thapelo Joseph
Email otlogets@mopipi.ub.bw
URN etd-07012008-133107
Document Title Corpus design for Setswana lexicography
Degree PhD (African Languages)
Department African Languages
Advisors
  • Dr A Kilgarriff (Co-Supervisor)
  • Prof D J Prinsloo (Supervisor)
Keywords
  • lexicography
  • Setswana
  • Corpus design
Date 2008-04-17
Availability unrestricted

This PhD thesis is about the design of a Setswana corpus for lexicography. Although various corpora have been compiled and a variety of corpus-based studies attempted in African languages, no effort has been made towards corpus design. Additionally, although missionaries, grammarians and linguists have analysed the Setswana language extensively since the 1800s, none of that research addresses corpus design; most of it has been grammatical study of the language.

Recent corpus research in African languages has generally concerned the use of corpora for compiling dictionaries, and little of it addresses corpus design. Pioneers of this kind of corpus research in African languages are Prinsloo and De Schryver (1999), De Schryver and Prinsloo (2000 and 2001) and Gouws and Prinsloo (2005).

Because of the lack of research on corpus design, particularly in African languages, this thesis attempts to fill that gap, especially for Setswana. It is hoped that the findings of this study will inspire similar designs in other languages comparable to Setswana.

We explore corpus design by focusing on measuring a variety of text types for lexical richness at comparable token points.
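
As a rough illustration of what measuring lexical richness "at comparable token points" could look like, the sketch below counts the distinct word types seen after the same number of running tokens in two text samples. It is a minimal sketch under stated assumptions, not the procedure used in the thesis: the function name `types_at_checkpoints`, the lowercasing step, the whitespace tokenisation and the toy English samples are all hypothetical choices made for the example.

```python
from typing import Dict, Iterable, List


def types_at_checkpoints(tokens: List[str], checkpoints: Iterable[int]) -> Dict[int, int]:
    """Return the number of distinct word types seen once each checkpoint
    (a running token count) has been reached.

    Assumption: types are compared case-insensitively. Checkpoints beyond
    the end of the token list are ignored.
    """
    wanted = sorted({c for c in checkpoints if 0 < c <= len(tokens)})
    seen = set()
    result: Dict[int, int] = {}
    next_index = 0
    for position, token in enumerate(tokens, start=1):
        seen.add(token.lower())
        if next_index < len(wanted) and position == wanted[next_index]:
            result[position] = len(seen)
            next_index += 1
    return result


# Invented toy samples standing in for two text types (not thesis data).
sample_a = "the cat sat on the mat the cat".split()
sample_b = "rain falls wind blows sun shines moon glows".split()

for name, sample in (("sample_a", sample_a), ("sample_b", sample_b)):
    print(name, types_at_checkpoints(sample, [4, 8]))
```

Comparing the counts at the same checkpoint shows which sample introduces new vocabulary faster, independently of how long each sample is overall.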

The study asks whether a corpus compiled for lexicography must comprise a variety of texts drawn from different text types, or whether information of the same quality for lexicographic purposes could equally be extracted from a corpus consisting of a single text type. The study therefore determines whether linguistic variability is crucial in corpus design for lexicography.

University of Pretoria 2007

Approximate download times are given as Hours:Minutes:Seconds.

Filename            Size        28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
00front.pdf         166.11 Kb   00:00:46     00:00:23    00:00:20       00:00:10        < 00:00:01
01chapters1-2.pdf   231.44 Kb   00:01:04     00:00:33    00:00:28       00:00:14        00:00:01
02chapter3.pdf      367.33 Kb   00:01:42     00:00:52    00:00:45       00:00:22        00:00:01
03chapter4.pdf      353.71 Kb   00:01:38     00:00:50    00:00:44       00:00:22        00:00:01
04chapter5.pdf      513.59 Kb   00:02:22     00:01:13    00:01:04       00:00:32        00:00:02
05chapter6.pdf      414.11 Kb   00:01:55     00:00:59    00:00:51       00:00:25        00:00:02
06chapter7.pdf      510.72 Kb   00:02:21     00:01:12    00:01:03       00:00:31        00:00:02
07chapter8.pdf      151.39 Kb   00:00:42     00:00:21    00:00:18       00:00:09        < 00:00:01
08back.pdf          259.17 Kb   00:01:11     00:00:37    00:00:32       00:00:16        00:00:01
