
Document Type: Doctoral Thesis
Author: Faaß, Gertrud
URN: etd-10092010-134539
Document Title: A morphosyntactic description of Northern Sotho as a basis for an automated translation from Northern Sotho into English
Degree: PhD
Department: African Languages
Supervisor (Advisor Name, Title):
- Prof U Heid, Co-Supervisor
- Prof D J Prinsloo, Supervisor
Keywords:
- morpho-syntactic
- electronic grammars
- word classes
- language units
- English
- Northern Sotho
Date: 2010-09-03
Availability: unrestricted
Abstract
This PhD thesis provides a morpho-syntactic description of Northern Sotho from a computational perspective. A number of publications describe morphological and syntactic aspects of this language, be it in the form of prescriptive study books (inter alia Lombard (1985); Van Wyk et al. (1992); Poulos and Louwrens (1994)) or of descriptive articles in linguistic journals or conference proceedings (inter alia Anderson and Kotzé (2006); Kosch (2006); De Schryver and Taljard (2006)). So far, however, no comprehensive description is available that would provide a basis for developing a rule-based parser to analyse Northern Sotho at sentence level. This study attempts to fill that gap by describing a substantial grammar fragment. To this end, Northern Sotho morpho-syntactic phenomena are explored, resulting in the following descriptions:
- language units of Northern Sotho are identified, i.e. the tokens and words that form the language. These are sorted into word class categories (parts of speech), using the descriptions of Taljard et al. (2008) as a basis;
- the formal relationships between these units, wherever possible on the level of parts of speech, are described in the form of productive morpho-syntactic phrase grammar rules. These rules are defined within the framework of generative grammar.
Additionally, an attempt is made to find generalisations on the contextual distribution of the many items contained in verbs which are polysemous in terms of their parts of speech. The grammar rules described in the preceding chapter are explored in order to find patterns in the co-occurrence of parts of speech, leading towards a future, more general linguistic modelling of Northern Sotho verbs. It is also shown how a parser could work its way step by step through the analysis of a complete sentence, making use of a lexicon and the rules developed here.
We have also implemented some relevant phrase grammar rules as a constraint-based grammar fragment, in line with the theory of Lexical-Functional Grammar (Kaplan and Bresnan, 1982). Here, we utilised the Xerox Linguistic Environment (XLE), with the kind permission of the Xerox Palo Alto Research Centre (PARC).
Lastly, the study contains some basic definitions for a proposed machine translation (MT) into English, intended to support the development of MT rules. An introduction to MT and a first contrastive description of phenomena of both languages are provided.
© 2010 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
Please cite as follows:
Faaß, G 2010, A morphosyntactic description of Northern Sotho as a basis for an automated translation from Northern Sotho into English, PhD thesis, University of Pretoria, Pretoria, viewed yymmdd <http://upetd.up.ac.za/thesis/available/etd-10092010-134539/>
D10/623/ag
Files
Approximate download times are given as Hours:Minutes:Seconds.

Filename            Size       28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
00front.pdf         139.93 Kb  00:00:38    00:00:19   00:00:17      00:00:08       < 00:00:01
01chapters1-2.pdf   359.48 Kb  00:01:39    00:00:51   00:00:44      00:00:22       00:00:01
02chapter3.pdf      479.65 Kb  00:02:13    00:01:08   00:00:59      00:00:29       00:00:02
03chapter4.pdf      216.61 Kb  00:01:00    00:00:30   00:00:27      00:00:13       00:00:01
04chapter5.pdf      307.95 Kb  00:01:25    00:00:43   00:00:38      00:00:19       00:00:01
05chapters6-7.pdf   287.85 Kb  00:01:19    00:00:41   00:00:35      00:00:17       00:00:01
06bibliography.pdf  88.45 Kb   00:00:24    00:00:12   00:00:11      00:00:05       < 00:00:01