Title page for ETD etd-07282008-121520


Document Type Doctoral Thesis
Author Kroeze, Jan Hendrik
Email jan.kroeze@gmail.com
URN etd-07282008-121520
Document Title Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3
Degree PhD (Information Technology)
Department Information Science
Supervisor
  Advisor Name         Title
  Prof T J D Bothma    Committee Chair
  Dr M C Matthee       Committee Co-Chair
Keywords
  • online analytical processing (OLAP)
  • XML
  • Hebrew Bible
  • three-dimensional array
  • visualisation
  • computational linguistics
  • text data mining
  • data warehousing
  • database management
  • round-tripping
Date 2008-09-02
Availability unrestricted
Abstract

The thesis discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. A three-dimensional array is identified as a suitable data structure for building a data cube that captures multidimensional linguistic data in a computer's temporary storage. This structure also enables online analytical processing operations, such as slicing, to be executed on the data cube in order to reveal various subsets and presentations of the data. XML is investigated as a suitable mark-up language for permanently storing such an exploitable databank of Biblical Hebrew linguistic data. The concept is illustrated by tagging a phonetic transcription of Genesis 1:1-2:3 on various linguistic levels and manipulating this databank. Transferring the data set between an XML file and a three-dimensional array creates a stable environment that allows editing and advanced processing of the data, either to confirm existing knowledge or to mine for new, as yet undiscovered, linguistic features. Two experiments are executed to demonstrate possible text-mining procedures. Finally, visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting the process of knowledge creation. Although the data set is very small, there are exciting indications that the compilation and analysis of aggregate linguistic data may assist linguists in performing rigorous research, for example regarding the definitions of semantic functions and the mapping of these functions onto the syntactic module.
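
The round trip between a three-dimensional array and an XML file described in the abstract can be pictured with a minimal Python sketch. It is not the thesis's implementation: the clause-by-phrase-by-level layout, the level names in `LEVELS`, the element names, the helper functions, and the sample values are all assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical level names (the third dimension); the thesis's tag set may differ.
LEVELS = ["transcription", "word_group", "syntactic_function"]

# cube[clause][phrase][level]: illustrative values only, not the thesis's data.
cube = [
    [
        ["bereshit", "PP", "Adjunct"],
        ["bara", "VP", "Main verb"],
        ["elohim", "NP", "Subject"],
    ],
    [
        ["wayyomer", "VP", "Main verb"],
        ["elohim", "NP", "Subject"],
        ["", "", ""],  # padding so every clause has the same number of phrases
    ],
]


def slice_level(cube, level_name):
    """Return a 2-D clause-by-phrase slice for one linguistic level."""
    k = LEVELS.index(level_name)
    return [[phrase[k] for phrase in clause] for clause in cube]


def cube_to_xml(cube):
    """Serialise the cube to an XML string, one element per cell."""
    root = ET.Element("databank")
    for clause in cube:
        clause_el = ET.SubElement(root, "clause")
        for phrase in clause:
            phrase_el = ET.SubElement(clause_el, "phrase")
            for name, value in zip(LEVELS, phrase):
                ET.SubElement(phrase_el, name).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_cube(xml_text):
    """Rebuild the cube from the XML produced by cube_to_xml."""
    root = ET.fromstring(xml_text)
    return [
        [
            [phrase_el.findtext(name, default="") or "" for name in LEVELS]
            for phrase_el in clause_el.findall("phrase")
        ]
        for clause_el in root.findall("clause")
    ]


if __name__ == "__main__":
    # Slice: view only the syntactic functions of every phrase in every clause.
    for row in slice_level(cube, "syntactic_function"):
        print(row)
    # Round trip: array -> XML -> array should reproduce the same cube.
    print(xml_to_cube(cube_to_xml(cube)) == cube)
```

Fixing one index of the third dimension yields a two-dimensional view of a single linguistic level, which is the kind of slicing operation the abstract mentions; the XML string stands in for the permanent store.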

© University of Pretoria 2008

B23/eo

Files
  Approximate download times are given as Hours:Minutes:Seconds.

  Filename            Size       28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
  00front.pdf         73.99 Kb   00:00:20    00:00:10   00:00:09      00:00:04       < 00:00:01
  01chapter1.pdf      206.88 Kb  00:00:57    00:00:29   00:00:25      00:00:12       00:00:01
  02chapter2.pdf      753.03 Kb  00:03:29    00:01:47   00:01:34      00:00:47       00:00:04
  03chapter3.pdf      912.00 Kb  00:04:13    00:02:10   00:01:54      00:00:57       00:00:04
  04chapter4.pdf      269.13 Kb  00:01:14    00:00:38   00:00:33      00:00:16       00:00:01
  05chapter5.pdf      247.18 Kb  00:01:08    00:00:35   00:00:30      00:00:15       00:00:01
  06chapter6.pdf      377.09 Kb  00:01:44    00:00:53   00:00:47      00:00:23       00:00:02
  07chapter7.pdf      763.21 Kb  00:03:32    00:01:49   00:01:35      00:00:47       00:00:04
  08chapter8.pdf      57.41 Kb   00:00:15    00:00:08   00:00:07      00:00:03       < 00:00:01
  09bibliography.pdf  127.73 Kb  00:00:35    00:00:18   00:00:15      00:00:07       < 00:00:01
  10addenda.pdf       893.90 Kb  00:04:08    00:02:07   00:01:51      00:00:55       00:00:04
  11Addenda.zip       781.88 Kb  00:03:37    00:01:51   00:01:37      00:00:48       00:00:04
