Document Type Master's Dissertation Author De Ridder, Corne firstname.lastname@example.org URN etd-08172010-202532 Document Title Flexible finite automata-based algorithms for detecting microsatellites in DNA Degree MSc Department Computer Science Supervisor
Advisor Name Title Prof D G Kourie Committee Chair Keywords
- approximate tandem repeats
- regular expression
- finite automata
Date 2010-05-09 Availability unrestricted Abstract
Apart from contributing to Computer Science, this research also contributes to Bioinformatics, a subset of the subject discipline Computational Biology. The main focus of this dissertation is the development of a data-analytical and theoretical algorithm to contribute to the analysis of DNA, and in particular, to detect microsatellites.
Microsatellites, considered in the context of this dissertation, refer to consecutive patterns contained by genomic sequences. A perfect tandem repeat is defined as a string of nucleotides which is repeated at least twice in a sequence. An approximate tandem repeat is a string of nucleotides repeated consecutively at least twice, with small differences between the instances.
The research presented in this dissertation was inspired by molecular biologists who were discovered to be visually scanning genetic sequences in search of short approximate tandem repeats or so called microsatellites. The aim of this dissertation is to present three algorithms that search for short approximate tandem repeats. The algorithms comprise the implementation of finite automata.
Thus the hypothesis posed is as follows: Finite automata can detect microsatellites effectively in DNA. "Effectively" includes the ability to fine-tune the detection process so that redundant data is avoided, and relevant data is not missed during search.In order to verify whether the hypothesis holds, three theoretical related algorithms have been proposed based on theorems from finite automaton theory. They are generically referred to as the FirežSat algorithms. These algorithms have been implemented, and the performance of FirežSat2 has been investigated and compared to other software packages. From the results obtained, it is clear that the performance of these algorithms differ in terms of attributes such as speed, memory consumption and extensibility. In respect of speed performance, FirežSat outperformed rival software packages.
It will be seen that the FirežSat algorithms have several parameters that can be used to tune their search. It should be emphasized that these parameters have been devised in consultation with the intended user community, in order to enhance the usability of the software. It was found that the parameters of FirežSat can be set to detect more tandem repeats than rival software packages, but also tuned to limit the number of detected tandem repeats.
Copyright © 2010, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
Please cite as follows:
De Ridder, C 2010, Flexible finite automata-based algorithms for detecting microsatellites in DNA, MSc dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://upetd.up.ac.za/thesis/available/etd-08172010-202532/ >
Filename Size Approximate Download Time (Hours:Minutes:Seconds)
28.8 Modem 56K Modem ISDN (64 Kb) ISDN (128 Kb) Higher-speed Access dissertation.pdf 5.25 Mb 00:24:17 00:12:29 00:10:56 00:05:28 00:00:27