Document Type: Master's Dissertation
Author: Schriek, Cornelis Arnold firstname.lastname@example.org
URN: etd-10212011-171754
Document Title: Analysis and standardization of marker genotype data for DNA fingerprinting applications
Degree: MSc
Department: Biochemistry
Supervisor:
- Prof A A Myburg (Co-Supervisor)
- Prof F Joubert (Supervisor)
Keywords:
- genetic polymorphisms
- fingerprinting applications
- genotype data
Date: 2011-09-09
Availability: unrestricted
Abstract
Genetic polymorphisms can be seen as the occurrence of more than one form of a DNA or protein sequence at a single locus in a group of organisms, where these different forms occur more frequently than can be attributed to mutation alone. The combination of genetic polymorphisms present in the genome of a particular individual is referred to as its genotype. A wide range of genotyping techniques has been developed to detect and visualize genetic polymorphisms. One such technique examines highly polymorphic repetitive DNA regions called microsatellites, also called “short tandem repeats” (STRs) and sometimes “simple sequence repeats” (SSRs) or “simple-sequence length polymorphisms” (SSLPs). A microsatellite region consists of identical DNA sequence units, usually 2-6 base pairs long, strung together in tandem; the number of repeats varies widely among individuals of a population. Microsatellite genotyping is a popular choice for many types of studies, including individual identification, paternity testing, germplasm evaluation, genome mapping and diversity studies, and can be used in many commercial, academic, social, and agricultural applications. There are, however, many obstacles to managing and analysing microsatellite genotype data effectively, and researchers currently struggle to cope with rapidly growing volumes of genotyping data. Management problems range from the simple lack of a secure, easily accessible central data repository to more complex issues such as the merging and standardization of data from multiple sources into combined datasets. Because of these issues, genetic fingerprinting applications such as identity matching and relatedness studies can be challenging when data from different experiments or laboratories have to be combined into a central database.
The main aim of this MSc study in Bioinformatics was to develop a bioinformatics resource for the management and analysis of genetic fingerprinting data from microsatellite marker genotyping studies, and to apply the software to the analysis of microsatellite marker data from ramets of Pinus patula clones with the purpose of analysing clonal identity in pine breeding programmes. The software resource developed here is called GenoSonic. It is a web application that provides users with a secure, easily accessible space where genotyping project data can be managed and analysed as a team. Users can upload and download large amounts of marker genotype data. Once uploaded to the system, DNA fingerprint data must be standardized before they can be used in further analyses. To do this, a two-step approach was implemented in GenoSonic. The first step is to assign standardized allele sizes to all of the input allele sizes of the microsatellite fingerprints automatically, using a novel automated binning algorithm called CSMerge-1, which was designed specifically to bin data from multiple experiments. The second step is to manually verify the results from the automated binning function and add the verified data to a standardized dataset. Once the genetic fingerprints have been standardized, allele and genotype frequencies can be viewed for any given marker. GenoSonic also provides functionality for identity matching. One or more DNA fingerprints from unknown samples can be matched against a standardized dataset to establish identities or infer relatedness. Finally, GenoSonic implements a genetic distance tree construction function, which can be used to visualize relatedness among samples in a selected dataset.
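The workflow described above (binning raw allele sizes, viewing allele frequencies, and matching unknown fingerprints) can be illustrated with a minimal Python sketch. This is not CSMerge-1 or any part of GenoSonic, whose internals are not described in this abstract; it assumes a simple gap-based binning rule, and all function names, marker names (M1, M2), sample names and the 0.5 bp tolerance are hypothetical choices made for illustration only.

```python
from collections import Counter


def bin_alleles(sizes, tolerance=0.5):
    """Map raw fragment sizes (bp) to standardized sizes.

    Assumed rule (NOT CSMerge-1): sorted sizes stay in the same bin while
    the gap to the previous size is at most `tolerance`; each bin is
    labelled with its mean rounded to the nearest whole base pair.
    """
    mapping, current = {}, []
    for size in sorted(set(sizes)):
        if current and size - current[-1] > tolerance:
            label = round(sum(current) / len(current))
            mapping.update({s: label for s in current})
            current = []
        current.append(size)
    if current:  # close the last open bin
        label = round(sum(current) / len(current))
        mapping.update({s: label for s in current})
    return mapping


def allele_frequencies(genotypes):
    """Relative frequency of each allele over a list of genotypes."""
    counts = Counter(a for genotype in genotypes for a in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def match_fraction(a, b):
    """Fraction of shared markers at which two fingerprints agree."""
    shared = [m for m in a if m in b]
    if not shared:
        return 0.0
    same = sum(1 for m in shared if sorted(a[m]) == sorted(b[m]))
    return same / len(shared)


def find_matches(unknown, reference, threshold=1.0):
    """Names of reference fingerprints matching at or above `threshold`."""
    return sorted(name for name, fp in reference.items()
                  if match_fraction(unknown, fp) >= threshold)


# Hypothetical data: raw sizes for one marker, then standardized fingerprints.
raw = [150.1, 150.3, 149.9, 152.2, 152.0, 154.1]
bins = bin_alleles(raw)

reference = {
    "clone_A": {"M1": (150, 152), "M2": (200, 204)},
    "clone_B": {"M1": (150, 154), "M2": (200, 204)},
}
unknown = {"M1": (152, 150), "M2": (204, 200)}
print(bins)
print(find_matches(unknown, reference))
```

Lowering `threshold` below 1.0 in this sketch would tolerate a few mismatching markers, which is one simple way to allow for genotyping error; the verification step described above would still be needed before accepting any automated bin assignment.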
The bioinformatics resource developed in this study was applied to a microsatellite DNA fingerprinting project aimed at the re-establishment or confirmation of clonal identity of Pinus patula ramets from pine clonal seed orchards developed by a South African forestry company at one of their new agricultural estates in South Africa. The results from GenoSonic's automated binning function (CSMerge-1) and the results from the identity matching and tree construction exercise were compared to results obtained by human experts who had analysed the data manually. It was demonstrated that the results from GenoSonic equalled or surpassed the manual results in terms of accuracy and consistency, and far surpassed the manual effort in terms of the speed at which analyses could be completed.
GenoSonic was developed with a specific focus on reusability and on the ability to be modified or extended to solve future genotyping-related problems. This study not only provides a solution to the current genotype data management and analysis needs of researchers, but also aims to serve as a basic framework, or component library, for future software development projects that may be required to address specific needs of researchers dealing with high-throughput genotyping data.
© 2010, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
Please cite as follows:
Schriek, CA 2010, Analysis and standardization of marker genotype data for DNA fingerprinting applications, MSc dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://upetd.up.ac.za/thesis/available/etd-10212011-171754/ >
Filename: dissertation.pdf
Size: 8.47 Mb
Approximate Download Time (Hours:Minutes:Seconds):
- 28.8 Modem: 00:39:12
- 56K Modem: 00:20:09
- ISDN (64 Kb): 00:17:38
- ISDN (128 Kb): 00:08:49
- Higher-speed Access: 00:00:45