Lignin is a highly abundant aromatic biopolymer deposited during the final stages of secondary cell wall formation in plants, and it constitutes a substantial proportion of the dry weight of woody plant stems. Lignin provides structural support to xylem cell walls, confers hydrophobicity on water-conducting vessels and forms a defence mechanism against pathogen invasion. Although lignin is essential for normal plant cell development, its content and composition are targets for tree improvement, because residual lignin in paper pulp reduces paper quality and must therefore be removed using treatments that are expensive and often detrimental to the environment.

At present, little is known about the amount of allelic diversity in lignin biosynthetic genes and whether such diversity may be associated with variation in lignin content and composition. However, the identification of alleles associated with desirable lignin phenotypes is dependent on a detailed understanding of the molecular evolution and population genetics of these genes. This M.Sc. study was aimed at analysing nucleotide and allelic diversity in two lignin biosynthetic genes of Eucalyptus trees. Additionally, the study aimed to develop single nucleotide polymorphism (SNP) markers that could be used to assay allelic diversity for these genes in populations of two target species, E. grandis and E. smithii.

Orthologues of the tobacco LIM-domain1 (NtLIM1) transcription factor gene, which is involved in the regulation of lignin biosynthesis, were isolated from E. grandis and E. smithii. Approximately 3 kb of genomic sequence, including the promoter and full-length gene regions, was isolated for each of the two orthologues, labelled EgrLIM1 and EsLIM1, respectively. The predicted amino acid sequences of EgrLIM1 and EsLIM1 were 99.4% identical to each other and indicated that LIM1 is a small protein of only 188 residues in eucalypt trees, with a predicted molecular weight of 21.0 kDa. Quantitative real-time RT-PCR analysis confirmed the expression of LIM1 in wood-forming tissues undergoing lignification. Ten putative cis-regulatory elements were observed in the promoter regions of EgrLIM1 and EsLIM1, including a GA-dinucleotide microsatellite that appears to be specific to LIM1 promoters of Eucalyptus tree species. The full-length LIM1 gene sequences could subsequently be used in the assessment of nucleotide and allelic diversity, together with the full-length CAD2 sequences that were already available in the public domain.

The level of nucleotide and allelic diversity and the distribution and decay of linkage disequilibrium (LD) were surveyed in 5 and 3 derived gene fragments of CAD2 and LIM1 obtained from 20 E. grandis and 20 E. smithii individuals. Each gene displayed a unique genetic diversity profile, but for the most part, nucleotide diversity (π) was estimated at approximately 0.0010 except for the E. grandis LIM1 gene where π lower than 0.0040 was observed. Generally, except for the high amounts of LD observed in the CAD2 gene of E. grandis (> 2.5 kb), LD decayed within 500 bp. A large number (13 to 45) of SNP sites (defined as single nucleotide changes with minor allele frequencies of at least 0.10 in each species) were observed in each gene of each species. The high SNP density (ranging from one per 45 to one per 155 bp) observed in the two genes facilitated the efficient development of SNP markers to be used in future aspects of LD mapping, association genetics and marker-assisted breeding.
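
As an illustration of the nucleotide diversity statistic reported above, π can be computed as the average number of pairwise differences per site among aligned allele sequences. The sketch below is a minimal example under stated assumptions, not the software used in this study: the function name `nucleotide_diversity` and the toy sequences are hypothetical, and the sequences are assumed to be pre-aligned, of equal length and free of gaps.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Return pi: the mean number of pairwise differences per site.

    Assumes `seqs` is a list of aligned sequences of equal length
    with no gaps or missing data (a simplification for illustration).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    # Count mismatched sites over every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Hypothetical toy alignment: three alleles, four sites.
toy_alleles = ["AAAA", "AAAT", "AATT"]
pi = nucleotide_diversity(toy_alleles)
```

For the toy alignment the three pairs differ at 1, 2 and 1 sites, so π = 4 / (3 × 4) ≈ 0.33; real gene fragments of several hundred base pairs yield far smaller values, in the range reported above.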

The allele sequences obtained for the CAD2 and LIM1 genes were used as templates for the development of SNP marker panels (a series of six or seven SNP markers analysed together) for the analysis (tagging) of SNP haplotype diversity in species-wide reference populations (100 E. grandis and 137 E. smithii individuals) of the two species. Each tag SNP was assayed using a single base extension assay and capillary gel electrophoresis. High polymorphism information content (average PIC of 0.836) was observed for the SNP marker panels. Four SNPs in the CAD2 gene and two in the LIM1 gene were found to be polymorphic in both E. grandis and E. smithii (i.e. trans-specific SNPs), suggesting a possible ancestral origin for these polymorphisms.
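
To make the PIC statistic concrete, the sketch below computes polymorphism information content from a set of allele (here, SNP haplotype) frequencies. The abstract does not state which PIC formula was applied, so the commonly used definition of Botstein et al. (1980) is assumed here; the function name `polymorphism_information_content` and the example frequencies are hypothetical.

```python
from itertools import combinations


def polymorphism_information_content(freqs):
    """Return PIC for a marker with the given allele frequencies.

    Assumed formula (Botstein et al. 1980):
        PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    The frequencies must sum to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")

    # Expected homozygosity term.
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings that are uninformative for linkage.
    cross_term = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical example: four equally frequent haplotypes.
pic_equal = polymorphism_information_content([0.25, 0.25, 0.25, 0.25])
```

Under this formula a single biallelic SNP cannot exceed a PIC of 0.375, so values as high as the panel average reported above are only attainable when several SNPs are scored jointly as multi-allelic haplotypes.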

Assessing candidate gene variation in the genomes of forest trees is important for predicting the amount and structure of nucleotide diversity available for the future design of SNP assays at the whole-genome level. Such assays will be useful to study differentiation among tree species and populations, to associate nucleotide polymorphisms with desirable phenotypes and to increase the efficiency of tree improvement approaches.

