This diversity corresponds to that observed between extant TcII and TcIII strains, and appears to be the maximal divergence observed between evolutionary lineages. Secondly, when re-sequencing the same loci in TcI strains, it was possible to identify additional SNPs, corresponding to 18% of the total genetic diversity that can be uncovered in this panel of 18 strains. However, 82% of the observed SNPs had already been identified by analyzing the CL Brener hybrid genome. A similar figure is obtained for TcIV strains. By contrast, sequencing TcII or TcIII strains uncovers only a minimal number of additional SNPs, corresponding to 3-4% of the total. This reflects the fact that the majority of these nucleotide changes have already been identified from CL Brener alleles.
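The discovery rates quoted above can be illustrated with a small calculation. The following is a minimal sketch, not the procedure used in this study: it assumes each SNP is represented as a hypothetical (locus, position) pair, and it defines the share of novel diversity for a lineage as the number of SNPs found in that lineage but absent from the reference set, divided by all SNPs in the panel. The function name `novel_snp_share` and the toy data are illustrative assumptions, not results.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a SNP is a (locus identifier, alignment position) pair.
Snp = Tuple[str, int]


def novel_snp_share(reference: Set[Snp],
                    by_lineage: Dict[str, Set[Snp]]) -> Dict[str, float]:
    """Return, for each lineage, the fraction of all SNPs in the panel
    that occur in that lineage but are absent from the reference set."""
    # The panel total is the union of the reference SNPs and every lineage's SNPs.
    total: Set[Snp] = set(reference)
    for snps in by_lineage.values():
        total |= snps
    if not total:
        return {name: 0.0 for name in by_lineage}
    return {name: len(snps - reference) / len(total)
            for name, snps in by_lineage.items()}


# Toy data only (assumed for illustration, not taken from the study).
reference_snps = {("locusA", 10), ("locusA", 25), ("locusB", 7), ("locusB", 40)}
lineage_snps = {
    "TcI": {("locusA", 10), ("locusA", 31)},   # one SNP not in the reference
    "TcII": {("locusA", 25), ("locusB", 7)},   # fully covered by the reference
}
shares = novel_snp_share(reference_snps, lineage_snps)
```

Under this definition, a lineage whose SNPs are all already present among the reference alleles contributes a share of zero, which mirrors the low discovery rate described for lineages that are close to the hybrid's parental haplotypes.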
The same low rate of discovery of new SNPs is observed when re-sequencing strains from the TcV lineage. Strains from this lineage show very limited genetic diversity when compared with the TcVI strains. Although these observations are based on a small-scale re-sequencing study, the same trend can be observed when analyzing additional loci from the draft genomic data for TcI and TcII strains. According to this analysis, the next step to significantly increase the coverage of the genetic diversity identified for T. cruzi should be the analysis of complete genomic or transcriptomic data from a TcIV strain.

Conclusions

By taking advantage of the genomic and transcriptomic sequence data from a number of strains representative of different evolutionary lineages of T.
cruzi, we have compiled an initial map of genetic diversity for this important parasite, focused mostly on protein-coding, single-copy regions of the genome. The picture emerging from this analysis reflects the highly divergent nature of the ancestral haplotypes of the hybrid CL Brener strain. However, the analysis also shows that there is a highly conserved core of the genome under apparent purifying selection, and highlights a number of genes and domains deviating from this extreme. The work represents the first genome-wide map of genetic diversity for T. cruzi, covering about half of the estimated nucleotide diversity of the species.

Methods

Data sources

Data used for SNP identification included the T. cruzi CL Brener and Sylvio X10 genomes, RNA-seq data from the TcAdriana strain, partial shotgun data from the Esmeraldo strain, as well as other T.
cruzi sequences obtained from GenBank in May 2007. T. cruzi ESTs were obtained from dbEST and manually curated to extract information about their source. To validate SNPs identified in a limited number of genes, we checked preliminary assemblies from the JRcl4 and Esmeraldo cl3 strains available at the TriTrypDB resource.

Sequence clustering and alignment

Before clustering, sequences were screened against a library of T.