5.3. GATK
As described in the variant calling section, the Genome Analysis ToolKit (GATK), which provides a collection of data analysis tools built on the MapReduce programming approach, also supports indel calling [22]. GATK is evaluated against other indel calling methods, including Dindel, VarScan, and SAMtools mpileup, in Neuman et al. [38].
6. Filtering and Annotation
After alignment and variant calling, a list of thousands of potential differences between the genome under study and the reference genome is generated. The next step is to determine which of these variants are likely to contribute to the pathological process under study. This third step combines filtering (removing variants that do not fit specific genetic models or that are also present in normal tissue) with annotation (looking up information about variants and identifying those that fit the biological process).
Filtering can be done with a genetic pedigree or with cancer and normal samples from the same individual. In the case of cancer, a common method is to remove variants that are present in both the cancer sample and the normal sample, leaving only somatic variants, which have mutated from the germline sequence. In the case of a pedigree, filtering can be based on the expected inheritance pattern. For example, if the inheritance pattern is autosomal recessive, the variants that are heterozygous in both parents and homozygous in the child can be selected. Similar methods can be applied to larger pedigrees according to the inheritance pattern.
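
The two filtering strategies described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular tool: variants are represented as hypothetical `(chromosome, position, ref, alt)` tuples, genotypes as unphased VCF-style strings (`"0/1"` for heterozygous, `"1/1"` for homozygous alternate), and the function names `somatic_variants` and `recessive_candidates` are invented for this example. Real pipelines also have to handle coverage, genotype quality, and multi-allelic sites, which are ignored here.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]

HET = "0/1"      # heterozygous (assumed unphased, biallelic)
HOM_ALT = "1/1"  # homozygous for the alternate allele


def somatic_variants(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Keep variants seen in the tumor sample but not in the matched normal."""
    return tumor - normal


def recessive_candidates(
    child: Dict[Variant, str],
    mother: Dict[Variant, str],
    father: Dict[Variant, str],
) -> Set[Variant]:
    """Keep variants homozygous in the child and heterozygous in both parents."""
    return {
        variant
        for variant, genotype in child.items()
        if genotype == HOM_ALT
        and mother.get(variant) == HET
        and father.get(variant) == HET
    }


if __name__ == "__main__":
    tumor = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
    normal = {("chr1", 100, "A", "G")}
    print(somatic_variants(tumor, normal))

    child = {("chr3", 300, "G", "A"): HOM_ALT, ("chr4", 400, "T", "C"): HET}
    mother = {("chr3", 300, "G", "A"): HET, ("chr4", 400, "T", "C"): HET}
    father = {("chr3", 300, "G", "A"): HET}
    print(recessive_candidates(child, mother, father))
```

Other inheritance models (for example dominant or de novo) would follow the same pattern with a different genotype condition per family member.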
In addition to filtering, further selection of causal variants can be based on existing annotation or predicted functional effect. Many tools examine relevant variants by referencing previously known information about their biological functions and inferring potential effects from their genomic context. In addition, many tools have been developed to identify genetic variants that contribute to disease pathogenesis or phenotypic variance [39]. Rare nonsynonymous SNPs, which cause an amino acid substitution (AAS) in the coding region, can affect the function of the encoded protein and could contribute to disease. Advances in exome and genome sequencing are yielding an extensive number of human genetic variants, and a number of disease-associated SNVs can be identified following alignment and variant calling.
Nonsense and frameshift mutations often result in a loss of protein function; for the numerous SNVs, by contrast, pinpointing the disease-causal variants has become one of the major challenges because of the lack of genetic information. For instance, GWASs have associated ~1,300 loci with ~200 diseases, but disease-causing variants have been identified at only a few of these loci [40].
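
To make the distinction between synonymous, nonsynonymous, and nonsense changes concrete, the following sketch classifies a single-base change within one codon using the standard genetic code. It is an illustration only: the function name `classify_snv` and the returned labels are assumptions made for this example, and it says nothing about whether a given substitution is actually disease-causing, which is the harder problem discussed above.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base change between two DNA codons (hypothetical labels)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be exactly three bases long")
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if differences != 1:
        raise ValueError("expected exactly one differing base")

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid, no substitution
    if alt_aa == "*":
        return "nonsense"        # premature stop codon introduced
    if ref_aa == "*":
        return "stop_lost"       # original stop codon removed
    return "nonsynonymous"       # amino acid substitution (AAS)


if __name__ == "__main__":
    print(classify_snv("GAA", "GAG"))  # Glu -> Glu
    print(classify_snv("GAG", "GTG"))  # Glu -> Val
    print(classify_snv("TGG", "TGA"))  # Trp -> stop
```

Frameshift mutations are not covered by this sketch, because they arise from insertions or deletions rather than from a single-base substitution within a codon.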