Using GATK On Inbred Species - What To Do With The Heterozygous Calls?
10.8 years ago
William ★ 5.3k

Does anyone here have experience using GATK for SNP and indel calling on an inbred species?

Do you do anything special with the heterozygous calls which you don't expect in an inbred species?

I am now in a situation where, after raw calling with the GATK HaplotypeCaller, all of my false positives are heterozygous calls, but 10% of my true positives are also heterozygous calls (instead of homozygous), as checked against SNP calls based on a BAC contig alignment.

gatk • 4.3k views

I work close to a group studying inbred plants (cereals). They were surprised to find more SNP variation in their genomes than they had expected.


I haven't worked with inbred species, but I know the GATK team has been doing some work on haploid genomes (for UnifiedGenotyper, though, not for HaplotypeCaller); it works fine with mtDNA, for instance. Maybe this will shed some light on it: http://www.broadinstitute.org/gatk/guide/tagged?tag=ploidy

10.8 years ago
Bioch'Ti ★ 1.1k

Hi, I made the same observation in an autogamous, inbred crop (tomato), with 10-15% residual heterozygosity distributed at the genome scale. Especially in plants, it is rare to have SNPs that display purely homozygous genotypes.

So, regarding the heterozygous calls that you observed in your dataset, there are mainly two explanations:

1. residual heterozygosity (from introgression, for example), and
2. you may have mapped/assembled paralogs.

I would strongly advise you to check the genotype frequencies: if you regularly observe 50/50 SNP genotypes, that may point to mapped/assembled paralogs.

Finally, a tool called 'reads2snp', which calls SNPs and looks at genotype frequencies (taking coverage information into account), has been developed to 'clean' a SNP dataset by giving you the probability that each site is (or is not) a paralogous SNP. Check this out: http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1003457
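As a rough illustration of the genotype-frequency check, here is a minimal Python sketch. It takes one reading of the advice above: in an inbred species, a site where nearly every sample is called heterozygous is suspicious and may reflect collapsed paralogs. This is a crude heuristic and not the reads2snp method; the function names and the `min_het_fraction` cut-off of 0.8 are assumptions made up for the example.

```python
def het_fraction(genotypes):
    """Fraction of called diploid genotypes that are heterozygous.

    genotypes: list of VCF-style GT strings such as '0/1' or '1|1'.
    Returns None when no sample has a usable call.
    """
    het = 0
    called = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        called += 1
        if alleles[0] != alleles[1]:
            het += 1
    return het / called if called else None


def flag_suspect_sites(sites, min_het_fraction=0.8):
    """Return the sites whose heterozygous fraction reaches the cut-off.

    sites: dict mapping (chrom, pos) -> list of GT strings, one per sample.
    min_het_fraction: hypothetical threshold, to be tuned on real data.
    """
    flagged = []
    for key, genotypes in sites.items():
        fraction = het_fraction(genotypes)
        if fraction is not None and fraction >= min_het_fraction:
            flagged.append(key)
    return flagged


# Toy input, invented for the example
example_sites = {
    ("chr1", 10): ["0/1", "0/1", "0/1", "0/1"],  # het in every sample
    ("chr1", 20): ["1/1", "1/1", "0/1", "0/0"],  # mostly homozygous
    ("chr1", 30): ["./.", "./."],                # nothing called
}
suspects = flag_suspect_sites(example_sites)
```

Sites flagged this way are only candidates for closer inspection; residual heterozygosity and paralogs cannot be told apart from this count alone.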

Hope this helps, Best, C.

10.6 years ago

I work with inbred mouse strains and it is quite common to see heterozygous SNP calls. In most cases we simply ignore them, as the strains we are working with are highly inbred, so most of the heterozygous SNPs should be the result of mapping artifacts.
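To make "simply ignore them" concrete, here is a minimal Python sketch of one way such a filter could look for a single-sample, diploid VCF. It is only an illustration under those assumptions, not the exact procedure used for the mouse strains; `drop_het_records` and its `sample_index` parameter are names made up for the example.

```python
def is_heterozygous(gt):
    """True for a diploid GT string with two different called alleles."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def drop_het_records(vcf_lines, sample_index=0):
    """Yield header lines unchanged, plus records whose chosen sample
    is not heterozygous. Records without sample columns pass through."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + sample_index:
            yield line  # no genotype column to inspect
            continue
        format_keys = fields[8].split(":")          # FORMAT column
        sample_values = fields[9 + sample_index].split(":")
        gt = sample_values[format_keys.index("GT")] if "GT" in format_keys else "."
        if not is_heterozygous(gt):
            yield line


# Toy records, invented for the example
example_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:AD\t1/1:0,20\n",
    "chr1\t200\t.\tC\tT\t50\t.\t.\tGT:AD\t0/1:10,9\n",
    "chr1\t300\t.\tG\tA\t50\t.\t.\tGT:AD\t./.:0,0\n",
]
kept = list(drop_het_records(example_vcf))
```

Dropping the records outright loses the sites that are truly heterozygous, so it may be worth writing the removed calls to a separate file and checking how many of them overlap a trusted call set.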


Do you simply remove them? And if so, are there any publications where a similar filter is applied?
