Question: Using GATK on Inbred Species: What to Do with the Heterozygous Calls?
7.4 years ago, William wrote:

Does anyone here have experience using GATK for SNP and indel calling on an inbred species?

Do you do anything special with the heterozygous calls which you don't expect in an inbred species?

After raw calling with the GATK HaplotypeCaller, and checking against SNP calls based on a BAC contig alignment, I am in the following situation: all my false positives are heterozygous calls, but 10% of my true positives are also called heterozygous (instead of homozygous).

modified 7.2 years ago by Ashutosh Pandey • written 7.4 years ago by William

I work close to a group studying inbred plants (cereals). They were surprised to find more SNP variation in their genomes than they had expected.

written 7.4 years ago by Eric Normandeau

I haven't worked with inbred species, but I know GATK is doing some work on haploid genomes (for UnifiedGenotyper rather than HaplotypeCaller, though; it works fine when using mtDNA, for instance). Maybe this can shed some light on it:

written 7.4 years ago by Jorge Amigo
7.4 years ago, Bioch'Ti (France, Avignon) wrote:

Hi, I made the same observation in an autogamous, inbred crop (tomato), with 10-15% residual heterozygosity distributed across the genome. Especially in plants, it is rare to find SNPs that display purely homozygous genotypes. For the heterozygous calls in your dataset there are two main explanations: (1) residual heterozygosity (from introgression, for example) and (2) you may have mapped or assembled paralogs together. I would strongly advise you to check the genotype frequencies: if you regularly observe 50/50 SNP genotypes, that may point to the mapping or assembly of paralogs. Finally, a tool called 'reads2snp', which calls SNPs and looks at genotype frequencies (taking coverage information into account), has been developed to 'clean' a SNP dataset by giving, for each site, the probability that it is a paralogous SNP. Check this out:

Hope this helps, Best, C.
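The genotype-frequency check suggested above could be approximated per call by looking at the allele balance of each heterozygous genotype. The following is only a minimal sketch, not part of reads2snp or GATK: it assumes biallelic records whose FORMAT column carries the standard GT and AD fields, and the function names and the 0.35-0.65 "balanced" window are hypothetical choices made for illustration.

```python
def allele_balance(format_keys, sample_field):
    """Return (GT, ALT read fraction) for one sample column of a biallelic VCF record."""
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    gt = values.get("GT", "./.")
    try:
        # AD holds per-allele read depths: "ref_depth,alt_depth"
        ref_depth, alt_depth = (int(x) for x in values.get("AD", "").split(",")[:2])
    except ValueError:
        return gt, None  # AD missing or not numeric
    total = ref_depth + alt_depth
    return gt, (alt_depth / total if total else None)


def is_het(gt):
    """True for a called diploid genotype with two different alleles."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def classify_het(format_keys, sample_field, low=0.35, high=0.65):
    """Label a call; low/high are assumed thresholds, tune them for your data."""
    gt, frac = allele_balance(format_keys, sample_field)
    if not is_het(gt):
        return "not_het"
    if frac is None:
        return "no_depth"
    return "balanced_het" if low <= frac <= high else "skewed_het"


print(classify_het("GT:AD:DP", "0/1:10,10:20"))
print(classify_het("GT:AD:DP", "0/1:18,2:20"))
```

Tallying these labels per region may help separate stretches of residual heterozygosity from clusters of suspicious calls, but the allele balance alone does not prove that a site is paralogous.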

written 7.4 years ago by Bioch'Ti
7.2 years ago, Ashutosh Pandey wrote:

I work with inbred mouse strains, and it is quite common to see heterozygous SNP calls. In most cases we simply ignore them, as the strains we work with are highly inbred, so most of the heterozygous SNPs should be the result of mapping artifacts.
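Ignoring heterozygous calls could look something like the sketch below, which drops records where one sample is called heterozygous. It is an illustration under assumptions, not the filter actually used for these strains: it assumes an uncompressed, tab-separated VCF with GT as the first FORMAT key (as the VCF specification requires when GT is present), and the function name and `sample_index` parameter are hypothetical.

```python
def drop_heterozygous(vcf_lines, sample_index=0):
    """Yield header lines and records whose chosen sample is not heterozygous."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta and column header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        # Sample columns start at index 9; GT is the first colon-separated value
        gt = fields[9 + sample_index].split(":")[0]
        alleles = gt.replace("|", "/").split("/")
        if "." not in alleles and len(set(alleles)) > 1:
            continue  # fully called genotype with two different alleles: skip
        yield line


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/1:10,10\n",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD\t1/1:0,20\n",
]
for kept in drop_heterozygous(example):
    print(kept, end="")
```

Missing genotypes such as `./.` are passed through unchanged here; whether to keep them, or to mask heterozygous genotypes rather than drop whole records in a multi-sample file, is a separate decision.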

written 7.2 years ago by Ashutosh Pandey

Do you simply remove them? And if so, are there any publications where a similar filter is applied?

written 10 months ago by rimgubaev