Question: Using GATK On Inbred Species - What To Do With The Heterozygous Calls?
7.4 years ago by
William wrote:

Does anyone here have experience using GATK for SNP and indel calling on an inbred species?

Do you do anything special with the heterozygous calls which you don't expect in an inbred species?

I am now in a situation where, after raw calling with the GATK HaplotypeCaller, all my false positives are heterozygous calls, but 10% of my true positives are also heterozygous calls (instead of homozygous), as checked against SNP calls based on a BAC contig alignment.

I work close to a group studying inbred plants (cereals). They were surprised to find more SNP variation in their genomes than they had expected.

I haven't worked with inbred species, but I know GATK has done some work on haploid genomes (for UnifiedGenotyper, not for HaplotypeCaller); it works fine with mtDNA, for instance. Maybe this can shed some light on it:

7.4 years ago by
France (Avignon)
Bioch'Ti wrote:

Hi, I made the same observation in an autogamous, inbred crop (tomato), with 10-15% residual heterozygosity distributed across the genome. Especially in plants, it is rare for SNPs to display purely homozygous genotypes. So, for the heterozygous calls you observed in your dataset, there are mainly two explanations: 1. residual heterozygosity (from introgression, for example) and 2. you may have mapped/assembled paralogs. I would strongly advise you to check the genotype frequencies: if you regularly observe 50/50 SNP genotypes, that may point to the mapping/assembly of paralogs. Finally, a tool called 'reads2snp', which calls SNPs and looks at genotype frequencies (taking coverage information into account), has been developed to 'clean' your SNP dataset by giving you the probability that each site is a paralogous SNP. Check this out:

Hope this helps, Best, C.
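To illustrate the genotype-frequency check suggested above, here is a minimal sketch in Python. It is not reads2snp and not part of GATK; the function names are hypothetical, and it assumes a multi-sample VCF record whose FORMAT column contains GT and AD. For each site it reports the share of called samples that are heterozygous and the mean ALT read fraction among those heterozygous calls. A site that is heterozygous in most samples of an inbred panel, with a balance near 0.5, might be worth inspecting as a possible paralog; any cutoff you apply is your own assumption.

```python
def parse_sample(fmt, sample):
    """Return (genotype alleles, allele depths) for one sample column."""
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    gt = fields.get("GT", "./.").replace("|", "/").split("/")
    ad_raw = fields.get("AD", "")
    # Treat a missing or partially missing AD as "no usable depths".
    ad = [int(x) for x in ad_raw.split(",")] if ad_raw and "." not in ad_raw else []
    return gt, ad


def site_het_summary(vcf_line):
    """Summarise heterozygous calls at one VCF record (hypothetical helper).

    Returns (het_fraction, mean_alt_balance): the share of called samples
    that are heterozygous, and the mean non-REF read fraction among those
    hets (None when no het call has usable AD values).
    """
    cols = vcf_line.rstrip("\n").split("\t")
    fmt, samples = cols[8], cols[9:]
    called = hets = 0
    balances = []
    for s in samples:
        gt, ad = parse_sample(fmt, s)
        if "." in gt:          # skip no-calls
            continue
        called += 1
        if len(set(gt)) > 1:   # more than one distinct allele: heterozygous
            hets += 1
            depth = sum(ad)
            if depth > 0:
                balances.append(1 - ad[0] / depth)
    het_fraction = hets / called if called else 0.0
    mean_balance = sum(balances) / len(balances) if balances else None
    return het_fraction, mean_balance
```

Usage would be to loop over the non-header lines of a VCF, call `site_het_summary(line)` on each, and look at the distribution of the two values before deciding on any filter.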

7.2 years ago by
Ashutosh Pandey wrote:

I work with inbred mouse strains and it is quite common to see heterozygous SNP calls. In most cases we simply ignore them, as the strains we work with are highly inbred. So most of the heterozygous SNPs should be the result of mapping artifacts.
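One possible reading of "ignoring" heterozygous calls is to set them to missing rather than drop the whole site. The Python sketch below does that on a single VCF line; it is an illustration under that assumption, not the pipeline used for the mouse strains, and `mask_het_genotypes` is a hypothetical name.

```python
def mask_het_genotypes(vcf_line):
    """Return the VCF line with heterozygous GT values replaced by './.'.

    Header lines and records without sample columns are returned unchanged.
    Only the GT subfield is touched; other FORMAT values (DP, AD, ...) are
    left as they are, which is a simplification.
    """
    line = vcf_line.rstrip("\n")
    cols = line.split("\t")
    if line.startswith("#") or len(cols) < 10:
        return line
    keys = cols[8].split(":")
    if "GT" not in keys:
        return line
    gt_idx = keys.index("GT")
    for i in range(9, len(cols)):
        parts = cols[i].split(":")
        alleles = parts[gt_idx].replace("|", "/").split("/")
        # Heterozygous: fully called and more than one distinct allele.
        if "." not in alleles and len(set(alleles)) > 1:
            parts[gt_idx] = "./."
            cols[i] = ":".join(parts)
    return "\t".join(cols)
```

Masking keeps the homozygous calls of other samples at the same site, whereas removing the record would discard them too; which behaviour is appropriate depends on the downstream analysis.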

Do you simply remove them? And if so, are there any publications where a similar filter is applied?

