Sanger Imputation Server - genotype probability distribution
5.8 years ago


I need to run an imputation using the Sanger Imputation Server.

I prepared my data (aligned to the reference panel) and submitted it, but I received the following e-mail:

Update from Sanger Imputation Service:

--- Aborted Job --- The genotype probability distribution in the input file does not match the reference panel frequencies well. The number of genotypes expected with low frequencies under HWE (with P<=0.1) is too big in the user data: 0.59 whereas the threshold is 0.26. For comparison, the number of these genotypes in 1000Genomes data is 0.17, the attached plot shows typical GT distributions

This is usually an indicator of REF,ALT alleles being on incorrect strand. Another frequent problem is the VCF using a different reference sequence, for example GRCh38 instead of GRCh37.

The attached graph was produced using the bcftools/af-dist plugin, check these links

--- Help --- Please check these links for help

How can I solve this?


2.3 years ago
Dan ▴ 530

Check the file with the VCF debugulator:

It will tell you which positions match the reference and which do not, which is a good way to check that you are using the right reference build.
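If you want to do the same kind of check by hand, here is a minimal Python sketch of the idea: compare each VCF REF allele with the base(s) at that position in the reference genome. It is not the tool mentioned above. The function names (`parse_fasta`, `find_ref_mismatches`) and the toy data are made up for illustration, and it assumes an uncompressed VCF and a FASTA small enough to hold in memory, with matching chromosome names.

```python
def parse_fasta(text):
    """Parse FASTA text into {sequence_name: uppercase sequence}."""
    sequences, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line.upper())
    return {n: "".join(parts) for n, parts in sequences.items()}


def find_ref_mismatches(vcf_text, reference):
    """Return (chrom, pos, vcf_ref, genome_ref) for records whose REF disagrees.

    genome_ref is None when the chromosome name is absent from the reference
    (for example "chr1" in the VCF versus "1" in the FASTA).
    """
    mismatches = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header and blank lines
        chrom, pos, _id, ref = line.split("\t")[:4]
        pos = int(pos)
        seq = reference.get(chrom)
        if seq is None:
            mismatches.append((chrom, pos, ref, None))
            continue
        # VCF positions are 1-based; Python slices are 0-based.
        genome_ref = seq[pos - 1:pos - 1 + len(ref)]
        if genome_ref != ref.upper():
            mismatches.append((chrom, pos, ref, genome_ref))
    return mismatches


# Toy data (hypothetical): a 10 bp "chromosome 1" and two variant records.
toy_fasta = ">1\nACGTACGTAC\n"
toy_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t2\t.\tC\tT\t.\t.\t.",   # REF matches the genome base at position 2
    "1\t5\t.\tT\tG\t.\t.\t.",   # genome has A at position 5, so this is flagged
])

reference = parse_fasta(toy_fasta)
mismatches = find_ref_mismatches(toy_vcf, reference)
for chrom, pos, vcf_ref, genome_ref in mismatches:
    print(f"{chrom}:{pos} VCF REF={vcf_ref} genome={genome_ref}")
```

If a large share of sites is flagged, a wrong build (GRCh38 versus GRCh37) is a likely suspect. If the flagged REF alleles are mostly the complement of the genome base, as in the toy record above, that points to a strand problem instead.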

I'm not sure how you could have strand-swapped the calls, but go in and check a few sites by hand.
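For spot-checking strand, something like the sketch below may help. It compares the REF/ALT pair at a site in your data with the pair at the same site in the reference panel and labels the relationship. The function name `classify_site`, the category labels, and the example sites are all hypothetical. It handles only biallelic SNPs, and it cannot resolve A/T or C/G SNPs, because those look the same on both strands.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_site(user_ref, user_alt, panel_ref, panel_alt):
    """Label how a user REF/ALT pair relates to the panel REF/ALT pair."""
    user = (user_ref.upper(), user_alt.upper())
    panel = (panel_ref.upper(), panel_alt.upper())

    # Only single-base A/C/G/T alleles are handled in this sketch.
    if any(a not in COMPLEMENT for a in user + panel):
        return "unsupported"

    flipped = (COMPLEMENT[user[0]], COMPLEMENT[user[1]])

    # Neither the alleles nor their complements match the panel alleles.
    if set(user) != set(panel) and set(flipped) != set(panel):
        return "mismatch"

    # A/T and C/G SNPs are their own complement: strand cannot be inferred.
    if COMPLEMENT[panel[0]] == panel[1]:
        return "ambiguous"

    if user == panel:
        return "match"
    if user == panel[::-1]:
        return "swapped"              # REF and ALT exchanged
    if flipped == panel:
        return "strand_flip"          # alleles reported on the other strand
    return "strand_flip_swapped"      # other strand and REF/ALT exchanged


# Hypothetical sites: (user REF, user ALT, panel REF, panel ALT)
sites = [
    ("A", "G", "A", "G"),
    ("G", "A", "A", "G"),
    ("T", "C", "A", "G"),
    ("C", "T", "A", "G"),
    ("A", "T", "T", "A"),
]
counts = Counter(classify_site(*site) for site in sites)
print(counts)
```

If most of your sites come out as `strand_flip` or `swapped` rather than `match`, that would be consistent with the strand explanation in the server's e-mail.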

Is your population very different from those in 1000 Genomes?
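To get a feel for why either a REF/ALT mix-up or very different allele frequencies would trip the check in that e-mail, here is a rough illustration. This is only my reading of the message ("genotypes expected with low frequencies under HWE (with P<=0.1)"), not the actual computation in the bcftools af-dist plugin. The function names and the toy numbers are made up. For each genotype it computes the Hardy-Weinberg probability from the panel's ALT allele frequency and reports the fraction of genotypes whose probability is at or below 0.1.

```python
def hwe_genotype_probs(alt_freq):
    """Hardy-Weinberg genotype probabilities for a biallelic site."""
    q = alt_freq
    p = 1.0 - q
    return {"0/0": p * p, "0/1": 2 * p * q, "1/1": q * q}


def low_prob_fraction(records, threshold=0.1):
    """Fraction of genotypes whose HWE probability is <= threshold.

    records: iterable of (panel_alt_freq, list_of_genotype_strings).
    Missing or multiallelic genotypes are skipped.
    """
    low = total = 0
    for alt_freq, genotypes in records:
        probs = hwe_genotype_probs(alt_freq)
        for gt in genotypes:
            key = gt.replace("|", "/")  # treat phased and unphased alike
            if key == "1/0":
                key = "0/1"
            if key not in probs:
                continue
            total += 1
            if probs[key] <= threshold:
                low += 1
    return low / total if total else float("nan")


# Toy site (hypothetical): the panel ALT allele frequency is 0.1, so under HWE
# P(0/0) is about 0.81, P(0/1) about 0.18 and P(1/1) about 0.01.
aligned = ["0/0", "0/0", "0/0", "0/1"]   # genotypes consistent with the panel
swapped = ["1/1", "1/1", "1/1", "0/1"]   # same samples with REF/ALT exchanged

print(low_prob_fraction([(0.1, aligned)]))
print(low_prob_fraction([(0.1, swapped)]))
```

With the alleles exchanged, most genotypes become ones the panel frequencies say should be rare, so the fraction jumps. A study population whose allele frequencies differ a lot from the panel would push the number in the same direction, just less dramatically.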

Could your read accuracy be low or your alignments bad?

Once you find the problem, then we can suggest a solution.


