Question: filter vcf file heterozygous calls by allelic depth
1
gravatar for molly.dunn
2.6 years ago by
molly.dunn10
molly.dunn10 wrote:

I'm wondering if anyone knows how to filter heterozygous genotype calls from a vcf file where, for either allele, there is only one read supporting the call. So the allelic depth would be n,1 or 1,n. n meaning the depth could be anything. I'm thinking that we are getting a lot of wrong heterozygous calls due to sequencing error. I tried vcffilter using the command:

vcffilter -g "! ( GT = 0/1 & AD = 1,* OR AD = *,1 )"

No luck.

Any one else have any thoughts?

Thanks!

snp tool alignment next-gen vcf • 2.7k views
ADD COMMENTlink modified 2.6 years ago by igor7.6k • written 2.6 years ago by molly.dunn10

But then, why do you have such a vcf format? Shouldn't you have the depth for each individual allele?

ADD REPLYlink written 2.6 years ago by Sam2.3k

yes I do. The genotype fields are in the following format.

GT:PL:DP:SP:AD:GQ

So a typical genotype data field would look like this:

0/1:20,0,93:4:0:3,1:23

In this case, samtools is calling the position heterozygous for sample because there are 3 reads for the reference allele and one for the variant. I'd like to convert these calls to no call (./.) if there is just one reads supporting either allele.

I hope this makes sense. Thanks again for any help.

ADD REPLYlink written 2.6 years ago by molly.dunn10
1
gravatar for igor
2.6 years ago by
igor7.6k
United States
igor7.6k wrote:

I have not used vcffilter, but bcftools filter allows array subscripts (such as (DP4[0]+DP4[1])/(DP4[2]+DP4[3]) > 0.3), which I think is what you are looking for.

bcftools filter docs: https://samtools.github.io/bcftools/bcftools.html#filter

You have to go to the EXPRESSIONS section for various examples: https://samtools.github.io/bcftools/bcftools.html#expressions

ADD COMMENTlink written 2.6 years ago by igor7.6k

Thanks so much for your help. I've tried it a bunch of ways with bcftools and can't get it to work. Looks like it doesn't want to filter at the genotype level but at the variant level.

I tried to make it simple and just filter out all hets first.

I keep getting errors like this:

Error: the tag "INFO/[%GT" is not defined in the VCF header

The command I used was:

bcftools filter -e [%GT=0/1] in.vcf > out.vcf

Thanks again.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by molly.dunn10

Check the example command in the docs:

bcftools view -i '%ID!="." & MAF[0]<0.01'

If you follow that format, your command should be:

bcftools filter -e 'GT=0/1' in.vcf > out.vcf
ADD REPLYlink written 2.6 years ago by igor7.6k

Thanks Igor for your help. I tried: bcftools filter -e 'GT=0/1' and it didn't do anything. The I tried: bcftools filter -e '"GT=0/1"' and it got rid of all variants where there were any het calls. Then I tried: bcftools filter -e 'FMT/GT="0/1"' and it also got rid of all variants where there was any het calls. So getting the logic to work with just the heterozygous calls isn't really working much less pairing it with the 1,n or n,1 allelic depth.

Thanks for your help. Might just need to write a perl or python script

ADD REPLYlink written 2.6 years ago by molly.dunn10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 850 users visited in the last hour