How to get the ratio of allele counts from a GATK-derived VCF file?
4 weeks ago

Hello,

I need some suggestions on how to get the reference and alternate allele counts from a GATK-derived VCF file. Normally, I work with VCF files generated by FreeBayes, which I then use for dosage calling with fitPoly or updog. Both tools need the number of reference and alternate alleles to estimate the allele ratio, and FreeBayes provides this information in the INFO field of its VCF (see below).

##INFO=<ID=RO,Number=1,Type=Integer,Description="Count of full observations of the reference haplotype.">
##INFO=<ID=AO,Number=A,Type=Integer,Description="Count of full observations of this alternate haplotype.">
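To make the FreeBayes side concrete, here is a minimal Python sketch that pulls RO and AO out of an INFO string and turns them into an alternate-allele fraction. It assumes the INFO column has already been split out of the VCF line and that the site is biallelic when the fraction is computed. The helper names (`parse_info`, `freebayes_ref_alt`, `alt_fraction`) and the example INFO values are hypothetical, made up for illustration. INFO-level RO/AO are site-level counts; per-sample counts would come from the sample columns instead.

```python
from __future__ import annotations


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column such as 'DP=30;RO=18;AO=12' into a key -> value dict.

    Flag entries without '=' are stored with an empty string as the value.
    """
    fields: dict[str, str] = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def freebayes_ref_alt(info: str) -> tuple[int, list[int]]:
    """Return (RO, [AO for each ALT allele]) from a FreeBayes INFO string."""
    fields = parse_info(info)
    ro = int(fields["RO"])
    # AO is declared Number=A, so there is one comma-separated value per ALT allele
    ao = [int(x) for x in fields["AO"].split(",")]
    return ro, ao


def alt_fraction(ref_count: int, alt_count: int) -> float | None:
    """Alternate-allele fraction alt / (ref + alt); None when there are no reads."""
    total = ref_count + alt_count
    return alt_count / total if total > 0 else None


# Hypothetical INFO string for a biallelic SNP
ro, ao = freebayes_ref_alt("AB=0;AO=12;DP=30;RO=18;TYPE=snp")
print(ro, ao, alt_fraction(ro, ao[0]))  # 18 [12] 0.4
```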

However, I am now calling SNPs with GATK, and its VCF has no such INFO field for reference and alternate allele counts. I also noticed that the depth values in the DP field are completely different between FreeBayes and GATK.

Could you please suggest how I should get the allele counts from a VCF file generated by GATK?

Also, could you give any insight into why FreeBayes and GATK report such different depths?

4 weeks ago

However, I am doing SNP calling using GATK, but there is no such information field for reference and alternate allele counts.

It's in FORMAT/AD, for each genotype (sample).
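AD lives in the sample columns and is keyed by the FORMAT column. It has one value per allele, reference first and then each ALT allele in order, so for a biallelic SNP it reads "ref,alt". Below is a minimal stdlib-only sketch of extracting it, assuming an uncompressed, tab-delimited VCF data line; the function name `sample_allele_depths` and the example record are hypothetical.

```python
from __future__ import annotations


def sample_allele_depths(vcf_line: str) -> list[list[int] | None]:
    """Return FORMAT/AD for every sample in one VCF data line.

    Each entry is [ref_count, alt1_count, ...] or None when AD is missing ('.').
    """
    cols = vcf_line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")  # 9th column, e.g. 'GT:AD:DP:GQ:PL'
    if "AD" not in format_keys:
        raise ValueError("this record has no AD in FORMAT")
    ad_idx = format_keys.index("AD")

    depths: list[list[int] | None] = []
    for sample_col in cols[9:]:  # one column per sample
        values = sample_col.split(":")
        # trailing FORMAT fields may be omitted in a sample column
        raw = values[ad_idx] if ad_idx < len(values) else "."
        parts = raw.split(",")
        if any(p in (".", "") for p in parts):
            depths.append(None)
        else:
            depths.append([int(p) for p in parts])
    return depths


# Hypothetical record with three samples; the third has no AD
line = ("chr1\t1000\t.\tA\tG\t50.0\tPASS\tDP=40\tGT:AD:DP:GQ:PL\t"
        "0/1:12,9:21:99:200,0,300\t0/0:19,0:19:57:0,57,700\t./.:.:0:.:.")
for ad in sample_allele_depths(line):
    if ad is not None:
        ref, alts = ad[0], ad[1:]
        print(ref, alts)
```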

You can also use GATK's AlleleFraction annotation (https://gatk.broadinstitute.org/hc/en-us/articles/21905147896347-AlleleFraction), adding it with gatk VariantAnnotator -A AlleleFraction.


Thank you. I extracted AD for each SNP, which gives two numbers separated by a comma. I assume the first is the reference count and the second the alternate count. When I compared these with the reference and alternate allele counts from FreeBayes, the two estimates differ hugely. Is that expected?


Maybe downsampling, or reads failing QC?


But I used the same BAM files for both FreeBayes and GATK. Even for SNPs called by both tools, the reference and alternate counts differ between GATK's AD field and FreeBayes's RO and AO fields.
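One way to quantify this for SNPs called by both tools is to line the counts up per site and compare total depth and alternate-allele fraction separately. Absolute counts can differ while the fractions stay close, and for dosage calling the ratio is what matters. The sketch below assumes the counts for one sample have already been collected into dicts keyed by (CHROM, POS, REF, ALT); `compare_counts` and the example data are hypothetical. Exact key matching may miss some overlapping calls, since FreeBayes can report complex or multi-nucleotide alleles where GATK reports SNPs.

```python
from __future__ import annotations


def alt_fraction(ref: int, alt: int) -> float | None:
    """alt / (ref + alt), or None when both counts are zero."""
    total = ref + alt
    return alt / total if total else None


def compare_counts(
    gatk: dict[tuple[str, int, str, str], tuple[int, int]],
    freebayes: dict[tuple[str, int, str, str], tuple[int, int]],
) -> list[dict]:
    """Compare ref/alt counts at sites present in both call sets.

    Keys are (CHROM, POS, REF, ALT); values are (ref_count, alt_count) for one
    sample, e.g. from GATK FORMAT/AD and from FreeBayes RO/AO.
    """
    rows = []
    for site in sorted(gatk.keys() & freebayes.keys()):  # SNPs called by both tools
        g_ref, g_alt = gatk[site]
        f_ref, f_alt = freebayes[site]
        rows.append({
            "site": site,
            "depth_diff": (f_ref + f_alt) - (g_ref + g_alt),  # FreeBayes minus GATK
            "gatk_af": alt_fraction(g_ref, g_alt),
            "freebayes_af": alt_fraction(f_ref, f_alt),
        })
    return rows


# Hypothetical counts for one sample
gatk_counts = {("chr1", 1000, "A", "G"): (12, 9), ("chr1", 2000, "C", "T"): (5, 5)}
fb_counts = {("chr1", 1000, "A", "G"): (24, 18), ("chr2", 50, "G", "A"): (3, 3)}
for row in compare_counts(gatk_counts, fb_counts):
    print(row)
```

In this made-up example the FreeBayes depth is twice the GATK depth at the shared site, yet both give the same alternate-allele fraction (3/7).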


I think GATK counts only informative reads in AD. Do you have a DP field? DP should be closer to what FreeBayes reports. See also the GATK documentation on AD and DP: https://gatk.broadinstitute.org/hc/en-us/articles/360035532252-Allele-Depth-AD-is-lower-than-expected
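Following that suggestion, a quick per-sample check is to compare FORMAT/DP with the sum of the AD values. Per the linked GATK article, reads that are not informative for any allele are left out of AD but can still count toward DP, so the gap gives a rough count of such reads. This is a minimal sketch, assuming the FORMAT string and a single sample column have already been split out of the VCF line; `uninformative_reads` and the example values are hypothetical.

```python
from __future__ import annotations


def uninformative_reads(format_keys: str, sample_col: str) -> int | None:
    """Rough count of reads in FORMAT/DP that were not assigned to any allele in AD.

    Returns DP - sum(AD), or None if DP or AD is missing for this sample.
    """
    # zip() stops at the shorter list, so omitted trailing fields fall back to '.'
    values = dict(zip(format_keys.split(":"), sample_col.split(":")))
    dp = values.get("DP", ".")
    ad_parts = values.get("AD", ".").split(",")
    if dp in (".", "") or any(p in (".", "") for p in ad_parts):
        return None
    return int(dp) - sum(int(p) for p in ad_parts)


# Hypothetical heterozygous sample: AD sums to 21 while DP is 25
print(uninformative_reads("GT:AD:DP:GQ:PL", "0/1:12,9:25:99:200,0,300"))  # 4
```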


Ok, thank you for the reply. I will check out the DP field then.
