Calculate Allele balance by sample
2
4
Entering edit mode
8.5 years ago
kirannbishwa01 ★ 1.6k

I have a VCF file which variants called and has several field like AD, DP, AF, AC, etc. I however also need to calculate the AB (allele balance for each sample) which the information available in this vcf file. I tried looking for a softwae online but I couldn't find any.

Here is a sample output of the vcf file:

# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1

X 5867701 . T C,G 231.77 . AC=1,1;AF=0.500,0.500;AN=2;DP=8;ExcessHet=3.0103;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=45.48;QD=28.97;SOR=1.179 GT:AD:DP:GQ:PL 1/2:0,4,4:8:96:260,167,153,96,0,151

X 7753630 . A T,C 469.77 . AC=1,1;AF=0.500,0.500;AN=2;DP=13;ExcessHet=3.0103;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;QD=30.18;SOR=2.833 GT:AD:DP:GQ:PL 1/2:0,10,3:13:88:498,88,96,416,0,407

Allele balance SNP RNA-Seq • 6.1k views
0
Entering edit mode
3.6 years ago
ashotmarg2004 ▴ 130

Not sure if this is the most elegant and efficient way but I tried with bcftools like this for the case of my tri-allelic sites, similar to the ones you have.

The first part (view -m3 -M3) simply extracts the tri-allelic sites The second part (filter section) excludes heterozygous calls (GT=\"het\") that have allelic balance (AB) over 4 or less than 0.25 (this was just my choice, you should put your own values of course). In your example site: 5867701 will have AB of 4/4=1 while the second site: 7753630 will have AB of 10/3=3.33, i.e. both sites will be kept.

For the little bit tricky notation of FORMAT/AD[0:1]/FORMAT/AD[0:2]>4 check the expressions section of the bcftools's website, http://samtools.github.io/bcftools/bcftools.html#expressions

0
Entering edit mode
3.6 years ago
ashotmarg2004 ▴ 130

After I wrote my "manual" bcftools version, I've actually found this: https://gatk.broadinstitute.org/hc/en-us/articles/360040507131-FilterVcf-Picard- Seems that the --MIN_AB argument might just do what you are looking for, but I haven't used it so cannot comment on it.