I am trying to get a raw count of all occurrences in a sample of each of the 12 possible mutations. I generated a vcf file with a script that uses the following command:
samtools mpileup --BCF --uncompressed --output-tags DP,DP4 -f $WGF -r $CHROM -Q 30 $FILE | bcftools call -vm -O v -o tmp.vcf -
where WGF is my reference: GRCh37.p13.genome.fa and FILE is a the aligned sample in bam format.
I searched through quite a bit of literature to round out my understanding of vcf file format. As it stands, I am planning to use entry 3/4 and 4/4 of the DP4 field to count the occurrences of each alt allele at each record. However, this method will not work for records with multiple alt alleles. In searching the literature, I was unable to identify a good solution for handling these records with multiple alleles for purposes of getting counts of each mutation they represent. Can anyone offer advice on how I might get the read depth of each alt allele in a record with multiple alleles? Further, do I have enough information in my current vcf file to accomplish this, or do I need to change the parameters of my samtools mpileup / bcftools call command to accomplish my task?
Here are two records from my vcf file (one with a single alt allele and one with multiple)
chr10 42383504 . T C 34.4609 PASS DP=46;VDB=3.08786e-09;SGB=-0.69168;RPB=0.0791967;MQB=0.200607;MQSB=0.566714;BQB=1;MQ0F=0.0217391;ICB=1;HOB=0.5;AC=1;AN=2;DP4=16,10,11,8;MQ=16GT:PL:DP:DP4 0/1:68,0,129:45:16,10,11,8 chr10 42383520 rs28776333 A C,G 93.0 aD DP=58;VDB=5.03395e-11;SGB=-0.693146;RPB=0.33629;MQB=0.996986;MQSB=0.657561;BQB=0.856374;MQ0F=0.0172414;AC=1,1;AN=2;DP4=3,5,29,14;MQ=17;ASP;OTHERKG;RSPOS=42383520;SAO=0;SSR=0;VC=SNV;VP=050000000005000002000100;WGT=1;dbSNPBuildID=125 GT:PL:DP:DP4 1/2:154,63,92,75,0,101:51:3,5,29,14
All advice is appreciated. Thank you in advance!
EDIT: My script also uses SnpSift to annotate info from dbSNP, cosmic and 1000 genomes.