Counting all occurrences of base changes over all reads in sample from vcf file
1
0
Entering edit mode
7.8 years ago
jkillian ▴ 10

I am trying to get a raw count of all occurrences in a sample of each of the 12 possible mutations. I generated a vcf file with a script that uses the following command:

samtools mpileup --BCF --uncompressed --output-tags DP,DP4 -f $WGF -r $CHROM -Q 30 $FILE | bcftools call -vm -O v -o tmp.vcf -

where WGF is my reference: GRCh37.p13.genome.fa and FILE is a the aligned sample in bam format.

I searched through quite a bit of literature to round out my understanding of vcf file format. As it stands, I am planning to use entry 3/4 and 4/4 of the DP4 field to count the occurrences of each alt allele at each record. However, this method will not work for records with multiple alt alleles. In searching the literature, I was unable to identify a good solution for handling these records with multiple alleles for purposes of getting counts of each mutation they represent. Can anyone offer advice on how I might get the read depth of each alt allele in a record with multiple alleles? Further, do I have enough information in my current vcf file to accomplish this, or do I need to change the parameters of my samtools mpileup / bcftools call command to accomplish my task?

Here are two records from my vcf file (one with a single alt allele and one with multiple)

chr10   42383504    .   T   C   34.4609 PASS    DP=46;VDB=3.08786e-09;SGB=-0.69168;RPB=0.0791967;MQB=0.200607;MQSB=0.566714;BQB=1;MQ0F=0.0217391;ICB=1;HOB=0.5;AC=1;AN=2;DP4=16,10,11,8;MQ=16GT:PL:DP:DP4   0/1:68,0,129:45:16,10,11,8
chr10   42383520    rs28776333  A   C,G 93.0    aD  DP=58;VDB=5.03395e-11;SGB=-0.693146;RPB=0.33629;MQB=0.996986;MQSB=0.657561;BQB=0.856374;MQ0F=0.0172414;AC=1,1;AN=2;DP4=3,5,29,14;MQ=17;ASP;OTHERKG;RSPOS=42383520;SAO=0;SSR=0;VC=SNV;VP=050000000005000002000100;WGT=1;dbSNPBuildID=125 GT:PL:DP:DP4    1/2:154,63,92,75,0,101:51:3,5,29,14

All advice is appreciated. Thank you in advance!

EDIT: My script also uses SnpSift to annotate info from dbSNP, cosmic and 1000 genomes.

vcf SNP • 2.7k views
ADD COMMENT
2
Entering edit mode
7.7 years ago
jkillian ▴ 10

I solved this with the following samtools mpileup command:

samtools mpileup --skip-indels -Q 30 -f /references/human/gencode/GRCh37.p13.genome.fa -o '$SAMP'_pu.txt samp.bam

samtools pileup format provides every base from every read in the alt column allowing me to count every occurrence of base change.

ADD COMMENT

Login before adding your answer.

Traffic: 2269 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6