Question: Counting all occurrences of base changes over all reads in sample from vcf file
0
gravatar for jkillian
4.4 years ago by
jkillian10
jkillian10 wrote:

I am trying to get a raw count of all occurrences in a sample of each of the 12 possible mutations. I generated a vcf file with a script that uses the following command:

samtools mpileup --BCF --uncompressed --output-tags DP,DP4 -f $WGF -r $CHROM -Q 30 $FILE | bcftools call -vm -O v -o tmp.vcf -

where WGF is my reference: GRCh37.p13.genome.fa and FILE is a the aligned sample in bam format.

I searched through quite a bit of literature to round out my understanding of vcf file format. As it stands, I am planning to use entry 3/4 and 4/4 of the DP4 field to count the occurrences of each alt allele at each record. However, this method will not work for records with multiple alt alleles. In searching the literature, I was unable to identify a good solution for handling these records with multiple alleles for purposes of getting counts of each mutation they represent. Can anyone offer advice on how I might get the read depth of each alt allele in a record with multiple alleles? Further, do I have enough information in my current vcf file to accomplish this, or do I need to change the parameters of my samtools mpileup / bcftools call command to accomplish my task?

Here are two records from my vcf file (one with a single alt allele and one with multiple)

chr10   42383504    .   T   C   34.4609 PASS    DP=46;VDB=3.08786e-09;SGB=-0.69168;RPB=0.0791967;MQB=0.200607;MQSB=0.566714;BQB=1;MQ0F=0.0217391;ICB=1;HOB=0.5;AC=1;AN=2;DP4=16,10,11,8;MQ=16GT:PL:DP:DP4   0/1:68,0,129:45:16,10,11,8
chr10   42383520    rs28776333  A   C,G 93.0    aD  DP=58;VDB=5.03395e-11;SGB=-0.693146;RPB=0.33629;MQB=0.996986;MQSB=0.657561;BQB=0.856374;MQ0F=0.0172414;AC=1,1;AN=2;DP4=3,5,29,14;MQ=17;ASP;OTHERKG;RSPOS=42383520;SAO=0;SSR=0;VC=SNV;VP=050000000005000002000100;WGT=1;dbSNPBuildID=125 GT:PL:DP:DP4    1/2:154,63,92,75,0,101:51:3,5,29,14

All advice is appreciated. Thank you in advance!

EDIT: My script also uses SnpSift to annotate info from dbSNP, cosmic and 1000 genomes.

snp vcf • 1.7k views
ADD COMMENTlink modified 4.3 years ago • written 4.4 years ago by jkillian10
2
gravatar for jkillian
4.3 years ago by
jkillian10
jkillian10 wrote:

I solved this with the following samtools mpileup command:

samtools mpileup --skip-indels -Q 30 -f /references/human/gencode/GRCh37.p13.genome.fa -o '$SAMP'_pu.txt samp.bam

samtools pileup format provides every base from every read in the alt column allowing me to count every occurrence of base change.

ADD COMMENTlink written 4.3 years ago by jkillian10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1842 users visited in the last hour