Snp Depth From Vcf
1
0
Entering edit mode
12.2 years ago
Juliofdiaz ▴ 140

I have a set of illumina paired end reads (rone.fq and rtwo.fq), and I have mapped those reads to a reference (ref.fa).

bwa aln ref.fa rone.fq > rtwo.sai
bwa aln ref.fa rtwo.fq > rtwo.sai
bwa sampe ref.fa rone.sai rtwo.sai rone.fq rtwo.fq > aln.sam
#convert sam to bam
samtools view -bS aln.sam > aln.bam
#sort bam file
samtools view aln.bam aln_sorted

Now I am trying to get SNP and INDEL information using the following script

samtools mpileup -ugf ref.fa aln_sorted.bam >aln.bcf
#convert to vcf for viewing
bcftools view aln.bcf > aln.vcf

I get the following VCF (this is just one line of the VCF I actually get)

##fileformat=VCFv4.1
##samtoolsVersion=0.1.17 (r973:277)
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    aln_sorted.bam
gi|110227054|gb|AE004091.2|    32    .    A    C,X    0    .    DP=14;I16=9,1,1,0,352,12854,23,529,600,36000,29,841,159,3395,21,441    PL    0,10,189,30,192,202

I am assuming that samtools thinks this is a SNP, because the REF=A and ALT=C,X. My question is: How can I know the coverage depth of this SNP given this VCF? If this VCF does not have information enough to show the SNP coverage depth then how can I creat a VCF that shows that information? Thanks

samtools snp bcftools vcf illumina • 14k views
ADD COMMENT
5
Entering edit mode
12.2 years ago

right from the VCF spec:

DP combined depth across samples, e.g. DP=154.

Samtools depth reports raw coverage. In the VCF file DP reports the coverage of all reads at that loci (that pass the quality filter). So 'Samtools depth' gives a different statistics than the DP field.

See:

my similar question

There is also information about the depth on each strand supporting the ref / alt.

IT is worth your time to carefully read the documentation for all the Samtools steps

VCF specs

ADD COMMENT
1
Entering edit mode

Nope DP=14 means you have 14 reads that cover gi|110227054. The way you used samtools it will only report information on those loci that have a suggested mutation. use 'Samtools depth' if you want to know the depth for each loci.

ADD REPLY
0
Entering edit mode

I dont quite understand the DP value. For example if DP=14, does that mean that 14 reads support that polymorphism? If so 14 out of how many? I am confused because in position where samtools does not find SNPs the DP value would just mean the read coverage of that position. Thanks

ADD REPLY
0
Entering edit mode

Nope DP=14 means you have 14 reads that cover gi|110227054. The way you used samtools it will only report information for those lines that have a suggested mutation. use 'Samtools depth' if you want to know the depth for each loci.

ADD REPLY
0
Entering edit mode

SO I ran "samtools depth alnsorted.bam>alndepth" and the info I get about position 32 is "gi|110227054|gb|AE004091.2| 32 33". So from what you are saying I get that the coverage of that base is 32X and the reads that support the SNP is 14? Thanks so much again

ADD REPLY
0
Entering edit mode

see my edits. 32 are the raw coverage. 14 is the coverage containing only good reads. I16=9,1,1,0 describes which reads go where. If I am reading the cigar string right this is an insertion. Don't quote me on that.

ADD REPLY
0
Entering edit mode

see my edits above.

ADD REPLY
0
Entering edit mode

Ok I get it. Why do you say it is an insertion? I thought that was a substitution (because of the comma)

ADD REPLY

Login before adding your answer.

Traffic: 1774 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6