Question: Snp Depth From Vcf
0
gravatar for Juliofdiaz
7.2 years ago by
Juliofdiaz130
Toronto, Ontario, Canada
Juliofdiaz130 wrote:

I have a set of illumina paired end reads (rone.fq and rtwo.fq), and I have mapped those reads to a reference (ref.fa).

bwa aln ref.fa rone.fq > rtwo.sai
bwa aln ref.fa rtwo.fq > rtwo.sai
bwa sampe ref.fa rone.sai rtwo.sai rone.fq rtwo.fq > aln.sam
#convert sam to bam
samtools view -bS aln.sam > aln.bam
#sort bam file
samtools view aln.bam aln_sorted

Now I am trying to get SNP and INDEL information using the following script

samtools mpileup -ugf ref.fa aln_sorted.bam >aln.bcf
#convert to vcf for viewing
bcftools view aln.bcf > aln.vcf

I get the following VCF (this is just one line of the VCF I actually get)

##fileformat=VCFv4.1
##samtoolsVersion=0.1.17 (r973:277)
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    aln_sorted.bam
gi|110227054|gb|AE004091.2|    32    .    A    C,X    0    .    DP=14;I16=9,1,1,0,352,12854,23,529,600,36000,29,841,159,3395,21,441    PL    0,10,189,30,192,202

I am assuming that samtools thinks this is a SNP, because the REF=A and ALT=C,X. My question is: How can I know the coverage depth of this SNP given this VCF? If this VCF does not have information enough to show the SNP coverage depth then how can I creat a VCF that shows that information? Thanks

vcf samtools bcftools snp illumina • 9.4k views
ADD COMMENTlink written 7.2 years ago by Juliofdiaz130
5
gravatar for Zev.Kronenberg
7.2 years ago by
United States
Zev.Kronenberg11k wrote:

right from the VCF spec:

DP combined depth across samples, e.g. DP=154.

Samtools depth reports raw coverage. In the VCF file DP reports the coverage of all reads at that loci (that pass the quality filter). So 'Samtools depth' gives a different statistics than the DP field.

See:

my similar question

There is also information about the depth on each strand supporting the ref / alt.

IT is worth your time to carefully read the documentation for all the Samtools steps

VCF specs

ADD COMMENTlink modified 7.2 years ago • written 7.2 years ago by Zev.Kronenberg11k
1

Nope DP=14 means you have 14 reads that cover gi|110227054. The way you used samtools it will only report information on those loci that have a suggested mutation. use 'Samtools depth' if you want to know the depth for each loci.

ADD REPLYlink written 7.2 years ago by Zev.Kronenberg11k

I dont quite understand the DP value. For example if DP=14, does that mean that 14 reads support that polymorphism? If so 14 out of how many? I am confused because in position where samtools does not find SNPs the DP value would just mean the read coverage of that position. Thanks

ADD REPLYlink written 7.2 years ago by Juliofdiaz130

Nope DP=14 means you have 14 reads that cover gi|110227054. The way you used samtools it will only report information for those lines that have a suggested mutation. use 'Samtools depth' if you want to know the depth for each loci.

ADD REPLYlink written 7.2 years ago by Zev.Kronenberg11k

SO I ran "samtools depth alnsorted.bam>alndepth" and the info I get about position 32 is "gi|110227054|gb|AE004091.2| 32 33". So from what you are saying I get that the coverage of that base is 32X and the reads that support the SNP is 14? Thanks so much again

ADD REPLYlink written 7.2 years ago by Juliofdiaz130

see my edits. 32 are the raw coverage. 14 is the coverage containing only good reads. I16=9,1,1,0 describes which reads go where. If I am reading the cigar string right this is an insertion. Don't quote me on that.

ADD REPLYlink written 7.2 years ago by Zev.Kronenberg11k

see my edits above.

ADD REPLYlink written 7.2 years ago by Zev.Kronenberg11k

Ok I get it. Why do you say it is an insertion? I thought that was a substitution (because of the comma)

ADD REPLYlink written 7.2 years ago by Juliofdiaz130
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1984 users visited in the last hour