Question: Filtering VCF Variants Based On Sequencing Coverage
Juliofdiaz (Toronto, Ontario, Canada) wrote 6.8 years ago:

Hello, I have produced a set of variants using the following pipeline:

## SORT BAM FILE FROM REF MAPPING ##
samtools sort r.bam r_sorted
## CREATE LIST OF POTENTIAL SNP OR INDEL ##
samtools mpileup -uf ref.fa r_sorted.bam > r.bcf
## PARSE POTENTIAL SNP OR INDEL USING BAYESIAN INFERENCE ##
bcftools view -bvcg r.bcf > r2.bcf
## BCF FILE IS CONVERTED TO A VIEWABLE FORM (VCF) ##
bcftools view r2.bcf > r.vcf

I want to do a preliminary filtering of the resulting variants based on sequencing depth. So, I have written a short script that does so by using the DP4 values from the INFO column.

gi|110645304|ref|NC_002516.2|    314283    .    G    T    222    .    DP=67;VDB=0.0384;AF1=1;AC1=2;DP4=0,1,31,27;MQ=58;FQ=-169;PV4=0.47,1,0.37,1    GT:PL:GQ    1/1:255,142,0:99

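In this example I would add the first two values of DP4 (the forward and reverse read counts supporting the reference allele, i.e. the reference coverage) and make sure that their sum is low enough. I would also add the last two values of DP4 (the forward and reverse read counts supporting the alternate allele, i.e. the SNP coverage) and make sure the site is neither under-covered nor over-covered. My question is whether this is a reasonable approach.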
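To make the idea concrete, here is a minimal Python sketch of that DP4 check, not the script I actually use. The function names `parse_info` and `passes_dp4` and the thresholds `max_ref`, `min_alt` and `max_alt` are hypothetical placeholders chosen for illustration; suitable cutoffs depend on the average coverage of the run.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=67;DP4=0,1,31,27' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag-only entries (e.g. INDEL) map to ""
    return fields


def passes_dp4(vcf_line, max_ref=2, min_alt=10, max_alt=200):
    """Return True if the record's DP4 counts fall inside the thresholds.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    # Real VCF is tab-separated; splitting on any whitespace also accepts
    # lines pasted with spaces between the columns.
    columns = vcf_line.split()
    info = parse_info(columns[7])  # INFO is the 8th VCF column
    if "DP4" not in info:
        return False
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    ref_depth = ref_fwd + ref_rev  # reads supporting the reference allele
    alt_depth = alt_fwd + alt_rev  # reads supporting the alternate allele
    return ref_depth <= max_ref and min_alt <= alt_depth <= max_alt


record = (
    "gi|110645304|ref|NC_002516.2|\t314283\t.\tG\tT\t222\t.\t"
    "DP=67;VDB=0.0384;AF1=1;AC1=2;DP4=0,1,31,27;MQ=58;FQ=-169;"
    "PV4=0.47,1,0.37,1\tGT:PL:GQ\t1/1:255,142,0:99"
)
print(passes_dp4(record))  # True: ref depth 1, alt depth 58
```

With the example record, the reference depth is 0 + 1 = 1 and the SNP depth is 31 + 27 = 58, so it passes these placeholder thresholds.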

For more information: I am working with HiSeq sequencing data from a single sample, and it is a bacterial whole genome.

Thanks

Ashutosh Pandey (Philadelphia) wrote 6.8 years ago:

Yes, this is the right approach. BTW, there is a tool called VCFtools (http://vcftools.sourceforge.net/) that can be used to process VCF files (filtering, comparisons, etc.). But you can write your own code too (I use my own code). If you don't have an idea about which thresholds to use, you can read recent NGS papers and use the same parameters that they used. Normally, I discard SNPs at positions covered by an over-represented number of reads (more than 3 times the average coverage). Besides the number of reads, mapping quality should also be used to filter the SNPs.
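As a rough illustration of the depth and mapping-quality rule above, here is a minimal Python sketch. It assumes the average coverage has already been computed elsewhere and is passed in as `mean_depth`, and that the records carry `DP` and `MQ` in the INFO column as in the question's example. The function names, the `min_mq=30` cutoff and the toy records are hypothetical; only the 3x factor comes from the text.

```python
def info_value(vcf_line, key):
    """Return the numeric INFO value for key, or None if it is absent."""
    columns = vcf_line.split()  # INFO is the 8th VCF column
    for item in columns[7].split(";"):
        name, _, value = item.partition("=")
        if name == key and value:
            return float(value)
    return None


def filter_by_depth_and_mq(vcf_lines, mean_depth, fold=3.0, min_mq=30.0):
    """Keep records with DP <= fold * mean_depth and MQ >= min_mq.

    mean_depth is supplied by the caller; min_mq is an assumed cutoff.
    """
    kept = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        depth = info_value(line, "DP")
        mq = info_value(line, "MQ")
        if depth is None or mq is None:
            continue  # cannot judge a record without both values
        if depth <= fold * mean_depth and mq >= min_mq:
            kept.append(line)
    return kept


# Toy records invented for illustration only.
toy_records = [
    "##fileformat=VCFv4.1",
    "chr\t10\t.\tG\tT\t222\t.\tDP=67;MQ=58",
    "chr\t20\t.\tA\tC\t200\t.\tDP=400;MQ=60",  # depth above 3 x 70
    "chr\t30\t.\tC\tG\t50\t.\tDP=60;MQ=12",    # low mapping quality
]
for line in filter_by_depth_and_mq(toy_records, mean_depth=70):
    print(line.split()[1])  # prints 10
```

Taking the average coverage as an argument keeps the sketch independent of how that average is obtained.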
