Question: How to set up my vcf with a minimum depth?
gravatar for apl00028
8 months ago by
apl0002870 wrote:

I am trying to set up the minimum depth which samtools mpileup considers a polymorphism for RNAseq samples. For that I did this:

I did an align with BWA:

bwa mem scaffold_filt_PV014_bwa PV009_1_cmv3.fq_pairs_R1.fastq PV009_2_cmv3.fq_pairs_R2.fastq > scaffold_filt_OD_PV009.sam

I change sam format to bam format and I sorted it:

samtools view -bS scaffold_filt_OD_PV009.sam | samtools sort - > scaffold_filt_OD_PV009.bam

I index the reference:

samtools faidx scaffold_filt_PV014_index.fa

I defined the snp with bcftool:

bcftools mpileup --redo-BAQ --min-BQ 30  --per-sample-mF --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR -f scaffold_filt_PV014_index.fa scaffold_filt_OD_PV009.bam | bcftools call --multiallelic-caller --variants-only -Ov > bcf/scaffold_filt_OD_PV009_q10_variant.vcf

I tried to select those polymorphisms with a depth of 20 or more:

cat scaffold_filt_OD_PV009_q10_variant.vcf  | grep -v "DP>=20" > scaffold_filt_OD_PV009_q10_variant_filter.vcf

But this last step has not any effect. So, What is my mistake? How can I solve it? What other alternatives can I do? Thanks in advantage.

snp • 370 views
ADD COMMENTlink modified 8 months ago • written 8 months ago by apl0002870

bcftools has various filtering options, please check the manual towards filtering. I strongly recommend to NEVER do any grep or similar things with a VCF, always use dedicated tools for VCF/BCF manipulation, as you might corrupt your file. See for inspiration How to filter vcf file on minimum genotype depth and quality for each sample

ADD REPLYlink written 8 months ago by ATpoint31k

I am trying it using this function of that link: vcftools --vcf scaffold_filt_OD_PV009_q10_variant.vcf --out output_prefix --min-meanDP 20 --recode --recode-INFO-all

but I get SNP with lower depth of 20 and filter some SNP with a depth higher than 20.

When I tried use this function: vcftools --vcf scaffold_filt_OD_PV009_q10_variant.vcf --out output_prefix --minDP 20 --recode --recode-INFO-all

It did not work.

What am I doing wrong? Thanks in advantage

ADD REPLYlink modified 8 months ago • written 8 months ago by apl0002870

For your information, you can avoid intermediate files and save time by piping the output of bwa mem directly to samtools and create a sorted bam file:

bwa mem genome.fa reads.fastq.gz | samtools sort -o sorted_alignment.bam

This requires that your samtools is quite recent, but you should always try to use up-to-date software.

ADD REPLYlink written 8 months ago by WouterDeCoster43k
gravatar for apl00028
8 months ago by
apl0002870 wrote:

Finaly I got my aim using this function:

 bcftools view --types snps --include 'INFO/DP>=20' scaffold_filt_OD_PV009_q10_variant.vcf > scaffold_filt_OD_PV009_q10_variant_filtered.vcf
ADD COMMENTlink modified 8 months ago by ATpoint31k • written 8 months ago by apl0002870
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1654 users visited in the last hour