Extremely low number of variants in VCF file after filtering MIN(FORMAT/DP)>10
1
0
16 months ago
Linda ▴ 60

I'm doing microbiome analysis, looking for SNPs across a large number of microbial genomes. I ran my bcftools pipeline on around 15 bacterial and viral species, which produced between 0 and 150 variants per VCF file.

From looking around, a minimum depth of 10 seems to be the benchmark filter commonly used in the literature, but after applying it to the VCF files I am left with a single species retaining just 4 variants.
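To make sure I understand the filter: on a toy two-sample VCF (made-up data), `-i 'MIN(FORMAT/DP)>10'` should keep a site only if every sample's DP exceeds 10. The awk below is a stand-in for the bcftools expression, just to illustrate the semantics:

```shell
# Toy VCF with two samples (s1, s2); data is invented for illustration.
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n' >> toy.vcf
printf 'chr1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:25\t1/1:30\n' >> toy.vcf
printf 'chr1\t200\t.\tC\tT\t50\t.\t.\tGT:DP\t0/1:8\t1/1:40\n' >> toy.vcf

# awk equivalent of: bcftools view -i 'MIN(FORMAT/DP)>10' toy.vcf
kept=$(awk -F'\t' '!/^#/ {
    n = split($9, fmt, ":")              # locate DP within the FORMAT column
    for (i = 1; i <= n; i++) if (fmt[i] == "DP") dpidx = i
    keep = 1
    for (s = 10; s <= NF; s++) {         # check DP for every sample column
        split($s, vals, ":")
        if (vals[dpidx] + 0 <= 10) keep = 0
    }
    if (keep) print $1, $2
}' toy.vcf)
echo "$kept"    # only chr1 100 survives; chr1 200 is dropped (s1 has DP=8)
```

So a single sample with low depth at a site is enough to discard that site.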

I ran mpileup with a max-depth of 8000 so as to maximise the likelihood of detecting significant variants.

Could there be something wrong with how I'm running mpileup, or is it down to the sequencing depth/coverage in the first place? Why is the read depth per sample so low in the VCF files?
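For context, the pipeline is roughly the following (file names are placeholders, not my actual paths; one reference/BAM pair per species):

```shell
# Call variants, keeping per-sample depth so FORMAT/DP is available to filter on.
bcftools mpileup -f ref.fa --max-depth 8000 -a FORMAT/DP sample.bam \
    | bcftools call -mv -Oz -o calls.vcf.gz

# The depth filter that removes nearly everything:
bcftools view -i 'MIN(FORMAT/DP)>10' calls.vcf.gz \
    | grep -vc '^#'    # count of variants surviving the filter
```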

Any insights greatly appreciated! Thanks!

bcftools snp vcf calling • 545 views
0

What is the average sequencing depth/coverage? Did you check the BAM file for average coverage in the regions of interest? See if you can fine-tune the variant-calling parameters. If there is a trimming step, check that the trimming is adequate. Check the alignment parameters and, if you are using reference genomes, confirm that they are the correct ones. Make sure that all steps use the same versions/builds of the reference genomes and databases.
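For example, `samtools depth -a sample.bam > depth.txt` (sample.bam being a placeholder) writes one `chrom<TAB>pos<TAB>depth` line per position, from which mean coverage is a one-liner. A toy depth.txt stands in for real output here:

```shell
# Invented depth values standing in for 'samtools depth -a' output.
printf 'chr1\t1\t12\nchr1\t2\t7\nchr1\t3\t11\n' > depth.txt

# Mean coverage over all positions:
avg=$(awk '{ sum += $3 } END { printf "%.1f", sum / NR }' depth.txt)
echo "$avg"    # 10.0 here; if your real mean is near 10, a DP>10 filter
               # will necessarily discard most sites
```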

0
16 months ago

Hi Linda,

You should go through each step, starting with the alignment, to see where reads are potentially being lost. Take a look at the alignment metrics (e.g. with samtools flagstat), and also check the quality of the FASTQs prior to alignment with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
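samtools flagstat reports total and mapped read counts, from which the mapped fraction follows directly. With made-up numbers:

```shell
# Invented counts of the kind flagstat reports ("N + 0 in total", "N + 0 mapped").
total=100000
mapped=62000
rate=$(awk -v m="$mapped" -v t="$total" 'BEGIN { printf "%.1f", 100 * m / t }')
echo "${rate}%"    # 62.0% mapped; a low rate often points to the wrong
                   # reference genome or to reads being lost upstream
```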

Also be sure that there were no issues during the preparation of the samples in the 'wet' laboratory.

You can also post the commands that you used here, if you wish.

Kevin