vcftools discrepancy in mean depth values per site
1
0
Entering edit mode
3.8 years ago
oscarmirap ▴ 10

Dear all,

I'm using VCFtools 0.1.17 to filter out loci with low coverage, this is the process:

  1. Use --site-mean-depth to get a "table.ldepth.mean" with the mean values per site.
  2. Calculate manually how many loci there are above different mean depth values, and choose a minimum depth value.
  3. Use --min-meanDP to filter out the loci below the chosen value.

Problem is:

The final vcf file after pruning low depth sites always has fewer loci than expected, when calculated from the table.

Example: If I check how many loci there are with MEAN_DEPTH above 4 in the table there is 5926; but if I prune loci using "--min-meanDP 4" there are only 2157 loci left in the output file. And there should be more, because it should not only include loci above 4, but also equal to 4.

Tried this with many different data, and combinations of values for --min-meanDP and --max-meanDP, and I always get way fewer loci in the output file than if I calculate their number manually from the table.

After some digging I found out, the problem are the missing values, looks like ----site-mean-depth ignores missing values, while --min-meanDP considers that they have DP=0 (which makes more sense).

If the vcf file has no missing genotypes, the numbers match.

Someone else has this problem?

Is there an alternative to calculate mean depth per locus that will consider missing as 0?

Thanks!

vcftools vcf SNP depth coverage • 2.5k views
ADD COMMENT
2
Entering edit mode
3.8 years ago

Someone else has this problem?

These problems are common, believe it or not. However, these programs come with a vast amount of parameter configurations - I can understand why the VCFtools developer would encode one different from the other at design time.

A missing / NA value at a particular site may not necessarily reflect 0 - depending on the variant caller, it could simply mean that not enough information was available to make a call, but not necessarily indicating 0 reads.

You may want to switch over to BCFtools, by the way. VCFtools is not maintained anymore, as far as I am aware.

Anyway, it is for these discrepancies that I never use these pre-programmed functions of these tools. I became comfortable with AWK and general shell scripting for passing VCFs, and this allows me to do pretty much everything that I need.

Kevin

ADD COMMENT
1
Entering edit mode

Thanks for the heads up on "missing", Kevin, you are right. Yes, before seeing your answer we were discussing already that it will be better to write our own script, in perl in my case. Cheers!

Óscar

ADD REPLY

Login before adding your answer.

Traffic: 2218 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6