Huge difference when comparing 2 VCF files from the same sample
2.0 years ago
Sara ▴ 240

I have 2 VCF files produced by 2 different pipelines and I am trying to compare them. To do so, I tried 2 things:

  1. I used vcfeval (for SNVs and indels separately) to get sensitivity and precision, which are quite high for both VCF files.
  2. I counted the events common to both VCF files (almost 12,000) and the events unique to each file: 2,300 for one file and 851 for the other.

Since the same input file was used for both pipelines, and sensitivity and precision are quite high for both, how can I interpret the high number of unique events in these files? Given the high sensitivity and precision, I do not think those unique events are artifacts. How would you interpret such results?

VCF • 826 views

It is not unexpected to get different results when you run different pipelines. To interpret the differences (i.e., the number of unique calls), you should first try to understand how the two pipelines differ. Without knowing which pipelines and thresholds were used in the analysis, I doubt anyone here can provide a more specific interpretation.
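As a starting point for digging into the unique calls, it can help to break the shared and unique sets down by variant type. The following is only a minimal sketch, assuming uncompressed VCF text and exact matching on (CHROM, POS, REF, ALT); the function names `variant_keys`, `variant_type` and `compare` are made up for illustration and do not come from either pipeline.

```python
from collections import Counter


def variant_keys(vcf_lines):
    """Return a set of (chrom, pos, ref, alt) keys, one per ALT allele."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header lines ("##..." and "#CHROM ...").
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT allele is compared on its own.
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def variant_type(key):
    """Crude length-based classification (symbolic ALTs are not handled)."""
    _, _, ref, alt = key
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    return "indel"


def compare(lines_a, lines_b):
    """Count shared and unique variants per type for two VCFs."""
    a, b = variant_keys(lines_a), variant_keys(lines_b)
    return {
        "shared": Counter(variant_type(k) for k in a & b),
        "only_a": Counter(variant_type(k) for k in a - b),
        "only_b": Counter(variant_type(k) for k in b - a),
    }
```

With real files you could pass open file handles, e.g. `compare(open("a.vcf"), open("b.vcf"))` (file names are placeholders). If most of the unique events turn out to be indels or MNVs rather than SNVs, that points towards representation differences rather than genuinely different calls.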

2.0 years ago

I would not expect many SNV idiosyncrasies, but pipelines often differ in how indels and complex or composite events are called. See https://genome.sph.umich.edu/wiki/Variant_Normalization
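To illustrate why this matters when counting "unique" events: the same deletion inside a repeat can be written at several positions, so an exact-match comparison treats the two records as different variants. Below is a toy sketch of the trim-and-left-align procedure described on the linked page, for a single ALT allele and a reference held in memory as a plain string. The function name `normalize` and the toy sequence are my own assumptions; for real data an established tool such as `bcftools norm` or `vt normalize` is the better choice.

```python
def normalize(pos, ref, alt, reference):
    """Left-align and trim one biallelic variant.

    pos is 1-based; reference is the chromosome sequence as a string.
    Returns the normalized (pos, ref, alt).
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")

    changed = True
    while changed:
        changed = False
        # Drop the last base while both alleles end with the same nucleotide.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True

    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat; all three records delete one "CA" unit.
toy_reference = "GGGCACACACAGGG"
records = [(3, "GCA", "G"), (8, "CAC", "C"), (9, "ACA", "A")]
normalized = {normalize(p, r, a, toy_reference) for p, r, a in records}
print(normalized)  # all three collapse to one representation
```

If both VCFs are normalized against the same reference before intersecting them, some of the events that currently look unique to one pipeline may turn out to be shared.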
