Question: RTG vcfeval: number of variants reported by vcfeval is not equal to actual number of variants.
c.clarido (Netherlands/Rotterdam/Leiden University (Applied Science)) wrote 8 months ago:

Hello community,

I am running the RTG Tools vcfeval command as follows:

rtg vcfeval \
--baseline baseSnps.vcf.gz \
--calls calledSnps.vcf.gz \
--output rtg-results \
--template ref.sdf \
--sample "Macrogen-48-0511,48-0511" \
--output-mode "split"

The ref.sdf is generated from GRCh37. Both VCF files were based on this build.

I noticed, however, that the number of variants reported by this tool is a lot less than the actual number of variants found in both VCF files.

baseSnps.vcf.gz contains 378285 variants
calledSnps.vcf.gz contains 1125224 variants

However, looking at the summary:

Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
     None            3508339        3509333     525719      52825     0.8697       0.9852     0.9238

TPbase + FN = 3508339 + 52825 = 3561164
TPcall + FP = 3509333 + 525719 = 4035052
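As a quick sanity check, the totals and the three ratio columns can be recomputed from the four counts. This is a minimal sketch that assumes the usual definitions (precision = TPcall / (TPcall + FP), sensitivity = TPbase / (TPbase + FN), F-measure = their harmonic mean); the variable names are mine, not vcfeval's.

```python
# Counts copied from the vcfeval summary line above.
tp_baseline = 3508339
tp_call = 3509333
false_pos = 525719
false_neg = 52825

# Number of baseline / called variants that vcfeval actually evaluated.
baseline_total = tp_baseline + false_neg
calls_total = tp_call + false_pos

# Assumed standard definitions of the ratio columns.
precision = tp_call / calls_total
sensitivity = tp_baseline / baseline_total
f_measure = 2 * precision * sensitivity / (precision + sensitivity)

print(baseline_total, calls_total)
print(f"{precision:.4f} {sensitivity:.4f} {f_measure:.4f}")
```

The recomputed ratios agree with the Precision, Sensitivity and F-measure columns, so the summary is internally consistent; the open question is only how the two totals relate to the record counts of the input files.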

Is there something I'm missing?


I would contact Len Trigg and RTG directly. He is a user on Biostars but does not appear that often. https://www.realtimegenomics.com/company/len-trigg

Kevin

written 8 months ago by Kevin Blighe
Len Trigg (New Zealand) wrote 6 months ago:

By default, vcfeval excludes variants that are not PASS or . in the VCF FILTER column (you can use --all-records if you want to include these).
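To get a feel for how many records this default affects, one can tally the FILTER column of each input file. The following is a rough sketch, not part of RTG Tools; the function names are hypothetical, and it only looks at column 7 of each data line.

```python
import gzip
from collections import Counter


def count_by_filter(lines):
    """Tally VCF data records by whether FILTER is PASS or '.' versus anything else."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        filt = fields[6]  # FILTER is the 7th VCF column
        if filt in ("PASS", "."):
            counts["kept_by_default"] += 1
        else:
            counts["filtered"] += 1
    return counts


def count_file(path):
    """Apply count_by_filter to a gzip- or bgzip-compressed VCF file."""
    with gzip.open(path, "rt") as handle:
        return count_by_filter(handle)

# Example call (file name taken from the question):
#   print(count_file("calledSnps.vcf.gz"))
```

If the "filtered" count is large, that alone may explain a big part of the difference between the file's record count and the totals in the summary.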

There are other variant classes that are out-of-scope for the matching process, such as SVs, very long indels, non-variant records (i.e. GT is 0/0), etc.
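Two of these classes are easy to approximate from the VCF text itself. The sketch below is only a rough stand-in for vcfeval's own rules: it flags records with a symbolic or breakend ALT allele and records where the chosen sample's GT carries no alternate allele (for example 0/0 or ./.). It does not attempt the "very long indel" case, since the length cutoff is not stated here. The function name and category labels are hypothetical.

```python
import re
from collections import Counter


def classify_records(lines, sample):
    """Roughly classify VCF records for one sample column."""
    counts = Counter()
    sample_col = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_col = fields.index(sample)  # ValueError if the sample is absent
            continue
        if sample_col is None:
            raise ValueError("no #CHROM header line seen before records")
        alts = fields[4].split(",")
        # Symbolic alleles look like <DEL>; breakends contain [ or ].
        if any(a.startswith("<") or "[" in a or "]" in a for a in alts):
            counts["symbolic_or_breakend_alt"] += 1
            continue
        keys = fields[8].split(":")
        values = fields[sample_col].split(":")
        gt = values[keys.index("GT")] if "GT" in keys else "."
        alleles = re.split(r"[/|]", gt)
        if all(a in ("0", ".") for a in alleles):
            counts["no_alt_allele_in_gt"] += 1
        else:
            counts["has_alt_allele"] += 1
    return counts
```

With a multi-sample VCF this matters a lot: a record can be a real variant in the file as a whole while being 0/0 for the one sample named in --sample.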

If you run vcfeval in annotation mode (--output-mode annotate), the output VCFs will contain every record present in the input, with additional annotation fields indicating each record's status, so you can see specifically which variants are being ignored.
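Once the annotated VCFs exist, the status values can be tallied with a few lines of Python. This sketch takes the INFO key as a parameter rather than hard-coding it, because the exact annotation names should be read from the ## header lines of the output; the key "CALL" and the file name in the usage comment are assumptions to verify there.

```python
from collections import Counter


def tally_info_key(lines, key):
    """Count VCF records by the value of one INFO key; '(absent)' if the key is missing."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        status = "(absent)"
        for entry in info.split(";"):
            name, _, value = entry.partition("=")
            if name == key:
                status = value
                break
        counts[status] += 1
    return counts

# Example call (key and file name are assumptions, check the VCF header):
#   import gzip
#   with gzip.open("rtg-results/calls.vcf.gz", "rt") as handle:
#       print(tally_info_key(handle, "CALL"))
```

The sum over all status values should equal the number of records in the input file, which makes it straightforward to see how many records were matched, counted as errors, or left out of the comparison.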


Thanks for stopping by Len. Trust that all is well.

written 6 months ago by Kevin Blighe