RTG (Real Time Genomics) docker image - Problem dealing with different VCFs
3 days ago

Hi!

I’m just getting started in bioinformatics and I’ve been trying to work with RTG Tools, especially the vcfeval function to compare VCFs and generate ROC plots.

I have two VCFs: one provided by Cancer in a Bottle (CIAB), which comes from a very well-documented and carefully filtered DRAGEN pipeline (they also provide the FASTQs).

The other is my own VCF, generated by running their FASTQs through my pipeline with Parabricks Mutect (tumor-only, and I tried to reduce germline variants using gnomAD). Naturally, my VCF has more variants, since CIAB used tumor/normal pairs while mine was tumor-only.

The issue I’m running into is that the two VCFs are not structured the same way:

The columns appear in different orders.

Some of the field names don’t match, even if they contain similar information.

From what I understand in the RTG documentation, vcfeval expects a single consistent field across both files for the evaluation and for generating plots. I’ve tried all sorts of things: copying values into new fields, renaming fields so both VCFs match, but nothing seems to work. vcfeval keeps rejecting them.

I feel like this can’t be such an unusual situation. People must routinely compare VCFs from different variant callers, so I assume the tool is designed for this. It’s probably just my inexperience and lack of understanding of how to properly prepare the files.

Does anyone have suggestions for how to get RTG vcfeval to accept and compare two VCFs that differ in column names and structure?

For reference, I’ve put both VCFs here if anyone is curious to take a look: https://drive.google.com/drive/folders/1CCRbIifa8BwJiFLXVQ2BOJZa_jaiUBxz?usp=sharing

My VCF: HG008_somatic_gnomad_filtered.vcf.gz

CIAB VCF: dragen_HG008.norm.vcf.gz

Docker images used:

realtimegenomics/rtg-tools:3.12.1 (4 years old, 349 MB)

quay.io/biocontainers/rtg-tools:3.10.1--0 (6 years old, 460 MB)

Thanks a lot for any advice, I really appreciate it!

rtg vcf genomic cancer • 406 views

Which columns are giving you trouble?

Is the problem that you have two samples in the DRAGEN VCF (tumor/normal) and only one sample in the HG008 VCF?


Thanks for replying!

Yes, the DRAGEN VCF (dragen_HG008.norm.vcf.gz) is indeed multi-sample, containing both HG008-N (Normal) and HG008-T-mosaic (Tumor).

My VCF (HG008_somatic_gnomad_filtered.vcf.gz), which was generated from a tumor-only Mutect2 run, is single-sample, and its one sample column is literally named "sample".

I believe this single-sample vs. multi-sample structure is causing rtg vcfeval to fail, especially when trying to map the correct tumor samples using the --sample flag.

My main challenge is figuring out the standard workflow to prepare these files for a clean, one-to-one comparison between the two tumor samples. Is the recommended approach to first create a new, single-sample VCF from the DRAGEN file containing only the tumor data?
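To make the single-sample idea concrete, here is a minimal Python sketch of what "keep only the tumor column" means at the level of VCF text lines. The function name `select_sample` is hypothetical, and the sketch leaves the genotype values and the ## header lines untouched. In practice a tool such as `bcftools view -s HG008-T-mosaic` does this kind of subsetting; the code below is only meant to show the transformation.

```python
def select_sample(vcf_lines, sample):
    """Yield VCF lines keeping the 9 fixed columns plus one named sample column.

    Hypothetical helper for illustration; it does not recompute INFO fields
    or drop sites where the selected sample carries no variant.
    """
    keep = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines pass through unchanged.
            yield line
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at column 10 (index 9).
            # Raises ValueError if the sample is not in the header.
            keep = 9 + cols[9:].index(sample)
        if keep is None:
            raise ValueError("data line seen before the #CHROM header line")
        yield "\t".join(cols[:9] + [cols[keep]])
```

Applied to the DRAGEN file with `sample="HG008-T-mosaic"`, every output line would carry a single sample column, which matches the layout of the tumor-only VCF.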

In my VCF (from Mutect2), the quality score I want to use is in the INFO column, under the tag TLOD (Tumor Log Odds).

In the CIAB VCF (from DRAGEN), the equivalent somatic quality score is in the FORMAT column, under the tag SQ (Somatic Quality).

rtg vcfeval --vcf-score-field requires the same field name to exist in both files, so I attempted to harmonize them. I used bcftools to create a new, "harmonized" version of the DRAGEN baseline VCF. In this new file, I successfully copied the FORMAT/SQ value into a new INFO/TLOD field. Now, both my VCF and the new baseline VCF have an identical ##INFO=<ID=TLOD,...> definition in their headers.
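The harmonization step described above can be sketched in plain Python as follows. This is not the bcftools command that was actually run; the function name `copy_sq_to_tlod` and the `TLOD_HEADER` text are assumptions made for illustration, and the header definition would need to match whatever the Mutect2 file declares.

```python
# Hypothetical header line; adjust Number/Type/Description to match the calls VCF.
TLOD_HEADER = (
    '##INFO=<ID=TLOD,Number=1,Type=Float,'
    'Description="Copied from FORMAT/SQ for score harmonization">'
)


def copy_sq_to_tlod(vcf_lines, sample_index=0):
    """Yield VCF lines with FORMAT/SQ of one sample copied into INFO/TLOD.

    sample_index counts sample columns from 0 (0 = first sample column).
    Assumes the input does not already declare an INFO/TLOD header.
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
            continue
        if line.startswith("#CHROM"):
            # Declare the new INFO tag just before the column header line.
            yield TLOD_HEADER
            yield line
            continue
        cols = line.split("\t")
        keys = cols[8].split(":")
        values = cols[9 + sample_index].split(":")
        if "SQ" in keys and keys.index("SQ") < len(values):
            sq = values[keys.index("SQ")]
            if sq != ".":
                # Append to existing INFO, or replace the "." placeholder.
                if cols[7] in (".", ""):
                    cols[7] = f"TLOD={sq}"
                else:
                    cols[7] = f"{cols[7]};TLOD={sq}"
        yield "\t".join(cols)
```

Records without an SQ value are passed through unchanged, so they would simply have no score in the harmonized field.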

However, even when comparing my VCF against this new harmonized baseline, rtg vcfeval (version 3.12.1) still fails with a confusing error, claiming the INFO/TLOD field doesn't exist, even though bcftools query confirms it's present in both files.
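For anyone trying to reproduce this, here is a hedged Python sketch that assembles an rtg vcfeval command for this comparison. Two points are assumptions based on my reading of the RTG documentation and should be checked against `rtg vcfeval --help`: that `--sample` accepts a `baseline_sample,calls_sample` pair, and that `--vcf-score-field` treats a bare name as a FORMAT field, so an INFO field has to be written as `INFO.TLOD`. If the second point holds, passing plain `TLOD` could produce a "field doesn't exist" style of error even when INFO/TLOD is present. The harmonized baseline file name, the SDF path, and the output directory below are hypothetical.

```python
import shlex


def vcfeval_command(baseline, calls, template, output,
                    baseline_sample, calls_sample, score_field):
    """Build (but do not run) an rtg vcfeval argument list."""
    return [
        "rtg", "vcfeval",
        "--baseline", baseline,
        "--calls", calls,
        "--template", template,   # reference genome in RTG SDF format
        "--output", output,       # output directory for results and ROC data
        "--sample", f"{baseline_sample},{calls_sample}",
        "--vcf-score-field", score_field,
    ]


cmd = vcfeval_command(
    baseline="dragen_HG008.harmonized.vcf.gz",    # hypothetical file name
    calls="HG008_somatic_gnomad_filtered.vcf.gz",
    template="reference.sdf",                      # hypothetical SDF path
    output="vcfeval_out",                          # hypothetical directory
    baseline_sample="HG008-T-mosaic",
    calls_sample="sample",
    score_field="INFO.TLOD",                       # assumed INFO.<name> syntax
)
print(shlex.join(cmd))
```

Posting the printed command alongside the exact error message would make it much easier for others to see what vcfeval was actually asked to do.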


Your strategy sounds ok but we would need to see your modified file and the exact command you ran to know what happened.
