Question

Criteria for choosing variants

2 days ago

ramiro.barrantes

We are analyzing tumor/normal pairs, running three variant callers to identify somatic mutations. Sometimes we have two tissues that we compare against the normal, and we usually also have RNA expression data.

We would like to choose variants for further investigation and are looking for a rationale for selecting and ranking them. Does anybody have any suggestions?

Here is what we are thinking. Variants called by all three variant callers are prioritized over those called by two, which in turn are prioritized over variants called by a single caller. Our reasoning is that the literature we have read suggests an ensemble approach reduces false positives.
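
A minimal Python sketch of this tiered ranking, assuming each caller's output has already been reduced to a set of (chrom, pos, ref, alt) keys. Matching variants this way only works if the VCFs were normalized first (indels left-aligned, multi-allelic sites split); otherwise the same variant can end up with different keys. The caller names are hypothetical placeholders.

```python
from collections import defaultdict

# A variant key: (chrom, pos, ref, alt). Assumes calls were normalized
# (left-aligned, multi-allelic sites split) so identical variants match.
Variant = tuple[str, int, str, str]


def rank_by_caller_support(
    calls: dict[str, set[Variant]],
) -> list[tuple[Variant, int, list[str]]]:
    """Rank variants by how many callers reported them, most support first."""
    support: dict[Variant, list[str]] = defaultdict(list)
    for caller, variants in calls.items():
        for v in variants:
            support[v].append(caller)
    ranked = [(v, len(callers), sorted(callers)) for v, callers in support.items()]
    # Descending caller count; ties broken by chromosome name and position.
    ranked.sort(key=lambda r: (-r[1], r[0][0], r[0][1]))
    return ranked


if __name__ == "__main__":
    # Hypothetical caller names and toy calls, for illustration only.
    calls = {
        "caller_a": {("chr1", 100, "C", "T"), ("chr2", 500, "G", "A")},
        "caller_b": {("chr1", 100, "C", "T"), ("chr3", 42, "A", "G")},
        "caller_c": {("chr1", 100, "C", "T"), ("chr2", 500, "G", "A")},
    }
    for variant, n_callers, callers in rank_by_caller_support(calls):
        print(n_callers, variant, callers)
```

Keeping the list of supporting callers next to the count makes it easy to see later whether one particular tool is responsible for most of the single-caller variants.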

However, we are not sure what to make of variants that appear in two tissues. Should we prioritize those even if the variant was called by only a single caller? Any insight?
In addition, if a variant appears as expressed in RNA-seq, does that mean it is real and we can trust it? I.e., how much can we trust a DNA variant call if the variant also appears as expressed in RNA-seq?

To my knowledge, an ensemble approach means that you run several tools and then apply some sort of filtering and classification to decide whether the callers agree or not, basically a meta-analysis across callers. If you simply take the naive intersection between tools, you probably enrich for variants that are easy to call (because all tools call them) and deplete variants in difficult regions, which are technically challenging but can still be biologically relevant. If you aim for such an approach, I would look for dedicated ensemble tools that do this. For prioritization, I would use something like VEP from Ensembl to enrich for variants that alter coding sequence and affect the protein. Depending on your question, the literature will probably guide you as well, as you are likely not the first to investigate this tumor entity. I would use the RNA-seq only to check whether the affected genes are expressed at all, not for variant validation. RNA-seq is noisy due to reverse transcription, potential allele-specific expression, and PCR amplification.
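
As a rough illustration of the VEP-based prioritization combined with the "is the gene expressed at all" check, here is a Python sketch that reads the `CSQ` annotation VEP writes into the VCF INFO field and keeps variants with HIGH or MODERATE impact in genes above an expression cutoff. The assumed CSQ field order, the 1 TPM cutoff, and the input dictionaries (variant ID to CSQ string, gene symbol to TPM) are all assumptions; check the `Format:` part of the CSQ line in your own VCF header.

```python
# Assumed leading fields of VEP's CSQ annotation. The real order is given in
# the "Format:" part of the ##INFO=<ID=CSQ,...> line of your VCF header.
CSQ_FORMAT = "Allele|Consequence|IMPACT|SYMBOL|Gene"

# VEP's IMPACT categories, most to least severe.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def parse_csq(csq_value: str, fmt: str = CSQ_FORMAT) -> list[dict[str, str]]:
    """Split a CSQ value into one dict per transcript annotation.

    zip() stops at the shorter list, so extra trailing fields are ignored.
    """
    fields = fmt.split("|")
    return [dict(zip(fields, entry.split("|"))) for entry in csq_value.split(",")]


def most_severe(csq_value: str, fmt: str = CSQ_FORMAT) -> tuple[str, str]:
    """Return (IMPACT, SYMBOL) of the most severe transcript annotation."""
    records = parse_csq(csq_value, fmt)
    worst = min(
        records,
        key=lambda r: IMPACT_RANK.get(r.get("IMPACT", ""), len(IMPACT_RANK)),
    )
    return worst.get("IMPACT", ""), worst.get("SYMBOL", "")


def prioritize(
    csq_by_variant: dict[str, str],
    gene_tpm: dict[str, float],
    min_tpm: float = 1.0,
    keep_impacts: tuple[str, ...] = ("HIGH", "MODERATE"),
) -> list[tuple[str, str, str]]:
    """Keep protein-altering variants whose gene is expressed at all.

    Expression is only a yes/no filter (TPM >= min_tpm), not used to
    validate or rank the variant. min_tpm = 1.0 is an arbitrary choice.
    """
    kept = []
    for variant_id, csq in csq_by_variant.items():
        impact, symbol = most_severe(csq)
        if impact in keep_impacts and gene_tpm.get(symbol, 0.0) >= min_tpm:
            kept.append((variant_id, symbol, impact))
    # HIGH before MODERATE; ties broken by variant ID for a stable order.
    kept.sort(key=lambda r: (IMPACT_RANK[r[2]], r[0]))
    return kept
```

Taking the most severe annotation per variant is one simple choice; a variant can have different consequences on different transcripts, so you may prefer to restrict to canonical transcripts instead.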

Reply by ATpoint, 1 day ago

Answer 1 · 2025-09-07

The whole question seems confusing, as if there is a substantial knowledge gap.

Calling variants in tumor-normal pairs is relatively straightforward.

What is the goal of the analysis? In all likelihood you should just look at the variants that are unique to the tumor and called by all of your variant callers. There will be a bunch of other noise; getting lost in the weeds (comparing two variant callers vs. one, etc.) is probably a waste of your time. I'm hard pressed to think that running multiple variant callers is even necessary unless you are using a newer technology (e.g., long reads) where gold-standard methods aren't yet settled.

In theory, all tissues in the body should have the same DNA, so yes, if the variant is in both tissues, that is to be expected (maybe I'm not understanding this aspect).

Regarding trust in the RNA-seq: yes, if you see a variant in both the DNA-seq and the RNA-seq, you can be more confident in it. I would worry about variants in the RNA-seq that are not in the DNA-seq, as those are likely noise.
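
One way to make "present in both" concrete is to look up RNA-seq read counts at each DNA-called site rather than calling variants from RNA-seq independently; this also avoids the RNA-only calls mentioned above as likely noise. A minimal Python sketch, assuming the ref/alt read counts come from a pileup of the RNA-seq alignments; `SiteCounts`, `rna_support`, and both thresholds are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """RNA-seq read counts at a site called from DNA (hypothetical input,
    e.g. tallied from a pileup of the RNA-seq alignments)."""

    ref_reads: int
    alt_reads: int


def rna_support(counts: SiteCounts, min_depth: int = 10, min_alt: int = 3) -> str:
    """Classify RNA-seq evidence for a DNA-called variant.

    min_depth and min_alt are illustrative thresholds, not established cutoffs.
    """
    depth = counts.ref_reads + counts.alt_reads
    if depth < min_depth:
        # Too little coverage (e.g. the gene is barely expressed): absence of
        # the alt allele in RNA says little about the DNA call.
        return "low_coverage"
    if counts.alt_reads >= min_alt:
        return "supported"
    return "not_detected"
```

Separating `low_coverage` from `not_detected` matters because an unexpressed gene, or allele-specific expression of the reference allele, can hide a real DNA variant in RNA-seq.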