Question: Strategies to call variants from a cancer sample
2.6 years ago by MMa (280):

Hi all,

I am looking to identify DNA-level variants from matched tumor-normal WES data. Specifically, I just want to know the variants in the tumor sample relative to the reference genome, not relative to the normal sample.

I have noticed two possible approaches here:

  1. Simply use a germline-variant caller to call variants from the tumor sample, or
  2. Call, separately, germline variations between reference and normal, and somatic variations between tumor and normal. The two callsets are then combined.
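The combining step in approach 2 can be sketched as a set union. This is only an illustration, assuming each callset has already been reduced to (chrom, pos, ref, alt) tuples; the function name `combine_callsets` and the example records are hypothetical and not tied to any particular caller.

```python
from typing import Iterable, Set, Tuple

# Assumed representation of one call: (chrom, 1-based position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def combine_callsets(germline: Iterable[Variant],
                     somatic: Iterable[Variant]) -> Set[Variant]:
    """Return the union of two callsets, collapsing identical records."""
    return set(germline) | set(somatic)


# Hypothetical example records, for illustration only.
germline_calls = [("chr1", 100, "A", "G"), ("chr2", 500, "C", "T")]
somatic_calls = [("chr2", 500, "C", "T"), ("chr3", 42, "G", "A")]

combined = combine_callsets(germline_calls, somatic_calls)
print(sorted(combined))
```

Keying on the full (chrom, pos, ref, alt) tuple means two different alternate alleles at the same position are kept as separate calls.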

I'm well aware of the ploidy issues surrounding tumor samples, which is why somatic callers use separate algorithms. However, which is the better approach for my purpose?

Thanks in advance!

snp next-gen • 1.1k views
modified 2.6 years ago by d-cameron (2.2k) • written 2.6 years ago by MMa (280)

which is the better approach for my purpose?

What is your goal? Are you trying to identify germline variants?

written 2.6 years ago by igor (11k)

Hi @igor, the intention is to identify all variants regardless of source.

written 2.6 years ago by MMa (280)

I would say somatic and germline variant calling are completely different analyses, so you can't really combine them.

For germline variants, you can call them in the normal (N) and use the tumor (T) as the validation sample.

TCGA PDAC is a nice paper where they discuss a lot of somatic and germline variants side by side:

written 2.6 years ago by igor (11k)

I understand this. The purpose of the variant calling in question, however, is not to study the variants' biological significance. I plan to do some RNA-editing research, so I need to identify DNA-level variants to act as blacklist positions.
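The blacklist idea can be sketched roughly as follows. This is a minimal illustration, assuming the DNA-level calls are available as plain tab-separated VCF records and that RNA-editing candidates are (chrom, pos) pairs; the function names and example records are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Site = Tuple[str, int]  # (chrom, 1-based position)


def blacklist_from_vcf_lines(lines: Iterable[str]) -> Set[Site]:
    """Collect every reference position covered by a DNA-level variant record."""
    sites: Set[Site] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        # Mask the whole REF footprint so deletions blacklist all bases they span.
        for offset in range(len(ref)):
            sites.add((chrom, pos + offset))
    return sites


def drop_blacklisted(candidates: Iterable[Site], blacklist: Set[Site]) -> List[Site]:
    """Keep only candidate RNA-editing sites that do not overlap a DNA variant."""
    return [site for site in candidates if site not in blacklist]


# Hypothetical example records, for illustration only.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t200\t.\tATG\tA\t.\tPASS\t.",
]
blacklist = blacklist_from_vcf_lines(vcf_lines)
candidates = [("chr1", 100), ("chr1", 150), ("chr1", 201)]
kept = drop_blacklisted(candidates, blacklist)
print(kept)
```

Because the blacklist only needs positions, not genotypes or variant origin, it does not matter here whether a call is germline or somatic.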

written 2.6 years ago by MMa (280)

2.6 years ago by d-cameron (2.2k):

I have noticed two possible approaches here:

I recommend approach 3: use a somatic caller to identify all variants from the tumour/normal pair then remove calls that appear only in the normal (e.g. somatic LOH, somatic reversion to germline).

All somatic callers I have used identify both germline and somatic mutations. Filtering this call set to only include variants with support in the tumour will give you all variants in your tumour. Joint tumour/normal variant calling performs better than tumour-reference calling alone because, since the samples are related, you effectively have greater coverage at germline variant sites. If a germline variant by chance happens to have borderline low coverage in the tumour, and is missed by a tumour-reference caller, it can still be called by a somatic caller, because the support from the normal indicates the presence of a germline variant at that position.
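The filtering step could look something like the sketch below. It assumes the caller writes a per-sample `AD` (allelic depth) FORMAT field with the reference count first, and that the tumour is the second sample column; both are assumptions to check against your caller's actual output. The function names, the `min_alt_reads` threshold, and the example records are hypothetical.

```python
from typing import Iterable, List


def tumour_alt_reads(record: str, tumour_col: int) -> int:
    """Sum the alt-allele read counts from the tumour sample's AD field."""
    fields = record.rstrip("\n").split("\t")
    keys = fields[8].split(":")             # FORMAT column, e.g. GT:AD
    values = fields[tumour_col].split(":")  # tumour sample column
    sample = dict(zip(keys, values))
    ad = sample.get("AD", ".")
    counts = [int(x) for x in ad.split(",") if x not in (".", "")]
    return sum(counts[1:])                  # first entry is the REF count


def keep_tumour_supported(records: Iterable[str], tumour_col: int,
                          min_alt_reads: int = 2) -> List[str]:
    """Drop calls seen only in the normal (e.g. LOH, reversion to reference)."""
    return [r for r in records
            if not r.startswith("#")
            and tumour_alt_reads(r, tumour_col) >= min_alt_reads]


# Hypothetical records; columns are CHROM..FORMAT, then NORMAL, then TUMOUR.
records = [
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:AD\t0/1:10,12\t0/1:9,11",  # germline
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:AD\t0/0:20,0\t0/1:15,6",   # somatic
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT:AD\t0/1:11,9\t0/0:18,0",   # normal only
]
kept = keep_tumour_supported(records, tumour_col=10)
print([r.split("\t")[1] for r in kept])
```

Filtering on tumour read support rather than on the caller's somatic/germline label keeps both classes of variant, as long as the caller emits both in the first place.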

written 2.6 years ago by d-cameron (2.2k)

Well, the somatic callers I have used (MuTect, MuTect2, VarScan, MuSE) do not call somatic reversions, although I wish they did.

That aside, using only default parameters and nothing else on a particular pair I'm currently working on, MuTect gives fewer than 2,000 variants and MuSE gives fewer than 500. I'd be surprised if there were only ~3,000 variants (I will run VarScan today) between a cancer WES sample and the reference.

written 2.6 years ago by MMa (280)

You might like to try the rtg somatic command from RTG Core. It uses a Bayesian model to jointly call the normal and tumor, including options to output both germline variants as well as gain-of-reference (potential somatic reversion) calls. As well as the primary score fields produced by the Bayesian model, it includes a machine-learning derived quality score (AVR) that incorporates factors not considered by the Bayesian model, which is very useful for discriminating false positive calls.

written 2.6 years ago by Len Trigg (1.4k)