Question

Strategies to call variants from a cancer sample

6.4 years ago

John Ma ▴ 310

Hi all,

I am looking to identify DNA-level variations from matched tumor-normal WES data. Specifically, I just want to know the variations in the tumor sample relative to the reference genome, not relative to the normal sample.

I have noticed two possible approaches here:

1. Simply use a germline variant caller to call variants from the tumor sample, or
2. Separately call germline variations between reference and normal, and somatic variations between tumor and normal, then combine the two callsets.
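Approach 2 amounts to taking the union of two callsets. Below is a minimal Python sketch, assuming both callsets are plain VCF text and that a variant is identified by its (CHROM, POS, REF, ALT) key. The function names are hypothetical, and the sketch does not normalize or left-align indels, so the same indel written differently by two callers would not be merged.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position as written in the VCF
    ref: str
    alt: str


def parse_vcf(lines: Iterable[str], pass_only: bool = True) -> Set[Variant]:
    """Collect one key per ALT allele from VCF text, skipping header lines."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        alts, filt = fields[4], fields[6]
        # Keep only records that passed filters ("." means no filters were applied)
        if pass_only and filt not in ("PASS", "."):
            continue
        for alt in alts.split(","):  # split multi-allelic records
            variants.add(Variant(chrom, pos, ref, alt))
    return variants


def merge_callsets(germline_vcf: Iterable[str], somatic_vcf: Iterable[str]) -> Set[Variant]:
    """Union of germline (normal vs reference) and somatic (tumor vs normal) calls."""
    return parse_vcf(germline_vcf) | parse_vcf(somatic_vcf)


germline = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT,G\t40\tPASS\t.",
]
somatic = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t30\tPASS\t.",
    "chr2\t500\t.\tG\tA\t20\tPASS\t.",
    "chr3\t700\t.\tT\tC\t5\tLowQual\t.",
]
merged = merge_callsets(germline, somatic)
print(len(merged))  # 4: the shared chr1:100 call is counted once, LowQual is dropped
```

Keying on the full allele rather than just the position keeps both alleles of a multi-allelic site; for a pure position blacklist you could collapse the keys to (chrom, pos).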

I'm well aware of the ploidy issues surrounding tumor samples, which is why somatic callers are always separate algorithms. However, which is the better approach for my purpose?

Thanks in advance!

Tags: SNP, next-gen

Last updated 6.4 years ago by d-cameron ★ 2.9k

which is the better approach for my purpose?

What is your goal? Are you trying to identify germline variants?

Reply by igor 13k, 6.4 years ago

Hi @igor, the intention is to identify all variants regardless of source.

Reply by John Ma ▴ 310, 6.4 years ago

I would say somatic and germline variant calling are completely different analyses, so you can't really combine them.

For germline variants, you can call them in the normal (N) and use the tumor (T) as the validation sample.

The TCGA PDAC paper is a nice one that discusses many somatic and germline variants side by side: https://www.ncbi.nlm.nih.gov/pubmed/28810144

Reply by igor 13k, 6.4 years ago

I understand this. The purpose of the variant calling in question, however, is not to study the variants' biological significance. I plan to do some RNA-editing research, so I need to identify DNA-level variations to act as blacklist positions.
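Using DNA-level calls as an RNA-editing blacklist can be sketched as a simple position lookup. This is a minimal Python sketch under stated assumptions: DNA variants arrive as (chrom, pos, ref, alt) tuples with 1-based positions, every reference base covered by a variant's REF allele is blacklisted (so deletions mask their whole span), and candidate editing sites are hypothetical (chrom, pos, dna_base, rna_base) tuples. The function names are illustrative and do not come from any particular tool.

```python
from typing import Iterable, List, Set, Tuple

DnaVariant = Tuple[str, int, str, str]   # (chrom, 1-based pos, ref, alt)
EditingSite = Tuple[str, int, str, str]  # (chrom, 1-based pos, dna_base, rna_base)


def build_blacklist(dna_variants: Iterable[DnaVariant]) -> Set[Tuple[str, int]]:
    """Blacklist every reference position spanned by a variant's REF allele.

    For VCF-style indels this includes the leading padding base, which is conservative.
    """
    blacklist: Set[Tuple[str, int]] = set()
    for chrom, pos, ref, _alt in dna_variants:
        for offset in range(len(ref)):
            blacklist.add((chrom, pos + offset))
    return blacklist


def filter_editing_candidates(
    candidates: Iterable[EditingSite], blacklist: Set[Tuple[str, int]]
) -> List[EditingSite]:
    """Drop RNA-DNA differences that coincide with a known DNA-level variant."""
    return [site for site in candidates if (site[0], site[1]) not in blacklist]


dna_calls = [("chr1", 100, "A", "G"), ("chr1", 200, "ACT", "A")]
candidates = [
    ("chr1", 100, "A", "G"),  # explained by a DNA-level SNV, so not editing
    ("chr1", 201, "C", "T"),  # inside the deleted span
    ("chr1", 300, "A", "G"),  # no DNA variant here: kept as a candidate A-to-I site
]
kept = filter_editing_candidates(candidates, build_blacklist(dna_calls))
print(kept)  # [('chr1', 300, 'A', 'G')]
```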

Reply by John Ma ▴ 310, 6.4 years ago

Answer · 2017-11-29

6.4 years ago

d-cameron ★ 2.9k

I have noticed two possible approaches here:

I recommend approach 3: use a somatic caller to identify all variants from the tumour/normal pair then remove calls that appear only in the normal (e.g. somatic LOH, somatic reversion to germline).
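The filtering step of approach 3 can be sketched as keeping only records whose tumour sample has reads supporting an ALT allele. A minimal Python sketch, assuming the joint tumour/normal VCF carries a per-sample FORMAT `AD` field (allelic depths, REF first), which many but not all callers emit. The sample name `TUMOR` and the `min_alt_reads` cut-off are hypothetical choices, not defaults of any caller.

```python
from typing import Iterable, List


def tumour_supported_records(
    vcf_lines: Iterable[str], tumour_sample: str, min_alt_reads: int = 1
) -> List[str]:
    """Keep VCF data lines whose tumour sample has >= min_alt_reads ALT-supporting reads."""
    sample_idx = None
    kept: List[str] = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Column of the tumour sample (FORMAT is column 8, samples follow it)
            sample_idx = fields.index(tumour_sample)
            continue
        if sample_idx is None:
            raise ValueError("data line found before #CHROM header")
        sample = dict(zip(fields[8].split(":"), fields[sample_idx].split(":")))
        ad = sample.get("AD", ".")
        if ad == ".":
            continue  # no allelic depths: tumour support cannot be confirmed
        depths = [int(x) if x != "." else 0 for x in ad.split(",")]
        if sum(depths[1:]) >= min_alt_reads:  # depths[0] is the REF count
            kept.append(line)
    return kept


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:AD\t0/1:12,10\t0/1:8,9",   # germline: kept
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:AD\t0/0:30,0\t0/1:20,6",   # somatic: kept
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT:AD\t0/1:15,14\t0/0:25,0",  # normal-only (e.g. LOH): dropped
]
for record in tumour_supported_records(vcf, "TUMOR"):
    print(record.split("\t")[1])  # prints 100, then 200
```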

All somatic callers I have used identify both germ-line and somatic mutations. Filtering this call set to only include variants with support in the tumour will give you all variants in your tumour. The reason why doing joint tumour/normal variant calling performs better than just tumour-reference calling is that, since the samples are related, you have greater coverage of germline variants. If a germ-line variant by chance happens to have borderline low coverage in the tumour, and is missed by a tumour-reference caller, it can still be called by a somatic caller due to the support from the normal indicating the presence of a germ-line variant at that position.
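The coverage argument can be illustrated with a toy likelihood-ratio calculation; this is not the model used by any particular caller. It assumes, hypothetically, a fixed per-read error rate `e`, that a heterozygous germline site yields ALT reads with probability 0.5 in both samples (ignoring tumour purity and copy-number changes), and an arbitrary calling threshold. Each sample contributes a log-likelihood ratio of heterozygous versus homozygous-reference, and because the germline genotype is shared, a joint caller can add the two.

```python
import math


def het_vs_homref_llr(alt_reads: int, depth: int, error_rate: float = 0.01) -> float:
    """Natural-log likelihood ratio of a heterozygous vs homozygous-reference genotype.

    Toy model: under hom-ref an ALT read arises only from error (probability e);
    under het each read is ALT with probability 0.5. Binomial coefficients cancel.
    """
    ref_reads = depth - alt_reads
    return (alt_reads * math.log(0.5 / error_rate)
            + ref_reads * math.log(0.5 / (1.0 - error_rate)))


THRESHOLD = 5.0  # hypothetical calling threshold on the log-likelihood ratio

tumour_llr = het_vs_homref_llr(alt_reads=1, depth=4)    # borderline tumour coverage
normal_llr = het_vs_homref_llr(alt_reads=14, depth=30)  # well-covered normal

print("tumour-only call:", tumour_llr >= THRESHOLD)              # False
print("joint call:", tumour_llr + normal_llr >= THRESHOLD)       # True
```

Real somatic callers also model purity, allele-specific copy number and sequencing artefacts, so this only shows why shared evidence from a related sample can rescue a germline call that the tumour alone supports too weakly.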


Well, the somatic callers I have used (MuTect, MuTect2, VarScan, MuSE) do not call somatic reversions, which I wish they could.

That aside, using only default parameters and nothing else on a particular pair I'm currently working on, MuTect gives fewer than 2,000 variants and MuSE gives fewer than 500. I'd be surprised if there were only ~3,000 variants (I will run VarScan today) between a cancer WES sample and the reference.

Reply by John Ma ▴ 310, 6.4 years ago

You might like to try the rtg somatic command from RTG Core. It uses a Bayesian model to jointly call the normal and tumor, including options to output both germline variants as well as gain-of-reference (potential somatic reversion) calls. As well as the primary score fields produced by the Bayesian model, it includes a machine-learning derived quality score (AVR) that incorporates factors not considered by the Bayesian model, which is very useful for discriminating false positive calls.

Reply by Len Trigg ★ 1.6k, 6.4 years ago