Question

Identify somatic mutations in cancer exome without a matched pair.


9.2 years ago

rchowdhury ▴ 10

Hello,

Does anyone know of an acceptable workflow/pipeline for processing cancer exome data without a matched normal? I understand that without the normal, we won't be able to distinguish between germline and somatic mutations with 100% certainty. Even so, I would like to process our data. I have already prepped my data using the steps outlined in the GATK Best Practices (align, sort, remove duplicates, index, indel realignment, base recalibration). What is the next step (software, analysis, etc.)?

Thanks for your help!

next-gen-sequencing • 5.6k views

updated 2.0 years ago by Ram 43k • written 9.2 years ago by rchowdhury ▴ 10

Ram · Answer 1 · 2015-09-01

I have been working on something similar (using genomes).

Here is what I've learned so far:

This is unfortunately more difficult than I originally hoped. For example, the common assumption is that a variant listed in COSMIC will also be somatic when it shows up in other samples. In practice, I've flagged many variants as somatic for that reason, only to find later that they are germline. One metric that I've found relatively useful when comparing against COSMIC is to trust only the somatic calls that are reported in a minimum number of studies (lots are found in only 1 study). Still, using a database of somatic calls to pick the somatic variants out of a mostly germline call set has not been very successful for me.
Filtering out all the variants listed in dbSNP 144 (the latest on hg19) is very helpful. This release now includes data from 1000 Genomes as well as ExAC, both rich germline data sets. In my experience you need to be careful about filtering out every variant seen in ExAC; it's better to keep some that are at really low frequencies.
Be careful with the dbSNP filtering. There are many real somatic variants in there. For example, it seems all somatic variants found in COLO-829 have been flagged as somatic in dbSNP (using the SAO field). Unfortunately, somatic variants found outside of published cell lines are not as likely to be marked as somatic in dbSNP. In fact, I did my initial testing using COLO-829, only to learn later that although dbSNP is very precise with its somatic annotations of COLO-829 variants, it is very hit or miss (mostly miss) for somatic variants identified in real cancer samples.
Be careful with over-filtering. I have found that the germline filtering works relatively well, but there are many cases where a known hotspot mutation (in PIK3CA or BRCA2, for example) is listed in dbSNP and not marked as somatic.
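To make the points above concrete, here is a rough Python sketch of how those rules could be combined into one germline/somatic decision per variant. It is only an illustration, not the code I actually run: the `Variant` fields, the `classify` function, the rule order, and both thresholds (`MIN_COSMIC_STUDIES`, `MAX_EXAC_AF`) are hypothetical and would need tuning against a truth set. The only thing taken from dbSNP itself is the meaning of the SAO codes (0 unspecified, 1 germline, 2 somatic, 3 both).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
MIN_COSMIC_STUDIES = 3   # trust COSMIC only when several studies report the variant
MAX_EXAC_AF = 1e-4       # above this ExAC allele frequency, assume germline


@dataclass
class Variant:
    """Annotations for one variant; field names are made up for this sketch."""
    in_dbsnp: bool = False
    dbsnp_sao: int = 0                # dbSNP SAO: 0 unspecified, 1 germline, 2 somatic, 3 both
    exac_af: Optional[float] = None   # None means the variant is absent from ExAC
    cosmic_studies: int = 0           # number of COSMIC studies reporting the variant
    known_hotspot: bool = False       # on a curated hotspot list (e.g. PIK3CA)


def classify(v: Variant) -> str:
    """Return 'somatic' or 'germline' using simple ordered rules."""
    # Rescue first, so known hotspots and SAO-flagged records are not over-filtered.
    if v.known_hotspot or v.dbsnp_sao in (2, 3):
        return "somatic"
    # Anything common in ExAC is treated as germline.
    if v.exac_af is not None and v.exac_af > MAX_EXAC_AF:
        return "germline"
    # A single COSMIC study is weak evidence; require recurrence.
    if v.cosmic_studies >= MIN_COSMIC_STUDIES:
        return "somatic"
    # Assumption: a variant that is only rare in ExAC is not filtered
    # on dbSNP membership alone.
    rare_in_exac = v.exac_af is not None and v.exac_af <= MAX_EXAC_AF
    if v.in_dbsnp and not rare_in_exac:
        return "germline"
    # Whatever survives the germline filters stays a somatic candidate.
    return "somatic"
```

For example, `classify(Variant(in_dbsnp=True))` comes back as germline, while the same record with `dbsnp_sao=2` or `known_hotspot=True` is kept as somatic. Putting the rescue rules ahead of the germline filters is a design choice aimed at the over-filtering problem described above.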

Throwing everything together, I'm able to get about 80% sensitivity and 20% specificity in classifying a set of (coding) variants as germline or somatic.
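For anyone who wants to measure the same two numbers on their own calls, here is a minimal Python sketch. It assumes you have a truth label and a predicted label for each variant, and that "somatic" is treated as the positive class; the function name `sensitivity_specificity` and the toy labels in the usage note are made up for illustration and are not my data.

```python
import math


def sensitivity_specificity(truth, predicted, positive="somatic"):
    """Compute (sensitivity, specificity) from paired truth/predicted labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1   # true somatic called somatic
            else:
                fn += 1   # true somatic missed
        else:
            if p == positive:
                fp += 1   # germline wrongly called somatic
            else:
                tn += 1   # germline correctly rejected
    # Return NaN when a class is absent, rather than dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity
```

With a toy set of 5 true somatic variants (4 called somatic) and 5 true germline variants (1 called germline), this gives a sensitivity of 4/5 and a specificity of 1/5.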

Ram · Answer 2 · 2015-02-06


9.2 years ago

rchowdhury ▴ 10

FYI, while I was searching for answers I came across this post.

There is some useful info there. However, it is from almost 2 years ago. Is there a more up-to-date, streamlined process?

Thanks.
