Question: Identify somatic mutations in cancer exome without a matched pair.
4.4 years ago by
United States
rchowdhury10 wrote:


Does anyone know of an acceptable workflow/pipeline for processing cancer exome data without a matched normal? I understand that without the normal, we won't be able to distinguish between germline and somatic mutations with 100% certainty. Even so, I would like to process our data. I have already prepped my data using the steps outlined in the GATK Best Practices (align, sort, remove duplicates, index, indel realignment, base recalibration). What is the next step (software, analysis, etc.)?

Thanks for your help!

sequencing next-gen • 4.1k views
3.8 years ago by
Richard550 wrote:

I have been working on something similar (using genomes).

Here is what I've learned so far:

-This is unfortunately more difficult than I originally hoped. For example, the common assumption is that a variant listed in COSMIC will also be somatic when it is seen in other samples, but in practice I've flagged many variants as somatic for that reason, only to find later that they are germline. One metric I've found relatively useful when comparing against COSMIC is to trust only the somatic calls that are reported in a minimum number of studies (lots are reported in only 1 study). Even so, using a database of somatic calls to pick the somatic variants out of a set that also contains germline variants has not been very successful for me.

-Filtering out all the variants listed in dbSNP 144 (the latest on hg19) is very helpful. This release now includes data from 1000 Genomes as well as ExAC, both rich germline data sets. In my experience you need to be careful about filtering out every variant seen in ExAC; it's better not to filter some of those at really low frequencies.

-Be careful with the dbSNP filtering. There are many real somatic variants in there. For example, it seems all somatic variants found in COLO-829 have been flagged as somatic in dbSNP (using the SAO field). Unfortunately, somatic variants found outside of published cell lines are not as likely to be marked as somatic in dbSNP. In fact, I did my initial testing using COLO-829, only to learn later that although dbSNP is very precise with its somatic annotations of COLO-829 variants, it is very hit or miss (mostly miss) for somatic variants identified in real cancer samples.

-Be careful with over-filtering. I have found that the germline filtering works relatively well, but there are many cases where a known hotspot mutation (in PIK3CA or BRCA2, for example) is listed in dbSNP and not marked as somatic.
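
To make the points above concrete, here is a rough Python sketch of how those rules could be chained. It is only an illustration, not my actual pipeline: the `Variant` fields, the thresholds (`MIN_COSMIC_STUDIES`, `MAX_RARE_EXAC_AF`), and the single whitelist entry are all assumptions made for the example. The only thing taken from dbSNP itself is the SAO coding (0 unspecified, 1 germline, 2 somatic, 3 both).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoffs; tune them on your own data.
MIN_COSMIC_STUDIES = 3      # ignore COSMIC entries seen in fewer studies
MAX_RARE_EXAC_AF = 1e-4     # below this, ExAC presence alone is not filtered

# Hypothetical hotspot whitelist (one illustrative entry).
HOTSPOT_WHITELIST = {("PIK3CA", "p.H1047R")}


@dataclass
class Variant:
    gene: str
    protein_change: str
    in_dbsnp: bool = False
    dbsnp_sao: int = 0                 # dbSNP SAO: 0, 1 (germline), 2 (somatic), 3 (both)
    exac_af: Optional[float] = None    # None means not seen in ExAC
    cosmic_studies: int = 0            # number of COSMIC studies reporting it


def classify(v: Variant) -> str:
    """Return 'somatic' or 'germline' using the heuristics described above."""
    # 1. Known hotspots are rescued first so over-filtering cannot drop them.
    if (v.gene, v.protein_change) in HOTSPOT_WHITELIST:
        return "somatic"
    # 2. Trust an explicit somatic flag in dbSNP when it exists.
    if v.in_dbsnp and v.dbsnp_sao == 2:
        return "somatic"
    # 3. Anything at appreciable population frequency is treated as germline.
    if v.exac_af is not None and v.exac_af >= MAX_RARE_EXAC_AF:
        return "germline"
    # 4. COSMIC only counts when several studies report the variant.
    if v.cosmic_studies >= MIN_COSMIC_STUDIES:
        return "somatic"
    # 5. In dbSNP from some source other than a very rare ExAC entry.
    if v.in_dbsnp and v.exac_af is None:
        return "germline"
    # 6. Absent from germline resources, or only very rare in ExAC: keep it.
    return "somatic"
```

The order of the checks matters: the whitelist and the SAO flag come before the germline filters, and the population frequency check comes before COSMIC, because a COSMIC entry on its own has not been reliable evidence in my hands.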

Throwing everything together, I'm able to get about 80% sensitivity and 20% specificity in classifying a set of (coding) variants as germline or somatic.
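
For anyone who wants to score their own classifier the same way, here is a minimal sketch of the calculation. It assumes you have a truth label for each variant (for example from a sample that does have a matched normal) and that "somatic" is treated as the positive class; the function name and the label strings are just for illustration.

```python
def sensitivity_specificity(truth, predicted, positive="somatic"):
    """Compute (sensitivity, specificity) from paired truth/predicted labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1   # true somatic, called somatic
            else:
                fn += 1   # true somatic, called germline
        else:
            if p == positive:
                fp += 1   # true germline, called somatic
            else:
                tn += 1   # true germline, called germline
    # Guard against empty classes instead of dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Sensitivity here is the fraction of truly somatic variants that were called somatic, and specificity is the fraction of truly germline variants that were called germline.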


Thanks for these notes, very helpful.

written 3.6 years ago by David Quigley

Richard, if one does not have matched normal samples, how do I find significantly mutated genes? Let's say I have somatic SNVs after filtering; how do I then measure which SNVs/genes are significantly mutated? The tool I used before, genome-music, needs both tumor and normal BAM files to predict significant genes, so it was not helpful for me. Do you have other ideas, or know of tools that predict significantly mutated genes when VCF/MAF files are provided?

written 2.6 years ago by Chirag Nepal
4.4 years ago by
United States
rchowdhury10 wrote:

FYI, while I was searching for answers I came across this post:

Discrimination Between Germline And Somatic Mutations In Tumor Without The Availability Of The Normal Paired Sample

There is some useful info here.  However, it is from almost 2 years ago.  I am wondering if there is a more updated, streamlined process?

