Identify somatic mutations in cancer exomes without a matched pair.
9.2 years ago
rchowdhury ▴ 10

Hello,

Does anyone know of an accepted workflow/pipeline for processing cancer exome data without a matched normal? I understand that without the normal we won't be able to distinguish between germline and somatic mutations with 100% certainty. Even so, I would like to process our data. I have already prepped my data following the steps outlined in the GATK Best Practices (align, sort, remove duplicates, index, indel realignment, base recalibration). What is the next step (software, analysis, etc.)?

Thanks for your help!

next-gen-sequencing • 5.6k views
8.6 years ago
Richard ▴ 590

I have been working on something similar (using genomes).

Here is what I've learned so far:

  • This is unfortunately more difficult than I originally hoped. For example, the common assumption is that variants found in COSMIC will also be somatic in other samples, but in practice I've flagged many variants as somatic for that reason, only to find later that they were germline. One metric I've found relatively useful when comparing against COSMIC is to trust only somatic calls that are reported in a minimum number of studies (many are found in only one study). Overall, however, using a database of somatic calls to pick the somatic variants out of a germline call set has not been very successful for me.
  • Filtering out all variants listed in dbSNP 144 (the latest release on hg19) is very helpful. This release now includes data from 1000 Genomes as well as ExAC, both rich germline data sets. In my experience, though, you need to be careful about filtering out every variant seen in ExAC; it's better not to filter out those at very low frequencies.
  • Be careful with the dbSNP filtering: there are many real somatic variants in there. For example, it seems that all somatic variants found in COLO-829 have been flagged as somatic in dbSNP (via the SAO field). Unfortunately, somatic variants found outside of published cell lines are much less likely to be marked as somatic in dbSNP. In fact, I did my initial testing using COLO-829, only to learn later that although dbSNP's somatic annotations are very precise for COLO-829 variants, they are very hit or miss (mostly miss) for somatic variants identified in real cancer samples.
  • Be careful not to overfilter. I have found that the germline filtering works relatively well, but there are many cases where a known hotspot mutation (in PIK3CA or BRCA2, for example) is listed in dbSNP and not marked as somatic.
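The rules above can be combined into a single decision function. This is a minimal Python sketch under stated assumptions, not the pipeline used in this answer: the `Variant` record, the thresholds `MIN_COSMIC_STUDIES` and `EXAC_AF_FLOOR`, and the `HOTSPOTS` whitelist are all hypothetical and would need to be filled in from your own annotation (e.g. dbSNP's SAO field) and tuned on validation data. The ordering is the point: hotspots and recurrent COSMIC hits are rescued before the dbSNP/ExAC germline filters get a chance to discard them.

```python
from dataclasses import dataclass
from typing import Optional

# dbSNP VCF "SAO" (variant allele origin) codes: 0 unspecified, 1 germline,
# 2 somatic, 3 both.
SAO_SOMATIC, SAO_BOTH = 2, 3

# Hypothetical thresholds; tune them on your own validation set.
MIN_COSMIC_STUDIES = 2   # trust a COSMIC hit only if reported in >= this many studies
EXAC_AF_FLOOR = 0.001    # ExAC variants rarer than this are not filtered as germline

# Hypothetical hotspot whitelist of (gene, protein change). Known hotspots can
# appear in dbSNP without a somatic SAO flag, so they are rescued explicitly.
HOTSPOTS = {("PIK3CA", "p.H1047R"), ("PIK3CA", "p.E545K")}


@dataclass
class Variant:
    gene: str
    protein_change: str
    in_dbsnp: bool = False
    dbsnp_sao: Optional[int] = None  # SAO value from dbSNP, if annotated
    exac_af: Optional[float] = None  # ExAC allele frequency, if present
    cosmic_studies: int = 0          # number of COSMIC studies reporting the variant


def classify(v: Variant) -> str:
    """Heuristic label 'somatic' or 'germline'; never certain without a normal."""
    if (v.gene, v.protein_change) in HOTSPOTS:
        return "somatic"
    if v.cosmic_studies >= MIN_COSMIC_STUDIES:
        return "somatic"
    if v.dbsnp_sao in (SAO_SOMATIC, SAO_BOTH):
        return "somatic"
    if v.exac_af is not None:
        # Common in ExAC -> germline; very rare ExAC variants are kept.
        return "germline" if v.exac_af >= EXAC_AF_FLOOR else "somatic"
    if v.in_dbsnp:
        # Any other dbSNP entry without a somatic flag is treated as germline.
        return "germline"
    return "somatic"  # novel variant: candidate somatic


calls = [
    Variant("PIK3CA", "p.H1047R", in_dbsnp=True, exac_af=0.01),
    Variant("GENE_X", "p.A100T", in_dbsnp=True, exac_af=0.25),
    Variant("GENE_X", "p.G12V"),
]
for v in calls:
    print(v.gene, v.protein_change, classify(v))
```

`GENE_X` is a placeholder name; the thresholds will almost certainly need adjusting for your data.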

Putting everything together, I get about 80% sensitivity and 20% specificity when classifying a set of (coding) variants as germline or somatic.
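For reference, these two figures can be computed from a labeled truth set as below. This minimal sketch assumes somatic is treated as the positive class (the answer doesn't say which class was positive), and the toy labels are invented purely to show the arithmetic; they are not data from this thread.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[str], predicted: Sequence[str],
                            positive: str = "somatic") -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1   # somatic correctly called somatic
            else:
                fn += 1   # somatic wrongly called germline
        elif p == positive:
            fp += 1       # germline wrongly called somatic
        else:
            tn += 1       # germline correctly called germline
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy labels invented for illustration only.
truth = ["somatic"] * 5 + ["germline"] * 5
predicted = ["somatic"] * 4 + ["germline"] + ["somatic"] * 4 + ["germline"]
sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)  # -> 0.8 0.2
```

With somatic as the positive class, a low specificity means most truly germline variants are still being labeled somatic.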


Thanks for these notes, very helpful.


Richard, if one does not have matched normal samples, how do I find significantly mutated genes? Say I have somatic SNVs after filtering; how do I measure which SNVs/genes are significantly mutated? The tool I used before, genome-music, needs both tumor and normal BAM files to predict significant genes, so it was not helpful for me. Do you know of other ideas or tools that predict significant genes when only VCF/MAF files are provided?
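Without a normal BAM, one rough option is to test per-gene mutation counts from the VCF/MAF against a background rate. The sketch below uses a deliberately simple one-sided binomial test with a single, uniform background mutation rate (`bg_rate`) plus Benjamini-Hochberg correction; it is not MuSiC's method, and the gene names, coding lengths, rate, and counts are hypothetical. Dedicated tools model gene-specific background rates, which this sketch ignores, and any germline variants that survive filtering will inflate the counts.

```python
import math
from typing import Dict, List, Tuple


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Sums the upper tail directly in log space so very small p-values
    are not lost to floating-point cancellation.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        term = math.exp(log_pmf)
        total += term
        # Past the mode the terms only shrink, so stop once they are negligible.
        if i > n * p and term <= total * 1e-16:
            break
    return min(total, 1.0)


def significantly_mutated(counts: Dict[str, int], coding_lengths: Dict[str, int],
                          n_samples: int, bg_rate: float,
                          alpha: float = 0.05) -> List[Tuple[str, int, float, float]]:
    """Return (gene, count, p, q) for genes with BH-adjusted q <= alpha."""
    # Test every gene with a known length, including unmutated ones, so the
    # multiple-testing correction counts all hypotheses.
    results = []
    for gene, length in coding_lengths.items():
        k = counts.get(gene, 0)
        n = length * n_samples  # coding bases at risk across all samples
        results.append((gene, k, binom_sf(k, n, bg_rate)))
    results.sort(key=lambda r: r[2])

    # Benjamini-Hochberg: q_i = min over j >= i of p_j * m / j.
    m = len(results)
    qvals = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        running_min = min(running_min, results[rank - 1][2] * m / rank)
        qvals[rank - 1] = running_min
    return [(g, k, p, q) for (g, k, p), q in zip(results, qvals) if q <= alpha]


# Hypothetical inputs: per-gene SNV counts from a filtered MAF and coding lengths.
lengths = {"GENE_A": 1000, "GENE_B": 1000, "GENE_C": 2000}
snv_counts = {"GENE_A": 8, "GENE_B": 1}
for gene, k, p, q in significantly_mutated(snv_counts, lengths, n_samples=50, bg_rate=1e-6):
    print(f"{gene}\tcount={k}\tp={p:.2e}\tq={q:.2e}")
```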

9.2 years ago
rchowdhury ▴ 10

FYI, while I was searching for answers I came across this post.

There is some useful info here. However, it is from almost 2 years ago. I am wondering if there is a more updated, streamlined process?

Thanks.
