Question

Somatic Mutation Identification without Paired Normal Sample

4

7.4 years ago

haiying.kong ▴ 360

We have 95 pairs of samples (tumor vs. normal) that were whole-exome sequenced. We found that 5 of the tumor samples do not match their paired normal samples, which suggests that sample swaps occurred.

Can we still use these 5 tumor samples, assuming that the tumor samples are correctly labeled but their matching normal samples are wrong? Can we build a panel of normals from the 90 normal samples and use it to identify somatic mutations in the 5 tumor samples? If this is possible, how should I proceed?

WES • 3.7k views

ADD COMMENT • link updated 7.4 years ago by Naga ▴ 450 • written 7.4 years ago by haiying.kong ▴ 360

0

So you don't have paired samples, and there may have been sample mix-ups that you can't resolve? Proceed with analysing this data with caution, regardless of the unmatched samples. You just don't know what you're dealing with, and starting an analysis while unsure of your input is a dangerous place to be.

ADD REPLY • link 7.4 years ago by User 59 13k

score 0 · Answer 1 · 2016-12-02

0

7.4 years ago

Naga ▴ 450

You can still use them. As you mentioned, create a local control from the 90 samples, and also use ExAC to remove the common variants (those with MAF > 0.01%). The remaining will be the somatic and rare germline variants. In an exome, you will have at least 100-200 functional germline variants with this MAF cut-off, and the rest will be the somatic variants.
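
The "local control" idea can be sketched roughly as follows. This is only a minimal illustration, not a replacement for a dedicated caller: it assumes each sample's calls have already been reduced to (chrom, pos, ref, alt) tuples, and the function names and the `min_samples=2` threshold are hypothetical choices made for the example.

```python
from collections import Counter


def build_panel(normal_call_sets, min_samples=2):
    """Return variants seen in at least `min_samples` normal samples.

    `min_samples` is a hypothetical threshold chosen for illustration.
    """
    counts = Counter()
    for calls in normal_call_sets:
        # Count each variant at most once per normal sample.
        counts.update(set(calls))
    return {variant for variant, n in counts.items() if n >= min_samples}


def subtract_panel(tumor_calls, panel):
    """Keep tumor calls that are absent from the panel of normals."""
    return [variant for variant in tumor_calls if variant not in panel]


# Toy data: variants are (chrom, pos, ref, alt) tuples.
normals = [
    [("1", 100, "A", "G"), ("2", 200, "C", "T")],
    [("1", 100, "A", "G")],
    [("3", 300, "G", "A")],
]
tumor = [("1", 100, "A", "G"), ("2", 200, "C", "T"), ("7", 700, "T", "C")]

panel = build_panel(normals)
candidates = subtract_panel(tumor, panel)
```

Variants private to a single normal are not in the panel here, so they survive the subtraction; that is one reason the population-frequency filter is still needed afterwards.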

ADD COMMENT • link 7.4 years ago by Naga ▴ 450

0

Could you please tell me which software I can use for this work? Can I still use MuTect2?

ADD REPLY • link 7.4 years ago by haiying.kong ▴ 360

0

You should create a small pipeline to perform this. To keep it simple:

1. Download the non-TCGA ExAC data set from the FTP site.
2. Use tabix to get the AF of each variant in your no-control samples from the ExAC VCF file.
3. Remove the common variants.
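
The three steps above can be sketched in plain Python. This is a rough illustration under several assumptions: the tabix lookup is replaced by an in-memory dictionary built from VCF text lines, the population allele frequency is assumed to sit in an INFO key named `AF` (check the header of the file you download), and the function names and toy records are made up for the example. The 0.0001 cut-off corresponds to the 0.01% mentioned earlier in the thread.

```python
def parse_population_af(vcf_lines, af_key="AF"):
    """Map (chrom, pos, ref, alt) -> allele frequency from population VCF lines.

    `af_key` is an assumption about which INFO key holds the frequency.
    """
    af = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        alts = fields[4].split(",")
        info = dict(
            item.split("=", 1) for item in fields[7].split(";") if "=" in item
        )
        if af_key not in info:
            continue
        # Multi-allelic sites carry one comma-separated value per ALT allele.
        for alt, value in zip(alts, info[af_key].split(",")):
            try:
                af[(chrom, pos, ref, alt)] = float(value)
            except ValueError:
                continue  # e.g. "." for a missing value
    return af


def drop_common(tumor_variants, population_af, max_af=0.0001):
    """Keep variants that are absent from the population file or rare in it."""
    return [v for v in tumor_variants if population_af.get(v, 0.0) <= max_af]


# Toy population records (not real ExAC data).
POPULATION_LINES = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG,T\t.\tPASS\tAC=500,1;AN=100000;AF=0.005,0.00001",
    "2\t200\t.\tC\tT\t.\tPASS\tAC=3;AN=100000;AF=0.00003",
]
tumor = [("1", 100, "A", "G"), ("1", 100, "A", "T"),
         ("2", 200, "C", "T"), ("3", 300, "G", "A")]

population_af = parse_population_af(POPULATION_LINES)
kept = drop_common(tumor, population_af)
```

Exact matching on (chrom, pos, ref, alt) only works if both call sets represent variants the same way, so in practice indels and multi-allelic sites would probably need to be normalized first.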

ADD REPLY • link 7.4 years ago by Naga ▴ 450

0

Sorry, I might be asking stupid questions.

Should I first run the GATK tools to identify germline mutations in the 5 tumor samples? The mutations identified by the GATK germline tools would in fact include both germline and somatic mutations, right?

Then I can run steps 1, 2, 3 from your last post, right?

Then, how can I use the germline mutations from the 90 normal samples in our study?

ADD REPLY • link 7.4 years ago by haiying.kong ▴ 360

0

Hi @Naga, can you please share which column or field in the ExAC VCF file you used to filter common variants with MAF > 0.01%? Thanks!

ADD REPLY • link 7.3 years ago by Chirag Nepal ★ 2.4k

0

@Naga, quoting what you have written: "The remaining will be the somatic and rare germline variants". Is there a way to differentiate between those two? I am interested in identifying those rare germline mutations.

ADD REPLY • link 6.2 years ago by lait ▴ 180