Question: Somatic Mutation Identification without Paired Normal Sample
4.0 years ago by
haiying.kong320 wrote:

We have 95 tumor/normal sample pairs that were whole-exome sequenced. We found that 5 of the tumor samples do not match their normal samples, so some sample swapping must have happened.

Can we still use the 5 tumor samples, assuming that they are correctly labeled and only their matching normal samples are wrong? Can we build a panel of normals from the 90 remaining normal samples and use it to identify somatic mutations in the 5 tumor samples? If this is possible, how should I proceed?

wes • 2.4k views
ADD COMMENTlink modified 4.0 years ago by Naga450 • written 4.0 years ago by haiying.kong320

You don't have paired samples, and there may have been sample mix-ups that you can't figure out? Proceed with analysing this data with caution, regardless of the unmatched samples. You just don't know what you're dealing with, and being unsure of your input is a dangerous place to start an analysis.

ADD REPLYlink written 4.0 years ago by Daniel Swan13k
4.0 years ago by
Naga450 wrote:

You can still use them. As you mentioned, create a local control from the 90 normal samples, and also use ExAC to remove the common variants (with MAF > 0.01%). The remaining variants will be the somatic and rare germline variants. In an exome, you will have at least 100-200 functional germline variants with this MAF cut-off, and the rest will be somatic variants.
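
As a rough illustration of the local-control idea, here is a minimal Python sketch. It assumes variants have already been called and reduced to `(chrom, pos, ref, alt)` tuples. The function names, the `min_samples` threshold, and the dictionary of population allele frequencies are hypothetical choices for this example, not part of any particular tool.

```python
from collections import Counter

def build_panel_of_normals(normal_callsets, min_samples=2):
    """Return the variant keys seen in at least `min_samples` normal samples.

    `normal_callsets` is an iterable with one collection of
    (chrom, pos, ref, alt) tuples per normal sample.
    """
    counts = Counter()
    for calls in normal_callsets:
        counts.update(set(calls))  # count each variant once per sample
    return {key for key, n in counts.items() if n >= min_samples}

def filter_tumor_calls(tumor_calls, panel, population_af, max_af=0.0001):
    """Keep tumor variants that are absent from the panel and not common.

    `population_af` maps variant keys to a population allele frequency
    (for example taken from ExAC); max_af=0.0001 corresponds to 0.01%.
    Variants missing from `population_af` are treated as unobserved.
    """
    kept = []
    for key in tumor_calls:
        if key in panel:
            continue  # recurrent in the normals: likely germline or artifact
        if population_af.get(key, 0.0) > max_af:
            continue  # common in the population: likely germline
        kept.append(key)
    return kept
```

What survives both filters is still a mixture of somatic and rare germline variants, as noted above, so the output should be read as a candidate list.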

ADD COMMENTlink written 4.0 years ago by Naga450

Could you please tell me which software I can use for this work? Can I still use MuTect2?

ADD REPLYlink written 4.0 years ago by haiying.kong320

You should create a small pipeline to perform this. To keep it simple:

  1. Download the non-TCGA ExAC data set from the FTP site.
  2. Use tabix to get the AF of each variant in your no-control sample from the ExAC VCF file.
  3. Remove the common variants.
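
Steps 2 and 3 could be sketched in Python roughly as follows. This assumes the allele frequency sits in the INFO column under an `AF` key with one value per ALT allele, which is the usual VCF convention; check the header of the file you actually download. The function names are hypothetical. In practice `reference_lines` would be the text returned by a `tabix` region query against the bgzipped, indexed ExAC VCF; here it is just any iterable of VCF lines.

```python
def parse_af_by_allele(vcf_line):
    """Map (chrom, pos, ref, alt) -> AF for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref, alts, info = (
        fields[0], int(fields[1]), fields[3], fields[4], fields[7]
    )
    # INFO is a ';'-separated list of KEY=VALUE pairs (flags have no '=').
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    afs = info_map.get("AF", "").split(",")
    result = {}
    for alt, af in zip(alts.split(","), afs):
        try:
            result[(chrom, pos, ref, alt)] = float(af)
        except ValueError:
            continue  # AF missing or not numeric for this allele
    return result

def remove_common(sample_keys, reference_lines, max_af=0.0001):
    """Drop sample variants whose reference AF exceeds max_af (0.01%)."""
    af = {}
    for line in reference_lines:
        if line.startswith("#"):
            continue  # skip VCF header lines
        af.update(parse_af_by_allele(line))
    # Variants absent from the reference are kept (treated as AF 0).
    return [key for key in sample_keys if af.get(key, 0.0) <= max_af]
```

Matching on exact `(chrom, pos, ref, alt)` only works if both call sets represent indels and multi-allelic sites the same way, so normalizing both VCFs first is probably worthwhile.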
ADD REPLYlink written 4.0 years ago by Naga450

Sorry, I might be asking stupid questions.

Should I first run the GATK germline tools on the 5 tumor samples? The variants they call will in fact include both germline and somatic mutations, right?

Then I can run steps 1, 2, and 3 from your last post, right?

Then, how can I use the germline mutations from 90 normal samples in our study?

ADD REPLYlink written 4.0 years ago by haiying.kong320

Hi @Naga, can you please share which column/field in the ExAC VCF file you used to filter common variants with MAF > 0.01%? Thanks!

ADD REPLYlink written 3.9 years ago by Chirag Nepal2.2k

@Naga, quoting what you have written: "The remaining will be the somatic and rare germline variants". Is there a way to differentiate between those two? I am interested in identifying those rare germline mutations.

ADD REPLYlink written 2.8 years ago by lait150