Question: Concept of Matched normal vs. Virtual Normal
sutturka (USA) wrote, 23 days ago:

I have a question about matched normal versus virtual normal samples. By definition, a matched normal (MN) is a sample of healthy tissue from the same individual, used to distinguish germline variants from somatic mutations. In contrast, samples from healthy, unrelated individuals serve as a virtual normal (VN) when no matched normal sample is available.

We are planning to perform whole genome sequencing (WGS) of multiple tumor samples and of virtual normal samples (one third as many as the tumor samples), with the goal of identifying somatic mutations. However, I see that most analysis pipelines (e.g. GATK Mutect) are designed for tumor/normal pairs, and there are only a few recent examples (Hiltemann et al., Teer et al.) that describe somatic mutation calling without a matched normal (i.e. with a virtual normal).

From a bioinformatics point of view, can you please provide recommendations on the following:

  1. Is it always recommended to have a matched normal for each tumor, i.e. to sequence the same number of tumor and normal samples?
  2. In the absence of matched normals, it may be best to create a panel of normals (PoN) from the virtual normals to determine the somatic mutations. Is this correct? How many normal samples are needed for a PoN?
  3. In the absence of matched normals, which other bioinformatics workflow do you recommend for calling somatic mutations accurately?
  4. Please suggest any other important considerations when matched normal samples are not available.
Chris Miller (Washington University in St. Louis, MO) wrote, 23 days ago:

Is it always recommended to have a matched normal for each tumor? i.e. use the same number of tumor and normal samples for sequencing.

If you don't have matched normals, you will end up calling lots of patient-specific SNPs as somatic mutations. This is undesirable, but you have to do the best you can with the data you have access to. If you have the budget and access to the material, I highly recommend using matched normals.
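
To make this concrete, here is a minimal Python sketch (not a real variant caller) of why the matched normal matters. Variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples; the function name and the toy data are made up for illustration.

```python
# Toy illustration only: real callers model read evidence, not simple set membership.
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def subtract_normal(tumor: set[Variant], normal: set[Variant]) -> set[Variant]:
    """Keep tumor variants that are not seen in the given normal sample."""
    return tumor - normal


# Made-up calls: two germline SNPs of this patient plus one true somatic mutation.
tumor_calls = {
    ("chr1", 100, "A", "T"),  # germline, also present in the patient's normal
    ("chr2", 200, "G", "C"),  # germline, also present in the patient's normal
    ("chr3", 300, "C", "T"),  # somatic, tumor only
}
matched_normal_calls = {
    ("chr1", 100, "A", "T"),
    ("chr2", 200, "G", "C"),
}

with_matched_normal = subtract_normal(tumor_calls, matched_normal_calls)
without_any_normal = subtract_normal(tumor_calls, set())

print(len(with_matched_normal), len(without_any_normal))
```

With the matched normal, only the chr3 variant survives; without it, both germline SNPs are reported alongside the somatic mutation.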

In case of absence of matched normals, it may be best to create the panel of normals (PoN) using the virtual normals to determine the somatic mutations. Is this correct? How many normal samples are required/necessary for considering as PoN?

This can help, but it will not remove all germline sites, because everyone has private variants that will not appear in any panel built from other people.
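
As a rough sketch of that limitation, assuming variants are plain `(chrom, pos, ref, alt)` tuples and a site enters the panel once it is seen in at least `min_samples` normals (a hypothetical threshold chosen for illustration, not a recommendation), a PoN filter could look like this in Python:

```python
from collections import Counter


def build_panel_of_normals(normals, min_samples=2):
    """Collect variants seen in at least `min_samples` unrelated normal samples."""
    counts = Counter()
    for variants in normals:
        counts.update(set(variants))  # count each variant once per sample
    return {variant for variant, n in counts.items() if n >= min_samples}


def filter_with_pon(tumor_calls, pon):
    """Drop tumor calls that appear in the panel of normals."""
    return [variant for variant in tumor_calls if variant not in pon]


# Made-up data: one common SNP shared by everyone, plus private variants.
COMMON_SNP = ("chr1", 100, "A", "T")
PATIENT_PRIVATE = ("chr2", 200, "G", "C")  # germline, but unique to the patient
SOMATIC = ("chr3", 300, "C", "T")

virtual_normals = [
    {COMMON_SNP, ("chr4", 400, "T", "G")},
    {COMMON_SNP},
    {COMMON_SNP, ("chr5", 500, "G", "A")},
]

pon = build_panel_of_normals(virtual_normals)
remaining = filter_with_pon([COMMON_SNP, PATIENT_PRIVATE, SOMATIC], pon)
print(remaining)
```

The common SNP is filtered out, but the patient's private germline variant is still reported next to the true somatic call, which is exactly the residual error a PoN cannot fix.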

In case of absence of matched normals, which other bioinformatics workflow do you recommend to accurately call the somatic mutations?

Lincoln Stein's group recently published a nice paper in which they tackled some of these problems and got close to the limit of how well you can do: https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0446-9

I haven't personally used it, but it seems like a reasonable workflow.


Thank you, Chris Miller, for the suggestions. I have a related question about the sample collection strategy.

  1. For WGS, is it sufficient to collect the normal from any somatic tissue? For example, lymphoma tumor samples paired with normals from the skin or blood of the same patient, rather than from tissue adjacent to the tumor. Would this count as a proper "matched normal" sample?

  2. For RNASeq, does the normal sample need to come from exactly the same tissue type? That is, should RNAseq be performed on a different tissue type (skin/blood) from the same patient, or on the same tissue type from a different, healthy individual?

Please share your thoughts.

(Reply written 20 days ago by sutturka)

1) The only real concern is that the normal should be as free from tumor contamination as possible. Blood is a fine control for most solid tumors, but leukemias are trickier, because you often find tumor contamination even in the skin. I think I remember that skin samples from lymphoma patients tend to be free of tumor content, but do a quick literature search to check.

2) For info on normal RNAseq controls, you'll want to consult previous questions such as "Why is normal blood used for matched tumor (instead of adjacent norm tissue)?". The short answer is that using normal RNAseq samples as controls is rare, because a) for many tissues there is no way to obtain good normals (you can't scoop out healthy brain), and b) matching tissue type well is surprisingly hard.
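
Regarding point 1), one rough way to look for tumor contamination in a candidate normal is to check whether sites called in the tumor show a low but nonzero variant allele fraction (VAF) in the normal. The Python sketch below assumes you already have alt and total read counts per site in the normal sample; the function names, the thresholds `low` and `high`, and the counts are all hypothetical and only illustrate the idea.

```python
def vaf(alt_reads, total_reads):
    """Variant allele fraction; 0.0 when the site has no coverage."""
    return alt_reads / total_reads if total_reads else 0.0


def flag_possible_contamination(normal_counts, low=0.02, high=0.30):
    """Return sites whose VAF in the normal is above noise but below germline levels.

    normal_counts maps a site label to (alt_reads, total_reads) in the normal.
    `low` and `high` are illustrative thresholds, not validated cutoffs.
    """
    return [
        site
        for site, (alt, total) in normal_counts.items()
        if low < vaf(alt, total) < high
    ]


# Made-up read counts in the normal sample at three tumor-called sites.
normal_counts = {
    "site_A": (0, 60),   # no alt reads: clean normal at this site
    "site_B": (3, 50),   # VAF 0.06: possible tumor cells in the normal
    "site_C": (28, 60),  # VAF ~0.47: looks like a germline heterozygous variant
}

flagged = flag_possible_contamination(normal_counts)
print(flagged)
```

Many flagged sites in one normal would be a hint to treat that sample with caution, although sequencing errors and mapping artifacts can produce the same pattern.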

(Reply written 20 days ago by Chris Miller)