Question

Concept of Matched normal vs. Virtual Normal

6.6 years ago

sutturka ▴ 190

I have a question about matched normal vs. virtual normal samples. By definition, a matched normal (MN) is a sample of healthy tissue from the same individual, used to distinguish germline variants from somatic mutations. In contrast, samples from healthy, unrelated individuals serve as a virtual normal (VN) when no matched normal sample is available.

We are planning to perform whole genome sequencing (WGS) of multiple tumor samples and virtual normal samples (one third the number of tumor samples), with the goal of identifying somatic mutations. However, I see that most analysis pipelines (e.g. GATK Mutect) are designed for tumor/normal pairs, while there are a few recent examples (Hiltemann et al., Teer et al.) that describe somatic mutation calling without a matched normal (i.e. with a virtual normal).

From a bioinformatics point of view, can you please provide recommendations on the following:

1) Is it always recommended to have a matched normal for each tumor? i.e. use the same number of tumor and normal samples for sequencing.
2) In the absence of matched normals, it may be best to create a panel of normals (PoN) from the virtual normals to determine the somatic mutations. Is this correct? How many normal samples are required for a PoN?
3) In the absence of matched normals, which other bioinformatics workflow do you recommend to call somatic mutations accurately?
4) Please suggest any other important considerations when matched normal samples are absent.

Somatic mutations Virtual normal matched normal • 4.5k views


score 4 · Answer 1 · 2017-09-26

6.6 years ago

Chris Miller 22k

Is it always recommended to have a matched normal for each tumor? i.e. use the same number of tumor and normal samples for sequencing.

If you don't have matched normals, you will end up calling lots of patient-specific SNPs as somatic mutations. This is undesirable, but you have to do the best you can with the data you have access to. If you have the budget and access to the material, I highly recommend using matched normals.
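To illustrate why the matched normal matters, here is a minimal sketch that treats each call set as a set of (chrom, pos, ref, alt) records. The `Variant` type, the function name, and the example coordinates are all made up for illustration. Real somatic callers model read evidence statistically rather than doing plain set subtraction, so this only shows the idea.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_by_matched_normal(
    tumor_calls: Set[Variant], normal_calls: Set[Variant]
) -> Tuple[Set[Variant], Set[Variant]]:
    """Split tumor calls into somatic candidates and germline variants.

    Anything also seen in the patient's own normal is treated as germline;
    anything seen only in the tumor is kept as a somatic candidate.
    """
    germline = tumor_calls & normal_calls
    somatic_candidates = tumor_calls - normal_calls
    return somatic_candidates, germline


# Made-up example data: two patient-specific SNPs plus one tumor-only change.
tumor = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "C", "T"),
    Variant("chr7", 42, "G", "A"),
}
matched_normal = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "C", "T"),
}

somatic, germline = split_by_matched_normal(tumor, matched_normal)
print(sorted(somatic))   # only the chr7 change survives
print(sorted(germline))  # the two patient-specific SNPs
```

Without `matched_normal`, all three tumor calls would look equally "somatic", which is exactly the problem described above.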

In case of absence of matched normals, It may be best to create the panel of normals (PoN) using the virtual normals to determine the somatic mutations. Is this correct? How many normal samples are required/necessary for considering as PoN?

This can help, but it will not remove all germline sites, because everyone carries private variants that no panel of unrelated normals will contain.
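A rough sketch of that limitation, assuming variants are plain (chrom, pos, ref, alt) tuples. The function names, the `min_samples` threshold of 2, and the example sites are illustrative assumptions, not the behavior of any particular tool.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), hypothetical record


def build_panel(normal_callsets: Iterable[Set[Site]], min_samples: int = 2) -> Set[Site]:
    """Collect sites seen in at least `min_samples` unrelated normals.

    `min_samples=2` is an arbitrary illustrative threshold.
    """
    counts: Counter = Counter()
    for calls in normal_callsets:
        counts.update(set(calls))  # count each site once per normal sample
    return {site for site, n in counts.items() if n >= min_samples}


def filter_with_panel(tumor_calls: List[Site], panel: Set[Site]) -> List[Site]:
    """Drop tumor calls that appear in the panel of normals."""
    return [site for site in tumor_calls if site not in panel]


# Made-up example data.
common_snp = ("chr1", 1000, "A", "G")       # shared by many people
rare_snp = ("chr2", 500, "C", "T")          # seen in one virtual normal only
private_germline = ("chr3", 77, "T", "C")   # present only in this patient
true_somatic = ("chr7", 42, "G", "A")       # acquired in the tumor

virtual_normals = [{common_snp}, {common_snp, rare_snp}, {common_snp}]
panel = build_panel(virtual_normals)

tumor_calls = [common_snp, private_germline, true_somatic]
kept = filter_with_panel(tumor_calls, panel)
print(kept)  # the private germline variant is still there
```

The panel removes the common SNP, but the patient's private germline variant passes through alongside the true somatic mutation, and the panel alone cannot tell the two apart.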

In case of absence of matched normals, which other bioinformatics workflow do you recommend to accurately call the somatic mutations?

Lincoln Stein's group had a nice paper recently where they tackled some of these problems and got close to the limit of how well you can do: https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0446-9

I haven't personally used it, but it seems like a reasonable workflow.


Thank you, Chris Miller, for the suggestions. I have a related question about sample collection strategy.

1) For WGS, is it sufficient to collect the normal from any somatic tissue? e.g. lymphoma tumor samples with normals from the skin or blood of the same patient, rather than from tissue adjacent to the tumor. Would this be a proper "matched normal" sample?
2) For RNASeq, does the normal sample need to be from the exact same tissue type? i.e. should RNAseq be performed on a different tissue type (skin/blood) from the same patient, or on the same tissue type from a different, healthy individual?

Please share your thoughts.

(Reply by sutturka ▴ 190, 6.6 years ago)

1) The only real concern is that the normal should be as free from tumor contamination as possible. Blood is a fine control for most solid tumors. Leukemias are trickier: blood is itself tumor tissue, and the usual substitute, skin, often carries tumor contamination as well. I think I remember that skin samples from lymphoma patients tend to be free of tumor content, but do a quick literature search to check.

2) For information on normal RNAseq controls, consult previous questions such as "Why is normal blood used for matched tumor (instead of adjacent norm tissue)?". The short answer is that normal RNAseq controls are rare, because a) for many tissues there is no way to obtain good normals (you can't scoop out healthy brain), and b) matching tissue type well is surprisingly hard.

(Reply by Chris Miller 22k, 6.6 years ago)