I have a question about matched normal samples versus virtual normal samples. By definition, a matched normal (MN) is a sample of healthy tissue from the same individual, used to distinguish germline mutations from somatic mutations. Samples from healthy, unrelated individuals, on the other hand, serve as a virtual normal (VN) when no matched normal sample is available.
We are planning to perform whole genome sequencing (WGS) of multiple tumor samples and virtual normal samples (one third the number of tumor samples), with the goal of identifying somatic mutations. However, most analysis pipelines (e.g. GATK Mutect) appear to be designed for tumor/normal pairs, and there are only a few recent examples (Hiltemann et al., Teer et al.) that describe somatic mutation calling without a matched normal (i.e. with a virtual normal).
From a bioinformatics point of view, can you please provide recommendations on the following:
- Is it always recommended to have a matched normal for each tumor, i.e. to sequence the same number of tumor and normal samples?
- In the absence of matched normals, it may be best to build a panel of normals (PoN) from the virtual normals and use it to identify the somatic mutations. Is this correct? How many normal samples are needed for a usable PoN?
- In the absence of matched normals, which other bioinformatics workflows do you recommend for calling somatic mutations accurately?
- Are there any other important considerations when matched normal samples are not available?
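
To make the PoN idea in the second question concrete, here is a minimal sketch of the filtering logic I have in mind. It is only an illustration under my own assumptions, not how GATK or any published workflow actually builds or applies a panel: variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples, the function names `build_panel` and `filter_candidates` are made up, and the `min_samples` threshold of 2 is arbitrary.

```python
from collections import Counter

def build_panel(normal_callsets, min_samples=2):
    """Collect variants seen in at least `min_samples` virtual normals.

    `normal_callsets` is a list of per-sample variant collections;
    each variant is a (chrom, pos, ref, alt) tuple (assumed format).
    """
    counts = Counter()
    for calls in normal_callsets:
        # set() so a variant is counted at most once per normal sample
        counts.update(set(calls))
    return {variant for variant, n in counts.items() if n >= min_samples}

def filter_candidates(tumor_calls, panel):
    """Keep tumor variants that are absent from the panel."""
    return [variant for variant in tumor_calls if variant not in panel]

# Toy data: three virtual normals and one tumor sample.
normals = [
    {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
    {("chr1", 100, "A", "G")},
    {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")},
]
tumor = [
    ("chr1", 100, "A", "G"),  # recurrent in the normals, removed
    ("chr2", 200, "C", "T"),  # seen in only one normal, retained
    ("chr7", 700, "T", "C"),  # never seen in the normals, retained
]

panel = build_panel(normals)
candidates = filter_candidates(tumor, panel)
print(candidates)
```

In this toy example the variant on chr2 survives the filter even though it was seen in one unrelated normal, which is the kind of rare germline variant I am worried about misclassifying as somatic without a matched normal.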