I plan to perform somatic analysis on 10 different tumor BAM files to determine the SNPs, indels, and CNVs. Is it necessary to have 10 distinct normal files corresponding to each tumor BAM file, or can I use a single normal file for all 10 BAM files?
If I don't have a normal file, is there still a way to perform somatic analysis without one?
The normal sample is used to establish the expected healthy genotype, to distinguish germline from somatic variants, to filter artifacts, and so on. The GATK team has a detailed explanation here. If your tumor samples come from different individuals, then yes, having 10 corresponding normal files will definitely increase the confidence of the variant calling: it gives you matched tumor-normal calls, and the normals can also be used to filter technical artifacts (by building a so-called Panel of Normals).
While it's possible to perform somatic variant calling without a matched normal, for example using Mutect2 in tumor-only mode, this is discouraged. Tumor cells can carry many alterations, which leads to unusual read distributions across variants. You will almost certainly see a higher number of variants called, as many of them will be misclassified germline variants and/or technical artifacts that are nearly impossible to distinguish from true signal.
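To make the difference between the two modes concrete, here is a minimal sketch that only builds the Mutect2 command lines (it does not run them). The helper `mutect2_command` and all file and sample names are hypothetical placeholders; the flags are the standard GATK4 Mutect2 options, but check them against the GATK version you have installed.

```python
import shlex
from typing import List, Optional


def mutect2_command(
    reference: str,
    tumor_bam: str,
    output_vcf: str,
    normal_bam: Optional[str] = None,
    normal_sample_name: Optional[str] = None,
    germline_resource: Optional[str] = None,
    panel_of_normals: Optional[str] = None,
) -> List[str]:
    """Build (but do not execute) a GATK4 Mutect2 command line."""
    cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam]
    if normal_bam is not None:
        # -normal expects the sample name from the BAM read group (SM tag),
        # not the file name.
        if normal_sample_name is None:
            raise ValueError("normal_sample_name is required with normal_bam")
        cmd += ["-I", normal_bam, "-normal", normal_sample_name]
    if germline_resource is not None:
        # Population allele frequencies, e.g. an af-only gnomAD VCF.
        cmd += ["--germline-resource", germline_resource]
    if panel_of_normals is not None:
        cmd += ["--panel-of-normals", panel_of_normals]
    cmd += ["-O", output_vcf]
    return cmd


# Matched tumor-normal mode (placeholder file and sample names).
matched = mutect2_command(
    "ref.fa", "tumor01.bam", "tumor01.unfiltered.vcf.gz",
    normal_bam="normal01.bam", normal_sample_name="NORMAL01",
)

# Tumor-only mode: no matched normal, so a germline resource and a
# Panel of Normals do the germline and artifact filtering instead.
tumor_only = mutect2_command(
    "ref.fa", "tumor01.bam", "tumor01.unfiltered.vcf.gz",
    germline_resource="af-only-gnomad.vcf.gz",
    panel_of_normals="pon.vcf.gz",
)

print(shlex.join(matched))
print(shlex.join(tumor_only))
```

In either mode the raw calls still need to go through `gatk FilterMutectCalls` afterwards. In tumor-only mode the germline resource and the Panel of Normals are the only things standing in for the missing normal, which is why the result is noisier.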
Shred, thank you for your detailed response regarding somatic analysis and the use of normal samples. I now understand the importance of having corresponding normal samples to improve confidence in the variant calling process. Your explanation about using tumor-normal matches to filter out technical issues and establish a reference for the healthy genotype is very helpful. I appreciate your recommendation not to perform somatic variant calling without a corresponding normal sample, as it could lead to misclassification of germline variants and technical biases.
Shred, are there any other tools for somatic CNV (copy number variant) detection that do not require a normal file? I notice that Mutect2 detects SNPs and indels, but I do not see any support for CNVs!
Mutect2 isn't for CNV. Are you working with WES or WGS?
I'm working with WES
CNVkit can be run with tumor samples only. To build the reference, you'll need a pool of other (normal) samples sequenced with the same protocol.
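As a rough sketch of what that looks like, the snippet below only assembles a `cnvkit.py batch` command line (it does not run CNVkit). The helper `cnvkit_batch_command` and all file names are hypothetical; the options used (`--normal`, `--targets`, `--fasta`, `--output-reference`, `--output-dir`) are from the CNVkit `batch` documentation, so please verify them against your installed version.

```python
import shlex
from typing import List, Sequence


def cnvkit_batch_command(
    tumor_bams: Sequence[str],
    targets_bed: str,
    reference_fasta: str,
    output_dir: str,
    normal_bams: Sequence[str] = (),
    output_reference: str = "reference.cnn",
) -> List[str]:
    """Build (but do not execute) a `cnvkit.py batch` command line.

    normal_bams is the pool of unrelated normals sequenced with the same
    capture protocol. If it is empty, --normal is passed without arguments,
    which asks CNVkit for a "flat" reference (neutral copy number assumed
    everywhere).
    """
    return [
        "cnvkit.py", "batch", *tumor_bams,
        "--normal", *normal_bams,
        "--targets", targets_bed,
        "--fasta", reference_fasta,
        "--output-reference", output_reference,
        "--output-dir", output_dir,
    ]


# Ten tumors against a pooled reference built from unrelated normals
# (all file names are placeholders).
tumors = [f"tumor{i:02d}.bam" for i in range(1, 11)]
pool = ["normalA.bam", "normalB.bam", "normalC.bam"]

pooled = cnvkit_batch_command(tumors, "baits.bed", "ref.fa", "cnv_results",
                              normal_bams=pool)
print(shlex.join(pooled))
```

With no normals at all, calling the same helper with an empty `normal_bams` gives the flat-reference form, which runs but is expected to be noisier than a pooled reference from the same WES protocol.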