I would like to ask a more general question about using external databases in somatic variant filtering pipelines. In many scientific publications and in various forums, including this one, it is mentioned that if a SNP has an rs number in the dbSNP database (https://www.ncbi.nlm.nih.gov/SNP/), it is mainly considered germline. Is that correct?
However, in one of our somatic variant filtering pipelines, prior to annotating with dbSNP, we removed any variant that had a MAF >= 0.01 in any of the following 4 population databases:
1000gp3, gnomad, ESP6500 and ExAC.
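To make the filtering step concrete, here is a minimal sketch of the logic I mean, not our actual pipeline code. The dictionary layout, the function names and the example variants are all hypothetical, and it assumes that a variant missing from a database is treated as "not observed" there rather than as common.

```python
# Database labels as named above; using them as dictionary keys is an
# assumption for illustration, not the field names of any real annotation tool.
POPULATION_DBS = ("1000gp3", "gnomad", "ESP6500", "ExAC")


def is_common_in_any_population(mafs, threshold=0.01):
    """Return True if the MAF reaches the threshold in at least one database.

    `mafs` maps a database label to a MAF (float) or None when the variant
    is absent from that database. Absent values are ignored (assumption).
    """
    for db in POPULATION_DBS:
        maf = mafs.get(db)
        if maf is not None and maf >= threshold:
            return True
    return False


def keep_rare_variants(variants, threshold=0.01):
    """Keep only variants that are below the threshold in every database."""
    return [
        v for v in variants
        if not is_common_in_any_population(v["maf"], threshold)
    ]


# Hypothetical example records; the identifiers are placeholders, not real rs numbers.
example_variants = [
    {"id": "var_a", "rsid": "rs_placeholder_1", "maf": {"1000gp3": 0.20, "gnomad": 0.18}},
    {"id": "var_b", "rsid": "rs_placeholder_2", "maf": {"gnomad": 0.0004, "ExAC": None}},
    {"id": "var_c", "rsid": None, "maf": {}},
    {"id": "var_d", "rsid": "rs_placeholder_3", "maf": {"ESP6500": 0.01}},
]

rare = keep_rare_variants(example_variants)
# Variants that survive the population filter but still carry an rs identifier
rare_with_rsid = [v["id"] for v in rare if v["rsid"] is not None]
```

In this sketch, `rare_with_rsid` holds exactly the kind of variants my question is about: they are rare in all four databases but still have a dbSNP entry.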
Thus, in your opinion, could the variants that remain after this population filtering and still have an rs accession number be considered "germline"? Or, as they are definitely rare in these populations, could they be considered somatic candidates?
For example, an interesting publication on a variant calling pipeline mentions:
"For dbSNP, we used the set of nonflagged variants (flagged variants are those for which SNPs <1% minor allele frequency [MAF; or unknown], mapping only once to reference assembly, or flagged as “clinically associated”)".
Or does the database contain additional information that could aid my understanding?
One more important point: after obtaining a list of "somatic variant candidates", I would like to perform a follow-up analysis, especially for the SNPs, to interrogate their potential effect on TF binding and motif disruption. As I also need the rs information for that, I was wondering whether, with the above approach, this subset of dbSNP variants could be considered "somatic" candidates.
Thank you in advance,