I am analysing the 450K DNA methylation data from TCGA (GDC). I am new to this analysis and have a basic question. Looking at the rowData of the SummarizedExperiment obtained from TCGAbiolinks, i.e. the CpG probe data.frame, there are a few things I find confusing. First, a single CpG probe is mapped to the same gene multiple times in the Gene_Symbol column. I interpret these repeated entries as different exons of the gene. But in that case, which value should I take as the position of the CpG site relative to the TSS, given that the CpG maps to the same gene each time but every entry has a different position?
Second, many CpG probes map to more than one gene or other element. Would it be preferable to remove such CpG sites? When I counted the CpG sites that map to more than one gene or other mRNA, I found more than 100K such probes.
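To make clear what kind of count I mean, here is a minimal sketch in Python (for illustration only; the data itself comes from R). It assumes that Gene_Symbol holds a semicolon-delimited string per probe and that "." marks a probe with no annotation; both are assumptions about the format. The probe IDs, gene names, and function names below are made up. The idea is to collapse repeated entries for the same gene first, so that only probes annotated to more than one distinct gene are counted.

```python
def distinct_genes(gene_symbol_field):
    """Return the unique gene names in a semicolon-delimited Gene_Symbol value.

    Assumption: entries are separated by ';' and '.' means 'no annotation'.
    """
    if not gene_symbol_field:
        return []
    names = [name.strip() for name in gene_symbol_field.split(";")]
    # dict.fromkeys removes duplicates while keeping first-seen order
    return [n for n in dict.fromkeys(names) if n and n != "."]


def count_multi_gene_probes(annotation):
    """Count probes whose Gene_Symbol lists more than one distinct gene."""
    return sum(
        1 for field in annotation.values() if len(distinct_genes(field)) > 1
    )


# Hypothetical rows: probe ID -> Gene_Symbol string
example = {
    "cg_hypothetical_1": "GENE_A;GENE_A;GENE_A",  # same gene repeated
    "cg_hypothetical_2": "GENE_A;GENE_B",         # two distinct genes
    "cg_hypothetical_3": ".",                     # no annotation
}

print(count_multi_gene_probes(example))  # prints 1
```

Counted this way, a probe that lists the same gene several times is not treated as multi-gene, which separates my first question from the second.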
Thanks in advance for any help in this regard.