Annotation of Methylation data in GDC Portal
3.6 years ago

I was looking at the rowData of the SummarizedExperiment object obtained after downloading methylation data with the TCGAbiolinks package. This data.frame has a column called 'Feature type', which contains the values S_Shore, N_Shore, CGI, N_Shelf and S_Shelf. However, I also see a lot of "." (dots) in this column. Does a dot imply that the site belongs to Open Sea, since all the other categories are present, or is it simply unknown?

3.6 years ago

They appear to be sites that fall outside of the following classification:

The position of the CpG site in reference to the island:

- Island
- N_Shore or S_Shore (0-2 kb upstream or downstream from CGI)
- N_Shelf or S_Shelf (2-4 kb upstream or downstream from CGI)


So, if the site is more than 4 kb from the island, it will be labeled with ".".

[source: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Methylation_LO_Pipeline/]
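To make the distance rule concrete, here is a minimal sketch in Python. It is not the GDC pipeline code. The function name `classify_cpg` and the signed-distance convention (negative = upstream, positive = downstream, 0 = inside the island) are assumptions made for illustration, as is the handling of sites lying exactly on the 2 kb or 4 kb boundary.

```python
def classify_cpg(distance_bp):
    """Label a CpG site by its signed distance (in bp) to the nearest CpG island.

    Assumed convention (hypothetical, for illustration only):
      0        -> the site lies inside the island
      negative -> upstream of the island (N_ prefix)
      positive -> downstream of the island (S_ prefix)
    Boundaries (exactly 2000 or 4000 bp) are assigned to the nearer class here;
    the real annotation may treat them differently.
    """
    if distance_bp == 0:
        return "Island"
    side = "N" if distance_bp < 0 else "S"
    d = abs(distance_bp)
    if d <= 2000:
        return f"{side}_Shore"
    if d <= 4000:
        return f"{side}_Shelf"
    # More than 4 kb from any island: no category, shown as "."
    return "."


for dist in (0, -1500, 3000, 5000):
    print(dist, classify_cpg(dist))
```

Under these assumptions, only sites farther than 4 kb from an island fall through to ".".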

Thanks for the reply. I knew this, but I often come across "Open Sea" in papers, which is why I asked.

I guess you could call all of those sites 'open sea'. I have not seen this term used widely, but it does appear in this publication: Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome.

Provided you also clearly define it in your methods, I would not necessarily see any major issue with it.
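If you go that route, one way to keep the relabelling explicit is to recode the placeholder in a single step and report the resulting counts. The sketch below uses Python and made-up example values standing in for the 'Feature type' column. The helper `relabel_open_sea` and the label "Open_Sea" are hypothetical names chosen for illustration, not part of TCGAbiolinks or the GDC annotation.

```python
from collections import Counter

# Made-up example values mimicking the 'Feature type' column
feature_type = ["Island", "N_Shore", ".", "S_Shelf", ".", "S_Shore", "."]


def relabel_open_sea(values, placeholder=".", label="Open_Sea"):
    """Return a copy of values with the placeholder replaced by an explicit label."""
    return [label if v == placeholder else v for v in values]


relabelled = relabel_open_sea(feature_type)
counts = Counter(relabelled)

# Per-category counts, e.g. for reporting alongside the definition in the methods
for category, n in sorted(counts.items()):
    print(category, n)
```

Keeping the recoding in one function makes it easy to state the exact definition used (here, every site annotated as ".") next to the numbers.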