Annotation of Methylation data in GDC Portal
3.6 years ago

I was looking at the rowData of the SummarizedExperiment object obtained after downloading data with the TCGAbiolinks package. This data.frame has a column called 'Feature type', which contains the values S_Shore, N_Shore, CGI, N_Shelf, and S_Shelf. However, I also see a lot of "." (dots) in this column. Do the dots imply that these sites belong to Open Sea, since all the other categories are present, or are they unknown?

Illumina 450K GDC TCGA DNA Methylation • 956 views
3.6 years ago

They appear to be sites that fall outside of the following classification:

The position of the CpG site in reference to the island:

- Island
- N_Shore or S_Shore (0-2 kb upstream or downstream from CGI)
- N_Shelf or S_Shelf (2-4 kb upstream or downstream from CGI)

So, if the site is more than 4 kb from the island, it will be labeled with ".".

[source: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Methylation_LO_Pipeline/]
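The distance rule above can be sketched in a few lines of Python. This is only an illustration, not the GDC pipeline's code: the function name `classify_cpg`, the signed-offset convention (negative = upstream, mapped to "N"; positive = downstream, mapped to "S"), and the treatment of the 2 kb and 4 kb boundaries as inclusive are all assumptions made for the example.

```python
def classify_cpg(offset_bp):
    """Label a CpG site by its signed distance (in bp) to the nearest CpG island.

    Assumed convention (hypothetical, for illustration only):
      offset_bp == 0  -> the site lies inside the island
      offset_bp <  0  -> the site is upstream of the island   ("N")
      offset_bp >  0  -> the site is downstream of the island ("S")
    """
    distance = abs(offset_bp)
    if distance == 0:
        return "Island"
    side = "N" if offset_bp < 0 else "S"
    if distance <= 2000:   # 0-2 kb from the island: shore (boundary assumed inclusive)
        return f"{side}_Shore"
    if distance <= 4000:   # 2-4 kb from the island: shelf (boundary assumed inclusive)
        return f"{side}_Shelf"
    return "."             # more than 4 kb away: no island-related label


# Example offsets: inside an island, 1.5 kb upstream, 3 kb downstream, 5 kb downstream
labels = [classify_cpg(d) for d in (0, -1500, 3000, 5000)]
```

Sites that end up with "." here are simply those beyond the shelf range, which matches the dots seen in the annotation column.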


Thanks for the reply. I knew this, but I often come across "Open Sea" in papers, which is why I asked.


I guess that you could call all of those 'open seas'. I have not seen this definition used widely, but I noted it in this publication: Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome.

Provided you also clearly define it in your methods, I would not necessarily see any major issue with it.
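If you do adopt that convention, the relabeling itself is a one-line substitution. Here is a minimal Python sketch, assuming the annotation column has already been exported as a plain list of strings; the helper name `relabel_open_sea` and the label `"Open_Sea"` are hypothetical choices for this example, not part of the GDC annotation.

```python
from collections import Counter


def relabel_open_sea(feature_types, missing=".", label="Open_Sea"):
    """Return a copy of the column with the placeholder replaced by an explicit label."""
    return [label if ft == missing else ft for ft in feature_types]


# Toy column standing in for the 'Feature type' values (made-up example data)
feature_types = ["Island", ".", "N_Shore", ".", "S_Shelf"]

relabeled = relabel_open_sea(feature_types)
counts = Counter(relabeled)  # how many probes fall in each category after relabeling
```

Reporting the per-category counts alongside the definition in the methods makes it clear how many probes were assigned to the new category.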