Question: Annotation of Methylation data in GDC Portal
noorpratap.singh • 300 wrote:
I was specifically looking at the rowData of the SummarizedExperiment object obtained after downloading with the TCGAbiolinks package. In this data.frame there is a column called 'Feature type'. It contains the values S_Shore, N_Shore, CGI, N_Shelf and S_Shelf. However, I also see a lot of "." (dots) in this column. Does that imply that those sites belong to Open Sea, since all the other categories are present, or are they unknown?
They appear to be sites that fall outside of the island / shore / shelf classification given in the GDC documentation.
So, if the site is >4 kbp from the island, it will be labeled with ".".
[source: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Methylation_LO_Pipeline/]
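To make the rule concrete, here is a minimal Python sketch of a distance-based labelling. The 2 kbp (shore) and 4 kbp (shelf) cut-offs follow the usual Illumina-style convention and are an assumption on my part, as are the function name, the signed-distance input and the `Island` label (written as CGI in the question). It only illustrates the idea and is not the GDC pipeline's actual code.

```python
def feature_type(signed_distance_bp):
    """Label a CpG site by its signed distance (in bp) to the nearest CpG island.

    Hypothetical helper: 0 means the site lies inside the island, a negative
    value means upstream (N side) and a positive value downstream (S side).
    The 2000 / 4000 bp cut-offs are assumed, not taken from the thread.
    """
    distance = abs(signed_distance_bp)
    if distance == 0:
        return "Island"
    side = "N" if signed_distance_bp < 0 else "S"
    if distance <= 2000:
        return f"{side}_Shore"
    if distance <= 4000:
        return f"{side}_Shelf"
    # More than 4 kbp from the island: no category, shown as "."
    return "."
```

Under these assumptions, anything further than 4 kbp from an island on either side ends up with ".", which matches what you see in the column.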
Thanks for the reply. I knew this, but I often come across 'Open Sea' in papers, which is why I asked.
I guess that you could call all of those 'open seas'. I have not seen this definition used widely, but I noted it in this publication: Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome.
Provided you also clearly define it in your methods, I would not necessarily see any major issue with it.
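If you do go that way, the relabelling itself is trivial. A minimal sketch in plain Python, assuming the column has been exported as a list of strings; the function name, the `OpenSea` label and the example values are all made up for illustration:

```python
from collections import Counter


def relabel_open_sea(feature_types, label="OpenSea"):
    """Replace the "." placeholder with an explicit open-sea label.

    Hypothetical helper: every other value is passed through unchanged.
    """
    return [label if ft == "." else ft for ft in feature_types]


# Made-up example values, not real TCGA data
example = ["Island", ".", "N_Shore", ".", "S_Shelf"]
relabelled = relabel_open_sea(example)
counts = Counter(relabelled)  # number of sites per category after relabelling
```

Keeping the label as a parameter makes it easy to use whatever name you define in your methods.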