TCGA's Methylation Data Annotation
8.9 years ago
Mdeng

Hello everyone,

I have a question regarding TCGA's methylation data. I downloaded it via the "Data Matrix": 98 samples (49 tumor/normal pairs), all Level 3 and all from Illumina's HumanMethylation450 chip. All samples are prostate cancer. Looking at the data, about 25% of the gene symbols are missing, so I tried to re-annotate them by fetching symbols from a public SQL server (genome-mysql.cse.ucsc.edu). However, most of the positions don't match hg18 (the build given on TCGA's wiki: https://wiki.nci.nih.gov/display/TCGA/DNA+methylation); I only get matches when using the hg19 database.
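A position-based re-annotation like the one described above boils down to an interval lookup: for each probe coordinate, find the genes whose interval contains it. The sketch below is only an illustration under stated assumptions: the gene intervals are hypothetical toy values (real ones would come from a UCSC table such as `refGene`, using its `chrom`, `txStart`, `txEnd` and `name2` columns), UCSC starts are 0-based half-open, and probe coordinates are assumed to be 1-based. It also shows why the build matters: coordinates from one build looked up in another build's table simply land on the wrong genes or on none.

```python
import bisect
from collections import defaultdict

# Hypothetical gene intervals (chrom, start, end, symbol), 0-based half-open
# as in UCSC tables. Real intervals must come from the SAME genome build as
# the probe coordinates (e.g. hg19 refGene), otherwise lookups silently fail.
GENES = [
    ("chr1", 1000, 5000, "GENE_A"),
    ("chr1", 4000, 9000, "GENE_B"),
    ("chr2", 200, 800, "GENE_C"),
]


def build_index(genes):
    """Group intervals by chromosome and sort each group by start."""
    index = defaultdict(list)
    for chrom, start, end, symbol in genes:
        index[chrom].append((start, end, symbol))
    for intervals in index.values():
        intervals.sort()
    return index


def symbols_at(index, chrom, pos1):
    """Return sorted symbols whose interval contains a 1-based position."""
    pos0 = pos1 - 1  # assumption: probe coordinates are 1-based
    intervals = index.get(chrom, [])
    # Only intervals starting at or before pos0 can contain it.
    cutoff = bisect.bisect_right(intervals, (pos0, float("inf"), ""))
    # Linear scan over the candidates; fine for a sketch, slow for 450K probes.
    return sorted({sym for start, end, sym in intervals[:cutoff] if start <= pos0 < end})


index = build_index(GENES)
print(symbols_at(index, "chr1", 4500))
print(symbols_at(index, "chr3", 10))
```

Probes falling between genes return an empty list, so some symbol-less probes are expected even with the correct build. Note that UCSC names chromosomes like `chr1`, so a bare `1` from a data file would need the prefix added.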

In numbers: I have XXXX positions, and 119652 of them don't have a symbol after re-annotation (hg19).

Is it possible that the data is annotated against hg19 instead of hg18?

I was also wondering how to interpret the statement "Some data have been masked (including known SNPs)". If some data points are masked, in what way? Has the position been changed and the symbol removed?
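One way to answer the masking question empirically is to cross-tabulate masked values against missing symbols in the Level 3 files themselves. This is a sketch only: it assumes masked beta values are written as `NA`, and that the columns are named `Beta_value` and `Gene_Symbol` under a single header row; check both against your own files (skip any extra header lines first). The rows below are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical Level 3 excerpt; column names and the "NA" marker for masked
# values are assumptions to verify against the real files.
SAMPLE = (
    "Composite Element REF\tBeta_value\tGene_Symbol\tChromosome\tGenomic_Coordinate\n"
    "cg_demo_1\t0.27\tGENE_A\t1\t100\n"
    "cg_demo_2\tNA\tGENE_B\t2\t200\n"
    "cg_demo_3\tNA\t\t3\t300\n"
    "cg_demo_4\t0.83\t\t4\t400\n"
)


def tabulate(handle):
    """Count probes by (is_masked, has_symbol) from a tab-delimited handle."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        masked = row["Beta_value"].strip().upper() == "NA"
        has_symbol = bool(row["Gene_Symbol"].strip())
        counts[(masked, has_symbol)] += 1
    return counts


for key, n in sorted(tabulate(io.StringIO(SAMPLE)).items()):
    print(key, n)
```

If masked probes never carry a symbol while unmasked ones often do, masking probably blanks the annotation too; if both groups show a similar fraction of missing symbols, the missing symbols are more likely intergenic probes.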

At the moment, the methylation data points with symbols seem more or less useless.

Does anyone have an idea, or experience with TCGA's methylation data?

8.9 years ago

As long as you have the probe ID (starting with "cg"), you can use the .bpm file to get annotations. However, not every probe has a corresponding gene symbol and/or CpG island annotation, so keep that in mind.
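The probe-ID route can be sketched as a simple join. The .bpm itself is a binary manifest, so this sketch assumes a CSV copy of the manifest with an `IlmnID` header row and a `UCSC_RefGene_Name` column holding `;`-separated symbols (names as they appear in Illumina's 450K CSV manifest, but verify against your copy). Preamble lines before the header are skipped, and reading stops at the next `[Section]` line. The manifest text below is hypothetical.

```python
import csv
import io

# Hypothetical manifest excerpt; section and column names are assumptions.
MANIFEST = """[Heading]
Descriptor,demo
[Assay]
IlmnID,Name,UCSC_RefGene_Name
cg_demo_1,cg_demo_1,GENE_A;GENE_A;GENE_B
cg_demo_2,cg_demo_2,
[Controls]
ctl_1,ctl_1,
"""


def load_manifest(handle):
    """Map probe ID -> list of unique gene symbols (empty if unannotated)."""
    lines = iter(handle)
    for line in lines:
        if line.startswith("IlmnID"):
            header = next(csv.reader([line]))
            break
    else:
        raise ValueError("no IlmnID header row found")
    annotation = {}
    for row in csv.reader(lines):
        if not row:
            continue
        if row[0].startswith("["):
            break  # end of the probe section, e.g. a [Controls] block
        record = dict(zip(header, row))
        names = record.get("UCSC_RefGene_Name", "")
        # Symbols repeat once per transcript; keep first-seen order, drop blanks.
        annotation[record["IlmnID"]] = list(dict.fromkeys(s for s in names.split(";") if s))
    return annotation


ann = load_manifest(io.StringIO(MANIFEST))
for probe_id in ["cg_demo_1", "cg_demo_2", "cg_not_in_manifest"]:
    print(probe_id, ann.get(probe_id))
```

An empty list marks a probe that is in the manifest but has no gene annotation, while `None` marks an ID that is not in the manifest at all, which keeps those two cases apart when counting missing symbols.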

I'm pretty sure the above website works. If not, I have a copy of the .bpm file with the demo data for COHCAP:

http://sourceforge.net/projects/cohcap/

Perfect, this is what I was looking for. Thank you very much!