TCGA Methylation Data and Gene Mapping
10 weeks ago
James ▴ 30

I am looking into the TCGA methylation data and want to understand how to parse it and, ideally, map the measured beta values to single HUGO symbols.

My issues are as follows:

1) Some Stable Entity IDs list multiple gene names. For example, the breast cancer (BRCA) data contains a row with these values:

Stable Entity ID | Name | Description | Transcript ID

cg00008493 | KIAA1409;COX8C | Body;5'UTR | NM_020818;NM_182971

2) Many Stable Entity IDs map to the same gene. For example, in the attached image, several Stable Entity IDs map to DLX5.

For a research project, I'd like to associate each gene with a single methylation value. Put differently, for each patient I want to create a vector in which each entry is the methylation value for a given gene. Is there a principled way to do this?

Methylation Cancer TCGA • 341 views
10 weeks ago
Basti ★ 1.5k

A CpG may be annotated to more than one gene simply because gene regions overlap on the genome.
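
To make the multi-gene annotation concrete, here is a minimal Python sketch that splits the semicolon-delimited fields of the row from the question into one record per gene. It assumes the Name, Description, and Transcript ID fields are positionally aligned (first gene with first region and first transcript), which the example row suggests but does not prove. The function name `expand_annotation` is hypothetical.

```python
def expand_annotation(probe_id, names, descriptions, transcripts):
    """Split semicolon-delimited annotation fields into one record per entry.

    Assumption: the three fields are positionally aligned, so the i-th gene
    name belongs with the i-th region and the i-th transcript ID.
    """
    genes = names.split(";")
    regions = descriptions.split(";")
    transcript_ids = transcripts.split(";")
    if not (len(genes) == len(regions) == len(transcript_ids)):
        raise ValueError(f"Fields for {probe_id} have different lengths")
    return [
        {"probe": probe_id, "gene": g, "region": r, "transcript": t}
        for g, r, t in zip(genes, regions, transcript_ids)
    ]


# The row quoted in the question.
records = expand_annotation(
    "cg00008493", "KIAA1409;COX8C", "Body;5'UTR", "NM_020818;NM_182971"
)
for record in records:
    print(record)
```

Under that alignment assumption, the probe would sit in the gene body of KIAA1409 and in the 5'UTR of COX8C, which is consistent with two overlapping gene regions.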

If you want to associate each gene with a methylation value, you could take the average methylation of all CpGs annotated to that gene. Personally, I am not convinced this would be useful: not all CpGs across a gene have functional implications, and most of them are stable between individuals, so you will likely obtain the same mean methylation percentage for all individuals.
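
The averaging suggestion could be sketched roughly as follows for a single patient. This is only an illustration: the function name `gene_level_means`, the `cg_dlx5_*` probe IDs, and all beta values are made up, and a probe annotated to several genes is simply counted once for each of them.

```python
from collections import defaultdict
from statistics import mean


def gene_level_means(betas, probe_to_genes):
    """Average the beta values of all probes annotated to each gene.

    betas: {probe_id: beta value, or None if the measurement is missing}
    probe_to_genes: {probe_id: [gene symbols]}
    """
    per_gene = defaultdict(list)
    for probe, beta in betas.items():
        if beta is None:  # skip missing measurements
            continue
        for gene in probe_to_genes.get(probe, []):
            per_gene[gene].append(beta)
    return {gene: mean(values) for gene, values in per_gene.items()}


# Hypothetical annotation: one probe shared by two genes, three probes on DLX5.
probe_to_genes = {
    "cg00008493": ["KIAA1409", "COX8C"],
    "cg_dlx5_a": ["DLX5"],
    "cg_dlx5_b": ["DLX5"],
    "cg_dlx5_c": ["DLX5"],
}

# Hypothetical beta values for one patient.
patient_betas = {
    "cg00008493": 0.9,
    "cg_dlx5_a": 0.2,
    "cg_dlx5_b": 0.4,
    "cg_dlx5_c": None,
}

gene_vector = gene_level_means(patient_betas, probe_to_genes)
print(gene_vector)
```

Running this per patient gives one value per gene, i.e. the vector asked for in the question. It does nothing about the caveat above: if most probes on a gene barely vary between individuals, the per-gene means will barely vary either.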


