Question: Doubts about TCGA DNA Methylation samples
carlos_marchi40 (Brazil, Sorocaba, UFABC) wrote 3.6 years ago:

Hello everyone,

I've been trying to work with the TCGA DNA methylation data, but I'm having trouble understanding it.

The TCGA website no longer provides the biological data. In its place, the GDC website is now available. I was able to get clinical, mRNA and miRNA data from that site; however, I couldn't find the DNA methylation data. Is there no DNA methylation data in that portal?

Fortunately, I found another site, the Cancer Genomics Browser, where I was able to get the DNA methylation data for breast cancer (HumanMethylation450).

The DNA methylation download contains multiple files. The methylation levels are in the "genomicmatrix.txt" file, where each row is a probe and each sample column holds that probe's beta value. The probe.txt file, on the other hand, maps each probe to its gene(s) and genomic position. Here is a small excerpt of the genomicMatrix and probe files.


sample      TCGA-OL-A66H-01  TCGA-3C-AALK-01  TCGA-AC-A5EH-01
cg13332474  -0.4808          -0.2968          -0.1997
cg00651829  -0.4821          -0.2110          -0.4108
cg17027195  -0.4633          -0.4250          -0.4667
cg09868354  -0.4345          -0.3630          -0.4230
cg03050183  -0.4252          -0.3749           0.1269
cg01989731  NA               NA               NA
cg06819656   0.4028           0.3047           0.3755
cg04244851   0.4398           0.3894           0.2533
cg19669385  -0.1353           0.3650           0.0664
cg04244855   0.4292           0.4008           0.2468
cg17689707  -0.4842           0.0109          -0.2484
cg04244857  -0.0918           0.2731          -0.0084
cg02434381  -0.4443          -0.4273          -0.4175
cg05777492  -0.4595          -0.4780          -0.4786
cg23340034   0.0933           0.3611           0.4120
cg26361545   0.4339           0.4389           0.4348
cg10609310   0.2913           0.0337          -0.1307
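A minimal sketch for loading this matrix in Python, assuming the file is tab-separated with the header row shown above (the excerpt does not confirm the delimiter). `NA` is kept as `None` rather than discarded, so missing values stay visible in later steps; the function names are hypothetical:

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Tuple

Matrix = Dict[str, List[Optional[float]]]

def parse_value(cell: str) -> Optional[float]:
    """Convert one cell to a float; the literal 'NA' becomes None."""
    return None if cell == "NA" else float(cell)

def read_genomic_matrix(handle: TextIO) -> Tuple[List[str], Matrix]:
    """Read a probe-by-sample matrix whose first header cell is 'sample'."""
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]  # drop the leading 'sample' label
    matrix: Matrix = {}
    for row in reader:
        if not row:  # tolerate blank lines like those in the pasted excerpt
            continue
        matrix[row[0]] = [parse_value(cell) for cell in row[1:]]
    return samples, matrix

# A few rows of the excerpt above, written as tab-separated text.
EXCERPT = (
    "sample\tTCGA-OL-A66H-01\tTCGA-3C-AALK-01\tTCGA-AC-A5EH-01\n"
    "cg13332474\t-0.4808\t-0.2968\t-0.1997\n"
    "\n"
    "cg01989731\tNA\tNA\tNA\n"
    "cg26361545\t0.4339\t0.4389\t0.4348\n"
)

samples, matrix = read_genomic_matrix(io.StringIO(EXCERPT))
print(samples)
print(matrix["cg01989731"])  # [None, None, None]
```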

Looking at the genomicmatrix.txt file, I see several negative and NA values, and I was thinking of discarding them. Among the positive values, I can't find any above 0.8, i.e., there are no hypermethylated values. Why?


id          gene                chrom  chromStart  chromEnd   strand
cg00035864  TTTY18              chrY   8613009     8613010    .
cg13275322  WAS                 chrX   48426764    48426765   .
cg13798679                      chr1   36390157    36390158   .
cg13799227                      chr1   226719204   226719205  .
cg13799302  CYP2J2              chr1   60164980    60164981   .
cg13799671  CD58                chr1   116881090   116881091  .
cg13805052  MORN1,LOC100129534  chr1   2272923     2272924    .
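The probe file can be read in a similar way. This sketch assumes it is tab-separated with the header shown above (not confirmed by the excerpt); a blank `gene` field, as for cg13798679, becomes an empty list, and comma-separated gene lists are split. `ProbeInfo` and `read_probe_map` are hypothetical names:

```python
import csv
import io
from typing import Dict, List, NamedTuple, TextIO

class ProbeInfo(NamedTuple):
    genes: List[str]  # empty when the gene column is blank
    chrom: str
    start: int
    end: int

def read_probe_map(handle: TextIO) -> Dict[str, ProbeInfo]:
    """Map probe id -> ProbeInfo from a tab-separated probe file."""
    probes: Dict[str, ProbeInfo] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        genes = [g for g in row["gene"].split(",") if g]  # "" -> []
        probes[row["id"]] = ProbeInfo(
            genes, row["chrom"], int(row["chromStart"]), int(row["chromEnd"])
        )
    return probes

EXCERPT = (
    "id\tgene\tchrom\tchromStart\tchromEnd\tstrand\n"
    "cg00035864\tTTTY18\tchrY\t8613009\t8613010\t.\n"
    "cg13798679\t\tchr1\t36390157\t36390158\t.\n"
    "cg13805052\tMORN1,LOC100129534\tchr1\t2272923\t2272924\t.\n"
)

probes = read_probe_map(io.StringIO(EXCERPT))
print(probes["cg13805052"].genes)  # ['MORN1', 'LOC100129534']
```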

Here I consider only genes that are not on chromosomes X and Y. I noticed that some probes are associated with more than one gene; in that case, I was thinking of taking the median of the methylation values to obtain the final gene methylation level.
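The filtering step described above (drop chrX/chrY probes, and let a probe annotated to several genes, such as MORN1,LOC100129534, count toward each of them) could look like the sketch below. `PROBES` is a hypothetical, already-parsed version of the probe file excerpt, and `gene_to_probes` is an illustrative name:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical parsed probe file: probe id -> (genes, chromosome).
PROBES: Dict[str, Tuple[List[str], str]] = {
    "cg00035864": (["TTTY18"], "chrY"),
    "cg13275322": (["WAS"], "chrX"),
    "cg13798679": ([], "chr1"),
    "cg13799671": (["CD58"], "chr1"),
    "cg13805052": (["MORN1", "LOC100129534"], "chr1"),
}

SEX_CHROMOSOMES = {"chrX", "chrY"}

def gene_to_probes(probes: Dict[str, Tuple[List[str], str]]) -> Dict[str, List[str]]:
    """Index probes by gene, skipping sex chromosomes and unannotated probes.

    A probe annotated to several genes is listed under each of them.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for probe_id, (genes, chrom) in probes.items():
        if chrom in SEX_CHROMOSOMES:
            continue
        for gene in genes:  # an empty gene list contributes nothing
            index[gene].append(probe_id)
    return dict(index)

index = gene_to_probes(PROBES)
print(index)
```

Counting a multi-gene probe toward every gene is one choice; dropping such ambiguous probes is another, and the two can give different gene-level values.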

I was thinking of converting these files into a single file with the following header:

gene | beta value | sample
CD58 | 0.4        | TCGA-OL-A66H-01
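One way to write that single long-format file, assuming gene-level values have already been computed (the `GENE_VALUES` dict and `write_long_format` are hypothetical), is one `gene|beta value|sample` row per gene and sample, using `|` as the delimiter to match the proposed header:

```python
import csv
import io
from typing import Dict, Optional, TextIO

# Hypothetical gene-level values: gene -> {sample id: beta value or None}.
GENE_VALUES: Dict[str, Dict[str, Optional[float]]] = {
    "CD58": {"TCGA-OL-A66H-01": 0.4, "TCGA-3C-AALK-01": None},
}

def write_long_format(gene_values: Dict[str, Dict[str, Optional[float]]], handle: TextIO) -> None:
    """Write one 'gene|beta value|sample' row per gene and sample."""
    writer = csv.writer(handle, delimiter="|", lineterminator="\n")
    writer.writerow(["gene", "beta value", "sample"])
    for gene, per_sample in gene_values.items():
        for sample, beta in per_sample.items():
            if beta is None:
                continue  # leave missing values out of the long table
            writer.writerow([gene, beta, sample])

out = io.StringIO()
write_long_format(GENE_VALUES, out)
print(out.getvalue())
```

Skipping missing values keeps the table compact; writing them out as `NA` instead would preserve the full gene-by-sample grid.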

Could someone please help me with these questions?

Thank you for your attention.

Tag: dna methylation
zwdzwd120 wrote 3.6 years ago:

It turns out you can filter the GDC legacy archive and get the DNA methylation (DNAme) data back, e.g.,
,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:%5B%22DNA%20methylation%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:%5B%22Methylation%20beta%20value%22%5D%7D%7D%5D%7D
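The link above survives only as a URL-encoded fragment of a GDC filter. Decoded, it selects files whose `files.data_category` is "DNA methylation" and whose `files.data_type` is "Methylation beta value". The sketch below rebuilds such a filter as JSON, the form the GDC API accepts in its `filters` parameter; wrapping the two conditions in an `"and"` operator is an assumption about the missing start of the link:

```python
import json
from urllib.parse import quote, unquote

# The two conditions are taken from the fragment above; the outer
# "and" operator is assumed, since the start of the link is missing.
gdc_filter = {
    "op": "and",
    "content": [
        {"op": "in", "content": {"field": "files.data_category", "value": ["DNA methylation"]}},
        {"op": "in", "content": {"field": "files.data_type", "value": ["Methylation beta value"]}},
    ],
}

# Compact JSON, percent-encoded; ':' and ',' are left as-is to match the fragment.
encoded = quote(json.dumps(gdc_filter, separators=(",", ":")), safe=":,")
print(encoded)
```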

Beta values should range from 0 to 1. My guess is that the genomicMatrix file you are looking at has been zero-centered, so the data run from -0.5 to 0.5, with the methylated and unmethylated peaks at around 0.4 and -0.4, respectively.
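A quick, heuristic check of this guess on the excerpt from the question: on a true beta scale no value can be negative, while a zero-centered column stays within [-0.5, 0.5]. The values below are the TCGA-OL-A66H-01 column from the question (`None` stands for NA); `looks_zero_centered` is a hypothetical name:

```python
import math
from typing import Iterable, List, Optional

# TCGA-OL-A66H-01 column of the genomicMatrix excerpt; None stands for NA.
COLUMN: List[Optional[float]] = [
    -0.4808, -0.4821, -0.4633, -0.4345, -0.4252, None, 0.4028, 0.4398,
    -0.1353, 0.4292, -0.4842, -0.0918, -0.4443, -0.4595, 0.0933, 0.4339, 0.2913,
]

def looks_zero_centered(values: Iterable[Optional[float]]) -> bool:
    """True if all non-missing values lie in [-0.5, 0.5] and at least one is negative."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    if not present:
        return False
    return min(present) < 0 and min(present) >= -0.5 and max(present) <= 0.5

present = [v for v in COLUMN if v is not None]
print(min(present), max(present))  # -0.4842 0.4398
print(looks_zero_centered(COLUMN))  # True
```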

carlos_marchi40 (Brazil, Sorocaba, UFABC) wrote 3.6 years ago:


Thank you very much for helping me!

I got the data from the GDC legacy website. Unfortunately, there aren't many patients with samples in both conditions, i.e., tumor and normal. I believe there are 40 paired samples and about 300 samples overall.
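Paired tumor/normal samples can be identified from the TCGA barcodes themselves: the first three fields (e.g. TCGA-OL-A66H) identify the participant, and the first two digits of the fourth field give the sample type, where 01 to 09 denote tumor and 10 to 19 denote normal samples. A minimal sketch; the function names are hypothetical, and the `-11` barcode in the example is invented for illustration:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

def sample_type(barcode: str) -> str:
    """Classify a TCGA sample barcode by its two-digit sample-type code.

    01-09 are tumor types, 10-19 are normal types; anything else
    (for example 20-29, controls) is reported as 'other'.
    """
    code = int(barcode.split("-")[3][:2])  # '01A' -> 1
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"

def paired_participants(barcodes: Iterable[str]) -> List[str]:
    """Return participants (first three barcode fields) having both tumor and normal samples."""
    kinds: Dict[str, Set[str]] = defaultdict(set)
    for barcode in barcodes:
        participant = "-".join(barcode.split("-")[:3])
        kinds[participant].add(sample_type(barcode))
    return sorted(p for p, k in kinds.items() if {"tumor", "normal"} <= k)

# Hypothetical list: TCGA-OL-A66H-11 is invented to show a pair.
example = ["TCGA-OL-A66H-01", "TCGA-OL-A66H-11", "TCGA-3C-AALK-01"]
print(paired_participants(example))  # ['TCGA-OL-A66H']
```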

Regarding the DNA methylation values, it makes sense that they range from -0.5 to 0.5, but can I apply the rule of three (proportions) to these values? Or is there another way to convert them to beta values?


Provided the data are interpreted correctly, I would just add 0.5 to each value to get the real beta values. To double-check, any normal female sample should have a substantial number of beta values around 0.5 after conversion, since X-chromosome inactivation leaves many X-linked CpG sites roughly half methylated in female cells.

written 3.6 years ago by zwdzwd120
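The conversion suggested above is simply beta = value + 0.5. A minimal sketch, assuming missing values are stored as `None`; the names `to_beta` and `fraction_near_half`, and the 0.3 to 0.7 window, are illustrative choices rather than part of any library:

```python
from typing import List, Optional

def to_beta(value: Optional[float]) -> Optional[float]:
    """Shift a zero-centered value back to the 0-1 beta scale (NA stays None).

    Assumes the file was centered by subtracting exactly 0.5.
    """
    if value is None:
        return None
    return round(value + 0.5, 4)  # keep the 4-decimal precision of the input

def fraction_near_half(betas: List[Optional[float]], low: float = 0.3, high: float = 0.7) -> float:
    """Share of non-missing betas inside [low, high]; the window is an arbitrary choice."""
    present = [b for b in betas if b is not None]
    if not present:
        return 0.0
    return sum(low <= b <= high for b in present) / len(present)

print(to_beta(-0.4808))  # 0.0192
print(to_beta(0.4339))   # 0.9339
```

For the suggested sanity check, pass `fraction_near_half` the converted chrX probe values of a normal female sample (selected via the probe file's `chrom` column); a clearly non-trivial fraction would support the +0.5 interpretation.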

Summarizing probe-level beta values to genes is a complicated scientific question, and I don't think there is a widely accepted "best" solution.

A simple and useful, but naive (and certainly not the best), approach is to average the beta values of all probes annotated to that particular gene.

written 3.5 years ago by Zhenyu Zhang260 (modified 3.5 years ago)
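A sketch of the averaging approach described above, assuming a gene-to-probe index and a probe-by-sample matrix in which NA is stored as `None` (all names and values are hypothetical). Passing `statistics.median` instead of the default `statistics.mean` gives the median-based summary proposed in the question:

```python
import statistics
from typing import Callable, Dict, List, Optional, Sequence

Matrix = Dict[str, List[Optional[float]]]

def summarize_genes(
    gene_probes: Dict[str, List[str]],
    matrix: Matrix,
    n_samples: int,
    summary: Callable[[Sequence[float]], float] = statistics.mean,
) -> Matrix:
    """Collapse probe-level betas to one value per gene and sample.

    Missing values (None) are skipped; a gene gets None for a sample
    when none of its probes has a value there.
    """
    result: Matrix = {}
    for gene, probe_ids in gene_probes.items():
        row: List[Optional[float]] = []
        for i in range(n_samples):
            values = [matrix[p][i] for p in probe_ids if p in matrix and matrix[p][i] is not None]
            row.append(summary(values) if values else None)
        result[gene] = row
    return result

# Hypothetical genes, probe ids and beta values (already on the 0-1 scale).
GENE_PROBES = {"GENE_A": ["probe_1", "probe_2"], "GENE_B": ["probe_3"]}
MATRIX = {"probe_1": [0.2, None], "probe_2": [0.4, 0.6], "probe_3": [None, None]}

print(summarize_genes(GENE_PROBES, MATRIX, 2))
print(summarize_genes(GENE_PROBES, MATRIX, 2, statistics.median))
```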