Question

TCGA DNA methylation data pipeline

6.4 years ago

yuabrahamliu ▴ 60

Hello everyone,

My questions may be easy for experts, but as a newcomer to TCGA data I am confused. I would appreciate any ideas.

I want to analyze the TCGA level-3 DNA methylation data from various cancer types. Did all the cancer types use the same preprocessing pipeline, and which normalization method did TCGA use? If different cancer types used different preprocessing pipelines, is there a good way to make comparisons among them feasible? Thank you so much.

Best, Yu

updated 6.4 years ago by Charles Warden 8.3k • written 6.4 years ago by yuabrahamliu ▴ 60

score 1 · Answer 1 · 2019-02-11

That is a good question.

You may sometimes need to apply a different normalization (starting from the raw data) if you see a strange beta-value distribution (for example, one that is clearly not bimodal). However, unless they processed the chips separately (instead of all together and/or with each sample processed as a group in GenomeStudio), I think a very clear issue with the beta distributions would probably have been identified by now.
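As a rough way to screen samples for a beta distribution that is "clearly not bimodal", one option is Sarle's bimodality coefficient. This is only a sketch of one possible check, not something TCGA or GenomeStudio is known to use. The function name `bimodality_coefficient` and the simulated samples are made up for illustration, and the 5/9 cutoff is a common heuristic (the value for a uniform distribution), not a strict test. Looking at a density plot per sample is still the more reliable check.

```python
import math
import random


def bimodality_coefficient(betas):
    """Sarle's bimodality coefficient for one sample's beta values.

    Values above roughly 5/9 are often read as a hint of bimodality.
    """
    # Processed files can contain missing probes; drop them first.
    x = [b for b in betas if b is not None and not math.isnan(b)]
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 non-missing beta values")

    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        return float("nan")  # constant input: the coefficient is undefined
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n

    g1 = m3 / m2 ** 1.5        # population skewness
    g2 = m4 / m2 ** 2 - 3.0    # population excess kurtosis

    # Small-sample corrections for skewness and excess kurtosis.
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

    return (skew ** 2 + 1) / (kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Simulated beta values (hypothetical data, not TCGA):
rng = random.Random(1)
# Mostly unmethylated or mostly methylated probes, as expected for array data.
typical = [rng.betavariate(2, 20) if rng.random() < 0.5 else rng.betavariate(20, 2)
           for _ in range(5000)]
# A single hump around 0.5, which would deserve a closer look.
suspicious = [min(1.0, max(0.0, rng.gauss(0.5, 0.1))) for _ in range(5000)]

print("typical sample:   ", bimodality_coefficient(typical))
print("suspicious sample:", bimodality_coefficient(suspicious))
```

Running this per sample and flagging the ones that fall below the cutoff gives a quick list of samples worth plotting before deciding whether to renormalize from the raw data.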

So, I know there are cases where the provided processed data gives good results, but I imagine subsets of probes may benefit from an alternative normalization and/or filtering strategy.

There is supposed to be more information on this page, but I believe the link to further details is currently broken:

https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Methylation_LO_Pipeline/