TCGA differential DNA methylation analysis 450K best practices
4 days ago
Brassicka • 0

Hello, I am trying to do differential methylation analysis comparing matched tumor versus normal tissue from TCGA-LUAD. I have tried three different approaches, which gave very different results.

For my first go, I downloaded level 3 methylation 450k data from cBioPortal, from TCGA Lung Adenocarcinoma (Firehose Legacy):

https://www.cbioportal.org/study/summary?id=luad_tcga (Accessed 2023/08/29)

I used the Wilcoxon signed-rank test for differential analysis here because I read that it performs a little better than limma (https://pubmed.ncbi.nlm.nih.gov/35292087/), and the number of genes in the list was small enough that it would not be computationally intensive.
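For reference, here is a minimal Python sketch of a paired Wilcoxon signed-rank test for one gene, followed by Benjamini-Hochberg adjustment across genes. It is not the code used for the analysis above. The function names and the beta values are made up for illustration, and the p-value uses a plain normal approximation with no tie or continuity correction, so an established implementation (for example `wilcox.test(..., paired = TRUE)` in R or `scipy.stats.wilcoxon`) is preferable for real data.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_test(tumor, normal):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Returns (W+, p-value). Zero differences are dropped; tied absolute
    differences receive average ranks. No tie or continuity correction.
    """
    diffs = [t - n for t, n in zip(tumor, normal) if t != n]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, min(1.0, p)


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical gene-level beta values for 8 matched tumor/normal pairs.
tumor = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.66]
normal = [0.30, 0.35, 0.41, 0.33, 0.38, 0.29, 0.44, 0.36]
w_plus, p_value = signed_rank_test(tumor, normal)
```

In practice the test would be run once per gene and the resulting p-values passed together to `benjamini_hochberg`.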

However, I could not find any information on how the gene-level beta values in cBioPortal were derived beyond "For genes with multiple methylation probes, the probe most anti-correlated with expression."

Without more explanation I don't know if I trust this, so I wanted to redo the analysis at the probe level. I used the TCGAbiolinks R package to download beta values, and from that point loosely followed this guide:

https://bioconductor.org/packages/release/workflows/vignettes/methylationArrayAnalysis/inst/doc/methylationArrayAnalysis.html

The difference was that I started from beta values rather than IDAT files. I filtered the output DMPs to promoter probes and, for each gene, kept the probe with the lowest FDR for each sign of logFC (this was my own idea and might be flawed).
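One detail that matters when starting from beta values: the linked workflow fits its limma models on M-values and uses beta values mainly for interpretation. The post does not say whether this conversion was applied, so the following is only a sketch of the standard logit transform, M = log2(beta / (1 - beta)). The function name and the clipping offset `eps` are assumptions chosen for illustration.

```python
from math import log2


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value in [0, 1] to an M-value: log2(beta / (1 - beta))."""
    # eps is an assumed clipping offset so betas of exactly 0 or 1 stay finite.
    clipped = min(max(beta, eps), 1 - eps)
    return log2(clipped / (1 - clipped))


# Hypothetical beta values for three probes.
betas = [0.1, 0.5, 0.8]
m_values = [beta_to_m(b) for b in betas]
```

A beta of 0.5 maps to an M-value of 0, and the transform stretches values near 0 and 1, which is why a logFC computed on M-values is not directly comparable to a mean difference in beta.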

I also did the DMR analysis described further down in the guide, and likewise filtered the DMRs to promoters and to the most significant FDR for each sign, as with the DMPs.
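To make the filtering step concrete, here is a rough Python sketch of "promoter only, then lowest FDR per gene and per sign of logFC". The row layout, field names, probe IDs, and gene names are all hypothetical; the real tables come from the R workflow and will look different.

```python
def best_hit_per_gene_and_sign(rows):
    """For each (gene, direction) pair, keep the row with the smallest FDR."""
    best = {}
    for row in rows:
        if row["logFC"] == 0:
            continue  # no direction to assign
        direction = "hyper" if row["logFC"] > 0 else "hypo"
        key = (row["gene"], direction)
        if key not in best or row["fdr"] < best[key]["fdr"]:
            best[key] = row
    return best


# Hypothetical DMP table rows.
rows = [
    {"probe": "probe_a", "gene": "GENE1", "region": "promoter", "logFC": 1.2, "fdr": 1e-6},
    {"probe": "probe_b", "gene": "GENE1", "region": "promoter", "logFC": 0.8, "fdr": 1e-9},
    {"probe": "probe_c", "gene": "GENE1", "region": "promoter", "logFC": -0.5, "fdr": 1e-3},
    {"probe": "probe_d", "gene": "GENE2", "region": "body", "logFC": 2.0, "fdr": 1e-12},
]

promoter_rows = [r for r in rows if r["region"] == "promoter"]
best = best_hit_per_gene_and_sign(promoter_rows)
```

Written this way, the filter keeps one probe per gene and direction, so a gene can appear in both the hypermethylated and the hypomethylated list, and a single extreme probe decides the gene's reported effect size.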

When I compared the 3 different outputs (basically DMGs, DMPs, and DMRs), there was little agreement (not very many matching genes between the lists, and of the genes that did match, the correlation plots of logFC or mean difference values showed very little to no correlation).
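The comparison described above can be quantified with a set overlap and a rank correlation. The sketch below uses the Jaccard index for the gene lists and Spearman correlation for the effect sizes of shared genes. Gene names and values are invented, and Spearman is an assumed choice here: it only compares rankings, which helps when one output reports logFC on the M-value scale and another reports a mean beta difference.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard index of two gene collections: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_ranks(values):
    """1-based ranks, with ties given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / sqrt(vx * vy)


# Hypothetical per-gene effect sizes from two of the outputs.
dmp = {"GENE1": 0.30, "GENE2": -0.25, "GENE3": 0.10, "GENE4": -0.05}
dmr = {"GENE2": -0.20, "GENE3": 0.15, "GENE4": -0.02, "GENE5": 0.40}

overlap = jaccard(dmp, dmr)
shared = sorted(set(dmp) & set(dmr))
rho = spearman([dmp[g] for g in shared], [dmr[g] for g in shared])
```

Checking the fraction of shared genes whose effect has the same sign in both outputs is another simple comparison that does not depend on scale.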

Looking at the literature and comparing against genes known to be differentially methylated between tumor and normal in LUAD, it's not clear which of these outputs is the winner. They do tend to have HOX genes in high-ranking positions, especially the DMR output, which is promising.

Maybe my filtering process is too stringent or flawed and I should be doing it differently. Any opinions? Here are some current thoughts I have:

  • If there is more information anywhere on how cBioPortal derived their gene-level values, I would appreciate it if anyone could direct me to it.
  • I think I should try redoing the DMGs from cBioPortal using limma instead of the Wilcoxon signed-rank test so they are more comparable to the others, and then try the comparison again.
  • I should redo the DMPs and DMRs without my intuition-based filtering at the end, which was my attempt to narrow them down to genes. Instead, I should keep all the probes within certain effect size bounds and then look at unique gene names at the end.
  • Maybe a way to compare the three outputs would be GO pathway analysis, and I could pick whichever one agrees best with the literature.
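The third idea in the list (keep every probe that passes effect size bounds, then collect unique gene names) could look roughly like the Python sketch below. The thresholds, field names, and rows are placeholders rather than recommendations. The split on `;` assumes the gene annotation column holds semicolon-separated names with repeats, as in the Illumina 450K manifest's `UCSC_RefGene_Name` field.

```python
def genes_passing(rows, max_fdr=0.05, min_abs_delta_beta=0.2):
    """Unique gene names among probes passing FDR and effect size cutoffs."""
    genes = set()
    for row in rows:
        if row["fdr"] <= max_fdr and abs(row["delta_beta"]) >= min_abs_delta_beta:
            # One probe can be annotated to several genes, often with repeats.
            for name in row["genes"].split(";"):
                if name:
                    genes.add(name)
    return sorted(genes)


# Hypothetical probe-level results.
rows = [
    {"probe": "probe_a", "genes": "GENE1;GENE1", "delta_beta": 0.35, "fdr": 1e-8},
    {"probe": "probe_b", "genes": "GENE2;GENE3", "delta_beta": -0.28, "fdr": 1e-4},
    {"probe": "probe_c", "genes": "GENE4", "delta_beta": 0.05, "fdr": 1e-10},
    {"probe": "probe_d", "genes": "", "delta_beta": 0.50, "fdr": 1e-6},
]

selected = genes_passing(rows)
```

With these placeholder cutoffs, `probe_c` is dropped for its small effect size despite a very low FDR, and `probe_d` contributes nothing because it has no gene annotation.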

Anyway, I'm going in circles a bit. Does anyone with experience in DNA methylation analysis have some advice?

methylation HM450K differential • 222 views