Question: Simple integrative DNA methylation and mRNA expression analysis
Will wrote:

Hi everyone,

This is my first exposure to bioinformatics, so please bear with me. For a HS assignment, I chose DNA methylation and breast cancer as my topic. I want to explore the correlation between DNA methylation and gene expression in cancer but am overwhelmed with data such as TCGA. Thus I am looking for processed data where I'm able to apply simple statistical analyses but still draw valid conclusions. I saw many mentions of R programming, but I know nothing about this. I came across MExpress and the MENT database but am not sure how to select the right genes etc.

How should I focus my question and/or what logical steps can I take? Please help me find simplified methods.

Any replies or pointers to useful links would be greatly appreciated. Thank you so much.

modified 10 months ago by Kevin Blighe • written 10 months ago by Will
Kevin Blighe wrote:

Hey Will,

Yes, working with the TCGA data can be challenging. One of the most widely used tools for accessing TCGA open-access (level 3) data is TCGAbiolinks, an R / Bioconductor package. You could simply obtain the methylation and expression data using TCGAbiolinks and then work from there. That still requires some initial learning in R, though, which may or may not be part of your assignment.

cBioPortal also contains gene expression data that you can easily download.

However, probably the best option for you is UCSC GDC Xena Hub (home-page), which contains most TCGA data-types in ready-to-download format. For a HS assignment, this should be more than sufficient.
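
As a rough illustration of what working with such downloads involves, here is a minimal sketch in Python (standard library only; the same steps are possible in R). It assumes each downloaded file is a tab-separated matrix with feature IDs (genes or methylation probes) in the first column and sample IDs across the header row; check the actual files before relying on that layout. The sample IDs, the probe ID, the values, and the `read_matrix` helper are all made up for illustration.

```python
import csv
import io


def read_matrix(handle):
    """Read a tab-separated matrix: first column = feature ID, other columns = samples."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Treat empty cells and "NA" as missing values.
        values = [None if v in ("", "NA") else float(v) for v in row[1:]]
        matrix[row[0]] = dict(zip(samples, values))
    return samples, matrix


# Toy stand-ins for two downloaded files (assumed layout, invented values).
expr_text = "gene\tS1\tS2\tS3\nBRCA1\t5.1\t4.2\t6.0\n"
meth_text = "probe\tS2\tS3\tS4\ncg0001\t0.80\t0.35\tNA\n"

expr_samples, expr = read_matrix(io.StringIO(expr_text))
meth_samples, meth = read_matrix(io.StringIO(meth_text))

# Only samples present in both data types can be compared.
meth_sample_set = set(meth_samples)
shared = [s for s in expr_samples if s in meth_sample_set]
print(shared)
```

With real files, `io.StringIO(...)` would be replaced by `open(path, newline="")`. Restricting to shared samples matters because not every tumour has both methylation and expression measurements.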

Kevin

written 10 months ago by Kevin Blighe

Thank you so so much!

Sorry for my ignorance, but would this produce a trend line? Or a box plot comparing normal and tumour tissue? Would I be able to apply chi-squared tests and such? Will I be able to work with the UCSC data simply in Excel?

written 10 months ago by Will

You will struggle to do this in Excel. I think that you should make the most of the opportunity and aim to do all of this in the R programming language. If you have never used it before, then you could start with my very simple tutorials, which are currently just PowerPoint slides: https://github.com/kevinblighe/Rtutorials

written 10 months ago by Kevin Blighe

Dear Kevin,

Thanks so much for all your replies and help thus far. As I am not familiar with the workflow, I don't completely understand it, so would you mind clarifying? Am I supposed to use the UCSC data to obtain methylation and expression values in R, then correlate them? Can I also try to find patterns with clinical data? Could you please refer me to the R packages that would be required for this? I've read the threads here - would COHCAP and MethylMix work?

modified 10 months ago • written 10 months ago by Will

Hey Will. That is a lot of questions! Do you not have a supervisor or other colleague in your local section/department?

written 10 months ago by Kevin Blighe

No, my biology teacher is not specialised in bioinformatics, and I knew my project would be independently led. Thanks a lot for your help, Kevin!

modified 10 months ago • written 10 months ago by Will

I see. Sure, MethylMix is a good option and can download the data automatically for you. From MethylMix, you should be able to obtain a matrix of methylation values, which you can then correlate / overlap with your gene expression data. Be sure that your gene expression data are approximately normally distributed (for example, after normalisation and log transformation), and that you have filtered out genes with low expression values.

You could also build regression models, which I mentioned yesterday in a very old thread: Correlation between methylation (450K) and gene expression (RNA-Seq)
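
To make those steps concrete, here is a minimal sketch of filtering out low-expression genes, log-transforming the counts, and then computing a Pearson correlation and a simple least-squares line per gene. It is written in plain Python for illustration only; in R the same steps would typically use `cor()` and `lm()`. The gene names, values, and the `MIN_MEAN_COUNT` cut-off are all hypothetical, and it assumes one methylation value per gene per sample, which is a simplification of real 450K data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


MIN_MEAN_COUNT = 10  # hypothetical cut-off for "low expression"

# Toy data: one methylation beta value and one read count per sample.
methylation = {
    "GENE_A": [0.1, 0.3, 0.5, 0.7, 0.9],
    "GENE_B": [0.2, 0.4, 0.6, 0.5, 0.3],
}
counts = {
    "GENE_A": [900, 520, 300, 150, 60],
    "GENE_B": [0, 1, 0, 2, 1],  # barely expressed, will be filtered out
}

results = {}
for gene, gene_counts in counts.items():
    if sum(gene_counts) / len(gene_counts) < MIN_MEAN_COUNT:
        continue  # skip genes with low expression
    # log2(count + 1) brings skewed counts closer to a normal distribution.
    log_expr = [math.log2(c + 1) for c in gene_counts]
    r = pearson(methylation[gene], log_expr)
    slope, intercept = linear_fit(methylation[gene], log_expr)
    results[gene] = (r, slope, intercept)

for gene, (r, slope, intercept) in results.items():
    print(f"{gene}: r = {r:.2f}, slope = {slope:.2f}")
```

A negative `r` and slope for a gene would be consistent with higher methylation going along with lower expression in these samples. It does not by itself show that methylation causes the change.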

written 10 months ago by Kevin Blighe