Question: Multiomics data preprocessing
11 months ago, J. Smith wrote:

I want to perform an integrative analysis of multi-omics data (RNA-seq, microarray, mutation, methylation, and copy number variation) downloaded from The Cancer Genome Atlas (TCGA). I downloaded the data using the TCGA2STAT R package. I know the preprocessing steps (such as normalization, log2 transformation, and differential expression) for microarray and RNA-seq data, but I don't know the exact preprocessing workflow for mutation, methylation, and copy number variation data after downloading from TCGA. Please provide some links describing these steps.

I have come across the iClusterPlus R package. Examples of iClusterPlus analysis are available for TCGA glioblastoma data, but the exact workflow and code for the preprocessing steps are not.

If anyone can share links describing the preprocessing steps for an integrative analysis such as iClusterPlus, that would be very helpful.

11 months ago, Kevin Blighe wrote:


'Multi-omics', like 'systems biology', 'AI', and 'machine learning', has become a sort of 'buzz' term. People hear these terms and get excited, without giving much thought to what they mean. I am not stating that you are doing this in this situation... just setting the scene.

In reality, multi-omic techniques have been around for a long time. Probably the best known is eQTL analysis, in which gene expression data is essentially regressed against genetic variant (GWAS) data in order to gauge the effect of different variants on the expression of nearby genes.
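To make the regression idea concrete, here is a minimal sketch in plain Python. It assumes an additive model in which each sample's genotype is coded as alternate-allele dosage (0, 1 or 2) and expression is on a log2 scale. The function name `fit_eqtl` and the toy numbers are made up for illustration, and real eQTL analyses add covariates and multiple-testing correction that are left out here.

```python
def fit_eqtl(genotypes, expression):
    """Ordinary least squares of expression on genotype dosage (0, 1, 2).

    Returns (slope, intercept). The slope is the estimated change in
    expression per additional copy of the alternate allele.
    """
    n = len(genotypes)
    if n != len(expression) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 samples")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    # Sum of squares of the genotype and cross-product with expression
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype has no variation across samples")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    return slope, intercept


# Toy data (made up): allele dosage and log2 expression for six samples
dosage = [0, 0, 1, 1, 2, 2]
log2_expr = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
slope, intercept = fit_eqtl(dosage, log2_expr)
print(f"effect per allele: {slope:.2f}, baseline: {intercept:.2f}")
```

The same fit is repeated for every variant-gene pair of interest, which is why dedicated tools are normally used instead of a hand-written loop.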

In your case, you want to use iClusterPlus. Where have you looked for how to use it? There are quite a few use cases in the manual: iClusterPlus: integrative clustering of multiple genomic data sets. If your data is not yet in the correct format, then, my apologies, but it is your job to get your data into the correct format. There does not have to be a tutorial for everything.
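As a rough illustration of what "getting the data into the correct format" can involve, the sketch below (plain Python, not code from the iClusterPlus manual) restricts the analysis to samples present in every data type and turns a list of mutation calls into a samples x genes 0/1 matrix, dropping rarely mutated genes. As far as I understand the manual, iClusterPlus takes one matrix per data type with samples in rows, in the same order across data types. The function names, the input layout, and the frequency threshold are all assumptions made for this example.

```python
def shared_samples(*sample_lists):
    """Samples present in every data type, in the order of the first list."""
    common = set(sample_lists[0]).intersection(*sample_lists[1:])
    return [s for s in sample_lists[0] if s in common]


def mutation_matrix(calls, samples, min_fraction=0.02):
    """Build a samples x genes 0/1 matrix from (sample, gene) mutation calls.

    Genes mutated in fewer than `min_fraction` of samples are dropped
    (the default threshold is illustrative only).
    Returns (genes, rows), where rows[i][j] is 1 if samples[i] carries
    at least one mutation in genes[j].
    """
    if not samples:
        return [], []
    mutated = {s: set() for s in samples}
    for sample, gene in calls:
        if sample in mutated:  # ignore calls from samples outside the cohort
            mutated[sample].add(gene)
    all_genes = sorted(set().union(*mutated.values()))
    genes = [g for g in all_genes
             if sum(g in mutated[s] for s in samples) / len(samples) >= min_fraction]
    rows = [[1 if g in mutated[s] else 0 for g in genes] for s in samples]
    return genes, rows


# Toy data (made up): sample IDs per data type and a few mutation calls
rna_samples = ["S1", "S2", "S3", "S4"]
cnv_samples = ["S2", "S3", "S4", "S5"]
mut_calls = [("S2", "TP53"), ("S3", "TP53"), ("S3", "PTEN"),
             ("S5", "EGFR"), ("S2", "TP53")]

samples = shared_samples(rna_samples, cnv_samples)
genes, rows = mutation_matrix(mut_calls, samples, min_fraction=0.5)
print(samples, genes, rows)
```

Expression, methylation and copy number matrices would then be subset and ordered with the same `samples` list so that row i refers to the same tumour in every matrix.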

You should also have a definite hypothesis (or hypotheses) that you want to test. For example, why do you want to integrate these data: just for fun or training? People can produce nice-looking heatmaps and network plots of data that has been integrated, but most, from what I have seen, are meaningless from a clinical perspective. Fair enough if it is entirely an exploratory analysis, though.

I would encourage you to look at the TCGA consortium's published work on endometrial cancer, which provides an excellent example of 'intelligent' multi-omics. They essentially defined new sub-types of endometrial cancer based on copy number profiles, and then found that each sub-type also had distinct somatic mutations and methylation profiles. They did not produce any fancy graphs or heatmaps that mean nothing - they just went about the process in an intellectual fashion.


Written 11 months ago by Kevin Blighe

Thank you, Kevin, for your reply. I am really new to multi-omics. What I want to know is the different preprocessing steps (in detail) for methylation, mutation, and copy number variation data after downloading from TCGA. Links to basic tutorials for these (along with links to code, if available) would help me.

Reply written 11 months ago by J. Smith

I see. It may be better to check out TCGAbiolinks and their F1000 published work. They list pre-processing steps for the different data-types there. Sorry, cannot obtain link right now.
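To give a flavour of one such step for methylation arrays, here is a small sketch in plain Python. It is not taken from TCGAbiolinks. It assumes the data arrives as beta values between 0 and 1 with missing values stored as `None`, drops probes with too many missing values, and converts the rest to M-values, log2(beta / (1 - beta)), a common transformation before statistical modelling. The function names and thresholds are made up for the example.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value in [0, 1] to an M-value, log2(beta / (1 - beta)).

    Values are clipped to [eps, 1 - eps] so that 0 and 1 stay finite.
    Missing values (None) are passed through unchanged.
    """
    if beta is None:
        return None
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def filter_probes(betas_by_probe, max_missing=0.2):
    """Keep probes whose fraction of missing values is at most max_missing."""
    return {probe: values for probe, values in betas_by_probe.items()
            if values and sum(v is None for v in values) / len(values) <= max_missing}


# Toy data (made up): beta values per probe across four samples
betas = {
    "probe_a": [0.5, 0.8, 0.2, 0.9],
    "probe_b": [None, None, 0.4, 0.6],  # 50% missing, will be dropped
}
kept = filter_probes(betas, max_missing=0.2)
m_values = {probe: [beta_to_m(v) for v in values] for probe, values in kept.items()}
print(m_values)
```

Whether to model beta values or M-values, and which probes to exclude, depends on the downstream method, so treat the thresholds above as placeholders.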

Reply written 11 months ago by Kevin Blighe

Thanks a lot Kevin...

Reply written 11 months ago by J. Smith

Thanks a lot again, Kevin... Is "Integrated genomic characterization of endometrial carcinoma" the paper on endometrial cancer that you were referring to?

Reply written 11 months ago by J. Smith

Oh yes, that is the one - you should read it because it's really great work - multi-omics at its best. Then, as part of one of my affiliations, I re-processed the data but split by race: "Racial differences in endometrial cancer molecular portraits in The Cancer Genome Atlas". (The title could have been better, but it was not my choice.)

Reply written 11 months ago by Kevin Blighe (modified 11 months ago)

Thank you, Kevin. I will certainly read those papers.

Reply written 11 months ago by J. Smith