Question

Multiomics data preprocessing

1

4.6 years ago

J. Smith ▴ 80

I want to perform an integrative analysis of multi-omics data (RNA-seq, microarray, mutation, methylation, and copy number variation) downloaded from The Cancer Genome Atlas (TCGA) with the TCGA2STAT R package. I know the preprocessing steps for microarray and RNA-seq data (normalization, log2 transformation, differential expression), but I do not know the exact preprocessing workflow for mutation, methylation, and copy number variation data after download from TCGA. Please provide some links that describe these steps.

I have come across the iClusterPlus R package. Examples of iClusterPlus analysis are available for TCGA glioblastoma data, but the exact workflow and code for the preprocessing steps are not.

If anyone can share links to such preprocessing steps for an integrative analysis such as iClusterPlus, that would be very helpful.

multi omics preprocessing iClusterPlus • 2.5k views

ADD COMMENT • link updated 4.6 years ago by Kevin Blighe 87k • written 4.6 years ago by J. Smith ▴ 80

score 0 · Answer 1 · 2019-09-28

0

4.6 years ago

Kevin Blighe 87k

Hey,

'Multi-omics', like 'systems biology', 'AI', and 'machine learning', has become a sort of 'buzz' term. People hear these terms and get excited, without giving much thought to what they mean. I am not stating that you are doing this in this situation... just setting the scene.

In reality, multi-omic techniques have been around for a long time. Probably the best known is eQTL analysis, whereby gene expression data is essentially regressed on genetic variant (GWAS) data in order to gauge the effect of different variants on the expression of nearby genes.
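To make the eQTL idea concrete, here is a minimal sketch of regressing the expression of one gene on the genotype at one variant. The genotype coding (0, 1, or 2 alternate alleles), the expression values, and the helper name `simple_linear_regression` are all made up for illustration; a real eQTL analysis would also adjust for covariates and correct for multiple testing.

```python
# Toy eQTL-style fit: regress one gene's expression on genotype dosage.
# All values below are invented for illustration.

def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Genotype coded as the number of alternate alleles (0, 1 or 2) per sample.
genotype = [0, 0, 1, 1, 2, 2]
# log2 expression of a nearby gene in the same six samples (hypothetical).
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]

slope, intercept = simple_linear_regression(genotype, expression)
print(f"effect per alternate allele: {slope:.2f} log2 units")
```

The slope is the estimated change in expression per additional copy of the alternate allele, which is the quantity an eQTL scan tests for each variant-gene pair.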

In your case, you want to use iClusterPlus. Where have you looked for how to use this? There are quite a few use cases in the manual: iClusterPlus: integrative clustering of multiple genomic data sets. If your data is not yet in the correct format, then, my apologies, it is your role to get it [your data] into the correct format. There does not have to be a tutorial for everything.
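As a rough illustration of what "the correct format" usually means for integrative tools of this kind: one matrix per data type, samples in rows and features in columns, with the same samples in the same order in every matrix. The sketch below is plain Python, not the iClusterPlus API; the barcodes, values, and helper names (`patient_id`, `binary_mutation_matrix`) are hypothetical. It relies on the fact that the first 12 characters of a TCGA barcode identify the patient.

```python
# Sketch: put two TCGA data types on a shared set of patients, in the same order.
# Barcodes and values are invented; helper names are hypothetical.

def patient_id(barcode):
    """First 12 characters of a TCGA barcode identify the patient (TCGA-XX-XXXX)."""
    return barcode[:12]

def binary_mutation_matrix(mutation_calls, patients, genes):
    """Rows = patients, columns = genes; 1 if any mutation call exists, else 0."""
    mutated = {(patient_id(b), g) for b, g in mutation_calls}
    return [[1 if (p, g) in mutated else 0 for g in genes] for p in patients]

# (sample barcode, gene) pairs, e.g. parsed from a mutation annotation file.
mutation_calls = [
    ("TCGA-AA-0001-01A", "TP53"),
    ("TCGA-AA-0002-01A", "PTEN"),
    ("TCGA-AA-0002-01A", "TP53"),
    ("TCGA-AA-0004-01A", "PTEN"),
]
# Expression keyed by sample barcode; values are per-gene log2 expression.
expression = {
    "TCGA-AA-0001-01B": [5.1, 7.3],
    "TCGA-AA-0002-01B": [4.8, 6.9],
    "TCGA-AA-0003-01B": [5.5, 7.0],
}

expr_by_patient = {patient_id(b): v for b, v in expression.items()}
# Simplification: a patient counts as sequenced only if they have at least one
# call. In practice, take the list of sequenced samples from the study metadata.
mut_patients = {patient_id(b) for b, _ in mutation_calls}

# Keep only patients profiled in both data types, in one fixed order.
shared = sorted(set(expr_by_patient) & mut_patients)

genes = ["PTEN", "TP53"]
mut_matrix = binary_mutation_matrix(mutation_calls, shared, genes)
expr_matrix = [expr_by_patient[p] for p in shared]
```

Once every data type is indexed by the same ordered patient list, the matrices can be exported and passed to whichever integration method is chosen.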

You should also have a definitive hypothesis (or hypotheses) that you want to test. For example, why do you want to integrate these data: just for fun, or for training? People can produce nice-looking heatmaps and network plots of integrated data, but most, from what I have seen, are meaningless from a clinical perspective. Fair enough if it is entirely an exploratory analysis, though.

I would encourage you to look at the TCGA consortium's published work on endometrial cancer, which provides an excellent example of 'intelligent' multi-omics. They essentially defined new sub-types of endometrial cancer based on copy number profiles, and then found that each sub-type also had distinct somatic mutations and methylation profiles. They did not produce any fancy graphs or heatmaps that mean nothing - they just went about the process in an intellectual fashion.
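The "define groups on one data type, then compare the others across those groups" logic can be sketched in a few lines. This is only an illustration of the idea, not the consortium's actual method; the subtype labels, patient IDs, and the function name `mutation_frequency_by_subtype` are hypothetical.

```python
from collections import defaultdict

def mutation_frequency_by_subtype(subtype, mutated):
    """Fraction of patients in each subtype that carry a mutation in one gene."""
    counts = defaultdict(lambda: [0, 0])  # subtype -> [n_mutated, n_total]
    for patient, group in subtype.items():
        counts[group][1] += 1
        if patient in mutated:
            counts[group][0] += 1
    return {group: m / n for group, (m, n) in counts.items()}

# Subtype labels assumed to come from an earlier copy-number-based grouping.
subtype = {
    "P1": "CN-high", "P2": "CN-high",
    "P3": "CN-low", "P4": "CN-low", "P5": "CN-low",
}
# Patients carrying a mutation in one gene of interest (invented).
tp53_mutated = {"P1", "P2", "P4"}

freq = mutation_frequency_by_subtype(subtype, tp53_mutated)
for group, fraction in sorted(freq.items()):
    print(f"{group}: {fraction:.0%} mutated")
```

A difference in mutation frequency between groups would then be checked with an appropriate test (for example Fisher's exact test) before drawing any conclusion.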

Kevin

ADD COMMENT • link 4.6 years ago by Kevin Blighe 87k

0

Thank you, Kevin, for your reply. I am really new to multi-omics. What I want to know are the detailed preprocessing steps for methylation, mutation, and copy number variation data after downloading them from TCGA. Links to basic tutorials for these (along with links to code, if available) would help me.

ADD REPLY • link 4.6 years ago by J. Smith ▴ 80

1

I see. It may be better to check out TCGAbiolinks and the authors' work published in F1000. They list pre-processing steps for the different data-types there. Sorry, I cannot get the link right now.

ADD REPLY • link 4.6 years ago by Kevin Blighe 87k

0

Thanks a lot Kevin...

ADD REPLY • link 4.6 years ago by J. Smith ▴ 80

1

TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages

ADD REPLY • link 4.6 years ago by Kevin Blighe 87k

0

Thanks a lot again, Kevin... Is "Integrated genomic characterization of endometrial carcinoma" the endometrial cancer paper you are talking about?

ADD REPLY • link 4.6 years ago by J. Smith ▴ 80

0

Oh yes, that is the one - you should read it because it's really great work - multi-omics at its best. I, then, as part of one of my affiliations, re-processed the data but segregated by race: Racial differences in endometrial cancer molecular portraits in The Cancer Genome Atlas. (title could have been better, but was not my choice)