DEG analysis of METABRIC data from cBioPortal
2.7 years ago
rGun

I downloaded the expression (microarray), clinical and mutation data for the METABRIC breast cancer study from cBioPortal, and I'm planning to analyze the dataset for DEGs among three subsets of individuals, defined by mutational status. I'm not familiar with using median and z-score expression values for downstream analysis. I was initially planning to set up a SummarizedExperiment object and use DESeq2 to analyze the data, but since these expression values are already normalized, I believe I can't go down that path. Can I just run an ANOVA (or a Kruskal-Wallis test) on the expression values for each gene across the three subsets?

Any help understanding what exactly these median and z-score expression files are, and how I might proceed with the analysis, would be much appreciated.

Thanks, Kevin. I checked the distribution of the data for each test group and they looked normally distributed. I ran both ANOVA and Kruskal-Wallis ANOVA on the z-scores; I just wanted to make sure I wasn't missing anything. Also, thanks again for your answer on implementing Kruskal-Wallis and parametric ANOVA in R. It was very helpful.
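One way to make the per-group distribution check reproducible is a Shapiro-Wilk test on each subset for a given gene. This is a minimal sketch in Python with SciPy rather than R; the function name `normality_by_group` is hypothetical. With large groups, Shapiro-Wilk will flag even trivial departures from normality, so it is best read alongside a histogram or Q-Q plot rather than on its own.

```python
import numpy as np
from scipy import stats


def normality_by_group(values, groups, alpha=0.05):
    """Shapiro-Wilk test per group for one gene's z-scores.

    Hypothetical helper. Returns {group: (W, p, looks_normal)}, where
    looks_normal is True when normality is NOT rejected at `alpha`.
    Each group needs at least 3 non-missing values.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    result = {}
    for g in np.unique(groups):
        x = values[groups == g]
        x = x[~np.isnan(x)]  # drop NA entries before testing
        w, p = stats.shapiro(x)
        result[g] = (w, p, p >= alpha)
    return result
```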

Sure thing.

Any idea what the median data file that comes with the download contains? Thanks!

Can you clarify what the file name is?

It's data_expression_median.txt, and a summary of the value distribution for a representative sample looks like this:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.713   5.401   5.659   6.423   7.095  14.464

whereas for the same sample in the z-score file (data_mRNA_median_Zscores.txt), the summary looks like this:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
-4.1048 -0.6847 -0.0855 -0.0321  0.6074 11.4517    1006
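The median-file values (roughly 4.7 to 14.5) are plausibly log-scale microarray intensities, while the z-score file is centred near zero. A minimal sketch of one common way to turn such a genes x samples matrix into z-scores is shown below, assuming each gene is standardized across all samples. cBioPortal may instead compute z-scores against a reference population (for example, diploid samples), so this is not necessarily how the downloaded file was produced. The function name `zscore_per_gene` is hypothetical. Note that genes with zero variance come out as NaN under this scheme, which is one possible source of NA values.

```python
import numpy as np


def zscore_per_gene(expr):
    """Standardize each gene (row) across samples (columns).

    Assumption: z = (x - gene mean) / gene SD over all samples (ddof=1).
    Missing values are ignored; zero-variance genes yield NaN (0/0).
    """
    expr = np.asarray(expr, dtype=float)
    mean = np.nanmean(expr, axis=1, keepdims=True)
    sd = np.nanstd(expr, axis=1, ddof=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        z = (expr - mean) / sd
    return z
```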

2.7 years ago

If you are using the z-score expression values, then yes, you can run parametric or non-parametric tests on them, such as ANOVA or Kruskal-Wallis ANOVA. You could feasibly use limma, too, as it essentially fits a (optionally weighted) linear regression model to each gene.
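A minimal sketch of the per-gene tests in Python with SciPy (rather than R): loop over genes and run a one-way ANOVA (`scipy.stats.f_oneway`) and a Kruskal-Wallis test (`scipy.stats.kruskal`) on the z-scores. It assumes a genes x samples matrix and one group label (mutational-status subset) per sample; the function name `per_gene_tests` is hypothetical. Genes with fewer than two non-missing values in any group are skipped and reported as NaN.

```python
import numpy as np
from scipy import stats


def per_gene_tests(z, groups):
    """Per-gene one-way ANOVA and Kruskal-Wallis across sample groups.

    z: genes x samples array of z-scores (NaN allowed).
    groups: one label per sample, e.g. mutational-status subset.
    Returns (anova_pvalues, kruskal_pvalues), one entry per gene.
    """
    z = np.asarray(z, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    p_anova = np.full(z.shape[0], np.nan)
    p_kw = np.full(z.shape[0], np.nan)
    for i, row in enumerate(z):
        # Split this gene's values by group and drop missing entries
        samples = [row[groups == g] for g in labels]
        samples = [s[~np.isnan(s)] for s in samples]
        if any(len(s) < 2 for s in samples):
            continue  # too few values in some group; leave NaN
        p_anova[i] = stats.f_oneway(*samples).pvalue
        try:
            p_kw[i] = stats.kruskal(*samples).pvalue
        except ValueError:
            pass  # kruskal raises if all values are identical
    return p_anova, p_kw
```

With thousands of genes tested, the resulting p-values would still need a multiple-testing correction before calling any gene differentially expressed.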

I would not use DESeq2, as it expects raw counts as input. Even if you coerce your data into a DESeq2 object, I believe that the model assumptions DESeq2 makes (negative binomial-distributed counts) would still be incorrect for your data.

Kevin

Is there any way to get raw counts for the METABRIC data?

I am not sure; have you searched?