4.3 years ago

lenC_biotecLover

I have a matrix of miRNA RPKM values downloaded from TCGA, covering several TCGA projects (BRCA, LAML, LUAD, etc.). Columns are TCGA barcodes and rows are miRNA identifiers.

To perform a machine learning analysis, how can I normalize these data across the patients in my matrix? I searched all around the web but couldn't find an answer.

I'm really a novice in bioinformatics and computational biology, and any advice is strongly appreciated. Thank you very much.

I know, but I meant normalization between the patients, given that I have data from different projects.

You can convert the RPKM values to `log scale` and then perform `vst`.

Thank you. After this, once I have the vst-normalized data (using the DESeq2 package, right?), is it the same as having count data transformed with the same `vst` function? For instance, if I have an RPKM dataset converted first to `log scale` and then with `vst`, and also a counts dataset normalized with the `vst` function, are they comparable in terms of normalization? Thank you very much.

@dare_devil, OK, I tried, but the log-scaled RPKM values are negative in some cases, and the `vst` function doesn't work on negative values. How can I handle this?

You should have a matrix of RPKM values greater than or equal to 1. To achieve this, you can add 1 to the entire data frame and then convert to log scale, which avoids negative values.
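To make the "add 1, then log" suggestion concrete, here is a minimal sketch in plain Python. The toy matrix and the helper name `log2_plus_one` are made up for illustration; in practice the same arithmetic would be applied to the real RPKM data frame.

```python
import math

def log2_plus_one(matrix):
    """Return log2(x + 1) for every entry of a list-of-lists matrix."""
    return [[math.log2(x + 1.0) for x in row] for row in matrix]

# Hypothetical toy RPKM matrix: rows = miRNAs, columns = samples.
rpkm = [
    [0.0, 0.5, 3.0],
    [1.0, 7.0, 0.25],
]

# Plain log2 is negative for values below 1 and undefined at 0
# (recorded here as -inf so the comparison is visible).
plain_log = [
    [math.log2(x) if x > 0 else float("-inf") for x in row]
    for row in rpkm
]

# With the +1 shift, every non-negative RPKM maps to a value >= 0.
shifted_log = log2_plus_one(rpkm)
```

The shift works because any RPKM value of 0 or more becomes at least 1 before the logarithm, and log2 of a number that is at least 1 is never negative.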

Thank you.

Now the problem is that I downloaded some data from GEO (tumoral breast vs. normal breast samples), accession GSE68085. I suppose the data are already log2-normalized, and they contain some negative values. I want to use them as a validation dataset (I'm using an SVM classifier): I downloaded the series matrix and used the batch ID information for batch correction with the `ComBat` function. Should I invert the log transform (exponentiate) and then apply `vst`? Thank you very much again.

In this case, I would suggest `nneg` in the `NMF` package. This will convert all negative values to `0`. You can go through this link for other methods.
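For readers who want to see what "convert all negative values to 0" means in practice, here is a small plain-Python sketch of that effect. It is not the `NMF` implementation, only the same idea; `clip_negatives` and the toy values are hypothetical.

```python
def clip_negatives(matrix, floor=0.0):
    """Replace every entry below `floor` with `floor`; leave the rest unchanged."""
    return [[max(x, floor) for x in row] for row in matrix]

# Hypothetical toy expression matrix containing some negative entries.
expr = [
    [-1.5, 0.0, 2.0],
    [3.25, -0.1, 0.7],
]

clipped = clip_negatives(expr)
```

Note that clipping throws away the differences among the negative entries (here -1.5 and -0.1 both become 0), so it may be worth checking how many values are affected before relying on it.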

You can convert the log2-scaled data back to their corresponding RPKM values using the inverse function. I looked at your data, `GSE68085`, but I don't think they are log-transformed values.

Thank you! OK, but these data are described as "normalized", and I can't tell what type of normalization was done. Do they just mean RPKM? And if so, why are there negative values? I read the series matrix and could not find any other useful information. Thanks again.
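As a rough illustration of the inverse transform mentioned above, the sketch below undoes a log2 scaling in plain Python. It assumes the values really are log2-scaled, which is exactly what is in doubt for this dataset; `inverse_log2`, the `pseudocount` parameter, and the toy values are hypothetical.

```python
def inverse_log2(matrix, pseudocount=0.0):
    """Undo log2(x + pseudocount): return 2**y - pseudocount for every entry."""
    return [[2.0 ** y - pseudocount for y in row] for row in matrix]

# Hypothetical log2-scaled values, including a negative one.
log_values = [[-1.0, 0.0, 3.0]]

# If the data were produced as log2(x), negative entries map back to values below 1.
original_scale = inverse_log2(log_values)

# If they were produced as log2(x + 1), subtract the pseudocount again.
# A negative log value then maps to a negative original value, which cannot
# happen for RPKM, so such data were probably not produced this way.
original_scale_shifted = inverse_log2(log_values, pseudocount=1.0)
```

Under a plain log2 scaling, a negative entry is not an error: it simply corresponds to an original value between 0 and 1.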

You can download the data and redo the analysis. The raw data are available for download here.