                    5.3 years ago
        lenC_biotecLover
        
    
I have a matrix of miRNA RPKM values downloaded from TCGA, covering several TCGA projects (BRCA, LAML, LUAD, etc.). Columns: TCGA barcodes; rows: miRNA identifiers.
In order to perform a machine learning analysis, how can I normalize all these data across the patients in my matrix? I searched all around the web but couldn't find an answer.
I'm really a novice in bioinformatics and computational biology, and any advice is strongly appreciated. Thank you very much.
I know, but I meant between the patients, considering that I have data from different projects.
You can convert RPKM to log scale and perform vst.
Thank you. After this, when I have the vst-normalized data (using the DESeq2 package, right?), is it the same as having count data transformed with the same vst function? For instance, if I have an RPKM dataset converted first to log scale and then with vst, and also a counts dataset normalized with the vst function, are they comparable in terms of normalization? Thank you very much.
@dare_devil, OK, I tried, but log-scaled RPKM values are negative in some cases, and the vst function doesn't work on negative values. How can I handle this?
You should have a matrix of RPKM values greater than or equal to 1. To achieve this, add 1 to the entire data frame and then convert to log scale; this avoids negative values.
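The pseudocount step suggested above can be sketched in a few lines. This is only an illustration in plain Python, not DESeq2 code; the helper name `log2_rpkm` and the toy values are made up for the example.

```python
import math

def log2_rpkm(matrix, pseudocount=1.0):
    """Return log2(value + pseudocount) for every entry of a row-major matrix."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]

# Toy RPKM matrix (rows: miRNAs, columns: samples); the values are invented.
rpkm = [
    [0.0, 0.5, 3.0],
    [7.0, 0.25, 1023.0],
]

logged = log2_rpkm(rpkm)

# With a pseudocount of 1, every RPKM >= 0 maps to a value >= 0.
all_non_negative = all(v >= 0 for row in logged for v in row)
```

Without the pseudocount, an RPKM of 0.5 would give log2(0.5) = -1, which is where the negative values come from.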
Thank you.
Now the problem is that I downloaded some data from GEO (tumoral breast vs. normal breast samples), accession GSE68085. I suppose the data are already log2-normalized, and they contain some negative values. I want to use this as a validation dataset (I'm using an SVM classifier): I downloaded the series matrix and used the batch ID information for batch correction with the ComBat function. Should I apply the inverse (exponential) function and then apply vst? Thank you very much again.
In this case, I would suggest nneg in the NMF package. This will convert all negative values to 0. You can go through this link for other methods.
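Setting negative entries to zero, as described above, amounts to an element-wise maximum with 0. The sketch below shows that idea in plain Python; it is not the NMF package's code, and `clip_negatives` is a hypothetical helper name with invented input values.

```python
def clip_negatives(matrix, floor=0.0):
    """Replace every entry below `floor` with `floor` (element-wise maximum)."""
    return [[max(v, floor) for v in row] for row in matrix]

# Toy expression matrix containing a few negative values (invented).
expr = [
    [-1.5, 0.0, 2.25],
    [3.0, -0.1, 4.5],
]

clipped = clip_negatives(expr)
```

Note that clipping discards the differences between the negative values, so it is worth checking how many entries are affected before relying on it.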
You can convert the log2-scaled data to their corresponding RPKM values using the inverse function. I looked at your data, GSE68085, but I don't think they are log-transformed values.
Thank you! OK, but these data are described as "normalized", and I can't work out what type of normalization was done. Does it just refer to RPKM? And if so, why are there negative values? I read the series matrix and could not find any other useful info. Thanks again.
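For reference, the inverse transform mentioned above can be sketched as follows. It assumes the values really are on a log2 scale, which this thread leaves uncertain for GSE68085; `inverse_log2` and the toy values are hypothetical.

```python
def inverse_log2(matrix, pseudocount=0.0):
    """Undo log2(value + pseudocount): return 2**x - pseudocount per entry."""
    return [[2.0 ** v - pseudocount for v in row] for row in matrix]

# Toy log2-scale values (invented), including a negative one.
log_values = [[-1.0, 0.0, 3.0]]

# Plain log2 data: negative entries map back to values between 0 and 1.
restored = inverse_log2(log_values)

# Data logged as log2(x + 1): subtract the pseudocount after exponentiating.
restored_pc = inverse_log2(log_values, pseudocount=1.0)
```

If the second call yields negative numbers, as it does for the first toy entry, the data were probably not produced with log2(x + 1).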
You can download the data and redo the analysis. The raw data are available for download here.