I have an RNA-seq expression matrix in FPKM (raw counts are not available). I want to combine this dataset with other datasets that I have, in order to build a machine learning model downstream. Is the following approach acceptable? First, filter the matrix, keeping only genes with FPKM >= 1 in at least 10 samples:
# Keep genes with FPKM >= 1 in at least 10 samples (first column is assumed to hold gene IDs)
RNA_FPKM <- RNA_FPKM[rowSums(RNA_FPKM[, -1] >= 1) >= 10, ]
Then add a pseudocount of 0.1 to the filtered FPKM matrix and take the log2:
# Exclude the first (gene ID) column so that only numeric values are transformed
new_expression <- log2(RNA_FPKM[, -1] + 0.1)
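To make the two steps concrete, here is a minimal sketch on a toy table. The values and the names `toy_fpkm`, `keep`, `filtered` and `log_expr` are made up for illustration, and it assumes the first column holds gene IDs, as the `[, -1]` indexing above suggests. It is not my real data, just a small case where the outcome of the filter is easy to check by eye.

```r
# Toy FPKM table (made-up values): first column = gene IDs, then 12 samples.
toy_fpkm <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  rbind(
    rep(5, 12),                 # FPKM >= 1 in all 12 samples
    c(rep(2, 10), 0, 0),        # FPKM >= 1 in exactly 10 samples
    c(rep(3, 4), rep(0.2, 8))   # FPKM >= 1 in only 4 samples
  )
)

# Step 1: keep genes with FPKM >= 1 in at least 10 samples.
keep <- rowSums(toy_fpkm[, -1] >= 1) >= 10
filtered <- toy_fpkm[keep, ]

# Step 2: add the 0.1 pseudocount and log2-transform the numeric columns only.
log_expr <- log2(as.matrix(filtered[, -1]) + 0.1)
rownames(log_expr) <- filtered$gene
```

In this toy case geneA and geneB pass the filter and geneC is dropped. The zeros left in geneB become log2(0.1) after the transform.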
The aim of the filtering step is to remove lowly expressed genes. Is this a valid approach? I don't have the raw counts, so this is the best I can do. Forgive me if this is totally wrong or idiotic, but I am totally new to this field, so your help will be much appreciated.