I'm conducting a meta-analysis over several datasets. I want to combine those datasets and run some machine learning algorithms to predict a target response. Some of those datasets are raw counts, which I can easily convert to TPM with the following code:
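library(magrittr)  # provides %>%
# RPKM per sample (column): counts scaled by gene length (bp) and library size
rpkm <- apply(X = counts_data,
              MARGIN = 2,
              FUN = function(x) {
                10^9 * x / genelength / sum(as.numeric(x))
              })
# TPM: rescale each sample's RPKM values so they sum to one million
TPM <- apply(rpkm, 2, function(x) x / sum(as.numeric(x)) * 10^6) %>% as.data.frame()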
And some datasets provide RPKM data, which I can also convert to TPM like this:
TPM <- apply(RPKM, 2, function(x) x / sum(as.numeric(x)) * 10^6) %>% as.data.frame()
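As a sanity check on the two conversions above, here is a minimal sketch on made-up toy data (the `counts` matrix and `gene_length` vector are hypothetical, not from any of the datasets). It compares the counts -> RPKM -> TPM route with TPM computed directly from counts. The two should agree, and every sample should sum to one million.

```r
# Toy data (hypothetical): 3 genes x 2 samples of raw counts, gene lengths in bp
counts <- matrix(c(10, 20, 30,
                   5, 0, 45),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
gene_length <- c(1000, 2000, 500)

# Route 1: counts -> RPKM -> TPM (same formulas as in the question)
rpkm <- apply(counts, 2, function(x) 1e9 * x / gene_length / sum(x))
tpm_via_rpkm <- apply(rpkm, 2, function(x) x / sum(x) * 1e6)

# Route 2: counts -> TPM directly (divide by gene length, then scale each sample)
rate <- counts / gene_length                      # recycles gene_length down each column
tpm_direct <- t(t(rate) / colSums(rate)) * 1e6

all.equal(tpm_via_rpkm, tpm_direct)  # compares the two routes
colSums(tpm_direct)                  # per-sample totals
```

The library-size term cancels when each column is rescaled, which is why RPKM can be turned into TPM without access to the counts.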
Some datasets, however, only provide FPKM data. This is problematic: I need all datasets to be TPM normalized, and I'm not familiar with converting FPKM to TPM.
Is it possible to convert FPKM reads to TPM? I found this approach: TPM = FPKM * X where X = 1e6/[sum of all FPKM of a sample].
I'm not sure if I'm allowed to do this, and I don't want to use it and get misleading results. What do you guys think? If I can use it, what is the code in R?
Note: the datasets that provide RPKM or FPKM have no raw data or counts data.
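For reference, the formula quoted above is the same per-sample rescaling already used for RPKM, since FPKM is the fragment-based (paired-end) counterpart of RPKM. A minimal sketch in R follows. The helper name `fpkm_to_tpm` and the toy values are hypothetical, and it assumes genes in rows, samples in columns, and an FPKM table that still contains all genes. If genes were filtered out before the FPKM table was published, the result will not match TPM computed from the full count data.

```r
# Hypothetical helper: rescale each sample (column) of an FPKM or RPKM table
# so that its values sum to one million, i.e. TPM = FPKM / sum(FPKM) * 1e6
fpkm_to_tpm <- function(fpkm) {
  fpkm <- as.matrix(fpkm)
  stopifnot(is.numeric(fpkm))
  totals <- colSums(fpkm, na.rm = TRUE)      # per-sample FPKM totals
  as.data.frame(sweep(fpkm, 2, totals, "/") * 1e6)
}

# Toy FPKM matrix (made-up values): 3 genes x 2 samples
fpkm <- matrix(c(1, 3, 6,
                 20, 20, 60),
               nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
tpm <- fpkm_to_tpm(fpkm)
colSums(tpm)  # per-sample totals after rescaling
```

Because the scaling factor is computed separately for each sample, the conversion does not depend on which other samples are in the table.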
I apologize for my previous comment - the code looked really similar to the one generated by ChatGPT and your history of using ChatGPT triggered a suspicion. I'll delete my other comment.
Is the function really working as expected?
The output for the gene_expression dataset is:
If we had sequenced only two samples, the output would be different:
Here is a modification of your function that yields the same result in both cases.
Function called on the gene_expression dataset:
Function called on a subset of the gene_expression dataset: