In principle, the TPM formula can be inverted; see the timeless post.
In practice, some tools may apply additional corrections and scaling to the data.
As Brian Bushnell mentions, undoing these kinds of transformations without sufficient background information can be sketchy.
From the post linked above, the formulas are:
# Counts -> TPM, computed in log space for numerical stability
countToTpm <- function(counts, effLen)
{
    rate <- log(counts) - log(effLen)
    denom <- log(sum(exp(rate)))
    exp(rate - denom + log(1e6))
}

# Counts -> FPKM, where N is the total number of counts
countToFpkm <- function(counts, effLen)
{
    N <- sum(counts)
    exp( log(counts) + log(1e9) - log(effLen) - log(N) )
}

# FPKM -> TPM: rescale so that the values sum to 1e6
fpkmToTpm <- function(fpkm)
{
    exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
}

# Counts -> effective counts, scaled by length / effective length
countToEffCounts <- function(counts, len, effLen)
{
    counts * (len / effLen)
}

################################################################################
# An example
################################################################################
cnts <- c(4250, 3300, 200, 1750, 50, 0)
lens <- c(900, 1020, 2000, 770, 3000, 1777)
countDf <- data.frame(count = cnts, length = lens)

# assume a mean(FLD) = 203.7
countDf$effLength <- countDf$length - 203.7 + 1
countDf$tpm <- with(countDf, countToTpm(count, effLength))
countDf$fpkm <- with(countDf, countToFpkm(count, effLength))
with(countDf, all.equal(tpm, fpkmToTpm(fpkm)))
countDf$effCounts <- with(countDf, countToEffCounts(count, length, effLength))
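Going in the other direction, here is a minimal sketch of how the TPM formula could be inverted. The helper name tpmToCount is my own and is not from the linked post. It assumes that you know the effective lengths and the total number of counts N, and that the tool applied no scaling or correction beyond the plain TPM formula. If any of those assumptions fail, the recovered values will not match the original counts.

```r
# Hypothetical helper (not from the linked post): recover counts from TPM.
# Assumes effLen and the library size N are known, and that the TPM values
# were produced by the plain formula with no extra scaling or correction.
tpmToCount <- function(tpm, effLen, N)
{
    w <- tpm * effLen    # proportional to the original counts
    N * w / sum(w)       # rescale so the values sum to N
}

# Same toy data as in the example above
cnts <- c(4250, 3300, 200, 1750, 50, 0)
effLen <- c(900, 1020, 2000, 770, 3000, 1777) - 203.7 + 1

# Forward direction, written out directly rather than in log space
rate <- cnts / effLen
tpm <- rate / sum(rate) * 1e6

# Round trip: TPM back to counts
recovered <- tpmToCount(tpm, effLen, N = sum(cnts))
all.equal(recovered, cnts)
```

Without N, the same calculation only gives you the counts up to a constant factor, which is one reason the background information matters.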
I'm posting this as a comment instead of an answer specifically because it's just what I would do, and I don't know if it's the best approach in your case. But whenever I want to generate data that I can trust, I start from raw reads. It sounds to me like you are presuming a lot of things. The people who generated the data probably had their own goals, biases, and methods; do you really want those to influence your results? If you fully understand what they did, and what you are doing, then it's trivial to redo it yourself unless it's very computationally intensive, which you haven't mentioned.
Thanks, Brian, for the suggestion. However, I did the whole process myself, from bulk count generation to data transformation and scRNA deconvolution. The only assumption I am not 100% sure about is that CIBERSORTx will generate TPM deconvolved data when starting from TPM data. As far as I know, CIBERSORTx performs a linear transformation of normalized data, so in principle it is correct to assume that the cell-type-specific GEPs are in TPM format.