@harnish is right that tximport should be able to do transcript-to-gene-level conversion, although I'm not sure whether StringTie or eXpress are sources it can import from automatically. I've not heard of tximport calculating TPMs from counts, but I could be wrong. Still, these things aren't hard to calculate yourself.
TPM from counts
To calculate TPM, first calculate the RPKM/FPKM for each gene. Actually, you only need the F/RPK, as the per-million factor will come out in the wash. It also doesn't matter whether you use pairs (F) or reads (R). That is, take the number of reads/pairs mapping to a gene and divide it by the total exonic length of the gene.
Some more sophisticated algorithms will use an effective length rather than real length.
To convert this to TPM, divide the FPKM (or F/RPK) of each gene by the sum of that same quantity over all genes and multiply by 1 million.
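To see why the per-million (and per-kilobase) factor drops out, here is the calculation written as a formula. The symbols are my own notation, not anything from a package: c_i is the count for gene i, l_i is its exonic length in bases, and N is the total number of mapped reads/pairs.

```latex
\mathrm{FPKM}_i = \frac{10^{9}\, c_i}{l_i\, N},
\qquad
\mathrm{TPM}_i
  = 10^{6} \cdot \frac{\mathrm{FPKM}_i}{\sum_j \mathrm{FPKM}_j}
  = 10^{6} \cdot \frac{c_i / l_i}{\sum_j c_j / l_j}
```

The factor 10^9 / N is the same for every gene, so it cancels between numerator and denominator, leaving only counts divided by length.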
If I have a dataframe df with three columns (a gene identifier, counts and length), then TPM is calculated:
df$RPK <- df$counts/df$length
df$TPM <- df$RPK*1000000/sum(df$RPK)
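As a quick sanity check, here is a small worked example of the same calculation. The gene names, counts and lengths are made up, and counts_to_tpm is just a helper name I've chosen for illustration, not a function from any package.

```r
# Hypothetical toy data: three genes with made-up counts and exonic lengths (bp).
df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  counts  = c(100, 200, 300),
  length  = c(1000, 2000, 1000)
)

# Illustrative helper (not part of any package): counts and lengths in, TPM out.
counts_to_tpm <- function(counts, length) {
  rpk <- counts / length      # reads per unit length; constant scaling cancels below
  rpk * 1e6 / sum(rpk)        # rescale so the values sum to one million
}

df$TPM <- counts_to_tpm(df$counts, df$length)

# Expressing length in kilobases instead of bases should give the same TPM.
tpm_kb <- counts_to_tpm(df$counts, df$length / 1000)

print(df)
print(sum(df$TPM))
```

In this toy case geneA and geneB end up with the same TPM even though geneB has twice the counts, because geneB is also twice as long. The TPMs should always sum to one million, which is an easy thing to check on real data too.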
Gene TPM from transcript TPM
As TPM is transcripts per million, the gene TPM is simply the sum of the transcript TPMs for all transcripts belonging to that gene.
If our dataframe transcript_tpm has columns including gene_id and TPM, then we calculate gene TPM using
library(dplyr)
gene_tpm <- group_by(transcript_tpm, gene_id) %>% summarize(TPM = sum(TPM))
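If you'd rather not depend on dplyr, the same summing can be sketched in base R with aggregate. The transcript IDs, gene IDs and TPM values below are made up for illustration.

```r
# Hypothetical toy data: two genes with two transcripts each.
transcript_tpm <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3", "tx4"),
  gene_id       = c("geneA", "geneA", "geneB", "geneB"),
  TPM           = c(10, 5, 20, 1)
)

# Sum the transcript TPMs within each gene using base R.
gene_tpm <- aggregate(TPM ~ gene_id, data = transcript_tpm, FUN = sum)

print(gene_tpm)
```

This should give one row per gene, with the total gene TPM unchanged from the total transcript TPM, since every transcript is assigned to exactly one gene.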