Question: How to convert transcript-level TPM to gene-level TPM?
23 months ago by
k.kathirvel93250 wrote:

Hi everyone,

I am using several quantification tools for RNA-seq analysis. HTSeq-count and featureCounts produce gene-level counts, whereas StringTie and eXpress produce transcript-level TPM abundances. I want to compare these two kinds of output, so I am thinking of converting both first:

  1. gene counts (HTSeq-count and featureCounts) to gene-level TPM, and
  2. transcript TPM (StringTie) to gene-level TPM.

How can I do this? Thanks.

sequencing rna-seq next-gen gene • 3.6k views

Can you please tell me how you managed to get transcript-level TPM using StringTie?

I need transcript-level TPM, but the StringTie output I get has TPM at the gene level only.

written 13 months ago by kousi3110
23 months ago by
harish290 wrote:

For any such conversion, i.e. summing up from transcript level to gene level, you can always use tximport.

However, since you have TPMs, you will definitely need to go back to the counts level and then rescale to the gene level.

In any case, please do let us know how different they are!

23 months ago by
Sheffield, UK
i.sudbery8.1k wrote:

@harish is right that tximport should be able to do transcript-to-gene-level conversion, although I'm not sure whether StringTie or eXpress are sources it handles importing from automatically. I've not heard of tximport calculating TPMs from counts, but I could be wrong. Still, these things aren't hard to calculate yourself.

TPM from counts

To calculate TPM, first calculate the RPKM/FPKM for each gene. Actually, you only need the RPK/FPK, as the "per million" will come out in the wash. It also doesn't matter whether you use pairs (F) or reads (R). That is, take the number of reads (or pairs) mapping to a gene and divide it by the total exonic length of the gene.

Some more sophisticated algorithms will use an effective length rather than real length.

To convert this to TPM, divide the RPK of each gene by the sum of RPK over all genes and multiply by 1 million.

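If I have a dataframe df with three columns, gene_id, counts and length, then TPM is calculated as:

# Reads per base of gene length (the per-kilobase and per-million factors cancel below)
df$RPK <- df$counts / df$length
# Rescale so that the values sum to one million
df$TPM <- df$RPK * 1e6 / sum(df$RPK)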
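
As a rough sketch of the calculation above on toy data: the helper name counts_to_tpm and all the numbers here are made up for illustration and do not come from any tool. It assumes lengths are given in bases and that every gene has a non-zero length.

```r
# Hypothetical helper: convert raw counts to TPM given gene lengths in bases.
counts_to_tpm <- function(counts, length) {
  rate <- counts / length   # reads per base; constant factors (per kb, per million) cancel
  rate * 1e6 / sum(rate)    # rescale so the values sum to one million
}

# Toy data, invented for illustration only.
df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  counts  = c(100, 200, 300),
  length  = c(1000, 2000, 1000)
)
df$TPM <- counts_to_tpm(df$counts, df$length)
print(df)
```

Here geneA and geneB end up with the same TPM even though geneB has twice the counts, because geneB is twice as long. If your tool reports an effective length, passing that instead of the real length should work the same way.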

Gene TPM from transcript TPM

As TPM is transcripts per million, the gene TPM is simply the sum of the transcript TPMs for all transcripts belonging to that gene.

If our dataframe transcript_tpm has gene_id, transcript_id and TPM, then we calculate gene TPM using dplyr thus:

library(dplyr)
gene_tpm <- transcript_tpm %>% group_by(gene_id) %>% summarize(TPM = sum(TPM))

Shouldn't you only group_by gene_id? All the tpm rows with the same gene_id then will be summed.

written 20 months ago by jperez93150

transcript_tpm is the name of the dataframe, not a column in it.

written 20 months ago by i.sudbery8.1k

In my data I have transcript ID, gene ID and 6 sample columns (3 control and 3 experiment). How would you write the summarize code? When I tried it, I ended up with only one column of total TPM. Thanks.

written 4 months ago by fabucklain10

Yes. TPM = sum(TPM) calculates the per-group sums of the TPM column only. If you want the sums of more columns, you will need to sum them as well.

written 4 months ago by i.sudbery8.1k
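
One possible way to sum several sample columns at once, sketched on a made-up table: the sample column names control_1 and treated_1 and all values are assumptions for illustration. It relies on across() and where(), which need dplyr 1.0 or later.

```r
library(dplyr)

# Toy table, invented for illustration: one row per transcript, one TPM column per sample.
transcript_tpm <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  gene_id       = c("geneA", "geneA", "geneB"),
  control_1     = c(10, 20, 5),
  treated_1     = c(1, 2, 3)
)

# Sum every numeric column within each gene.
# transcript_id is a character column, so it is not summarized and drops out.
gene_tpm <- transcript_tpm %>%
  group_by(gene_id) %>%
  summarize(across(where(is.numeric), sum), .groups = "drop")
print(gene_tpm)
```

This gives one row per gene and one summed TPM column per sample, so it should extend to six sample columns without naming each one.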
23 months ago by
vj430 wrote:

If I am not wrong, TPM (Transcripts Per Million) is a normalised measure of transcripts (mRNA molecules) for a gene or an isoform (see Read Mapping and abundance estimation section). So in theory your option 2 is not necessary if you can get gene-level TPMs directly. StringTie should give you gene-level abundances in TPM (using the -A flag), so you can compare them directly to the TPMs derived from HTSeq-count (using @i.sudbery's equation).

23 months ago by
Prakash1.9k wrote:

To convert gene counts to TPM, and to get gene-level TPM from transcript-level TPM, you can use this R script.
