Question: How to convert transcript level TPM to gene level TPM ?
15 months ago by
k.kathirvel93210 wrote:

Hi everyone,

I am using various quantification tools for RNA-seq analysis. HTSeq-count and featureCounts produce gene-level counts, while StringTie and eXpress produce transcript-level TPM abundances. I want to compare these two kinds of output, so I am thinking of converting both first:

  1. gene counts (HTSeq-count and featureCounts) to gene-level TPM, and
  2. transcript TPM (StringTie) to gene-level TPM.

How can I do this? Thanks.


Can you please tell me how you managed to get transcript-level TPM using StringTie?

I need transcript-level TPM, but the StringTie output I get has TPM at the gene level only.

Reply written 5 months ago by kousi310
15 months ago by
harish230 wrote:

For any such conversion, i.e. summing up to gene level from transcript level, you can always use tximport.

However, since you have TPMs, you will need to go to the count level first and then rescale back to gene level.

In any case, please do let us know how different they are!

15 months ago by
vj410 wrote:

If I am not wrong, TPM (Transcripts Per Million) is the normalised number of transcripts (mRNA molecules) for a gene or an isoform (see the Read Mapping and abundance estimation section). So in theory your option 2 is not necessary if you can get gene-level TPMs directly. StringTie should give you gene-level abundances in TPM (using the -A flag), so you can compare them directly to TPMs computed from the HTSeq-count output (using @i.sudbery's equation).
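Once both tools give gene-level TPMs, one way to compare them is to merge on gene_id and look at a rank correlation. This is only a sketch: the two data frames and their column names (TPM_htseq, TPM_stringtie) are invented here, and Spearman correlation on log2(TPM + 1) is just one reasonable choice, not something either tool prescribes.

```r
# Toy gene-level TPM tables, invented for illustration only
htseq_tpm <- data.frame(
  gene_id   = c("geneA", "geneB", "geneC", "geneD"),
  TPM_htseq = c(10, 200, 0, 55)
)
stringtie_tpm <- data.frame(
  gene_id       = c("geneB", "geneA", "geneD", "geneC"),
  TPM_stringtie = c(180, 12, 60, 1)
)

# Match rows by gene_id; only genes present in both tables are kept
both <- merge(htseq_tpm, stringtie_tpm, by = "gene_id")

# Rank correlation on log-transformed values (pseudocount of 1 avoids log2(0))
rho <- cor(log2(both$TPM_htseq + 1), log2(both$TPM_stringtie + 1),
           method = "spearman")
print(both)
print(rho)
```

Merging by gene_id rather than relying on row order matters, since the two tools will usually not list genes in the same order.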

15 months ago by
Sheffield, UK
i.sudbery6.3k wrote:

@harish is right that tximport should be able to do the transcript-to-gene-level conversion, although I'm not sure whether StringTie or eXpress are sources it handles importing from automatically. I've not heard of tximport calculating TPMs from counts, but I could be wrong. Still, these things aren't hard to calculate yourself.

TPM from counts

To calculate TPM, first calculate the RPKM/FPKM for each gene. Actually, you only need R/FPK, as the "per million" will come out in the wash (as does the per-kilobase scaling, since any constant factor cancels). It also doesn't matter whether you count pairs (F) or reads (R). RPK is the number of reads/pairs mapping to a gene divided by the total exonic length of the gene.

Some more sophisticated algorithms will use an effective length rather than the real length.

To convert this to TPM, divide the RPK of each gene by the sum of RPK over all genes and multiply by 1 million.

If I have a dataframe df with three columns gene_id, counts and length then TPM is calculated:

df$RPK <- df$counts/df$length
df$TPM <- df$RPK*1000000/sum(df$RPK)
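
As a quick check of the two lines above, here is a minimal sketch on a made-up data frame. The gene names, counts and lengths are invented for illustration, and counts_to_tpm is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: convert raw counts to TPM given feature lengths.
# Length units (bp vs kb) cancel out in the final ratio.
counts_to_tpm <- function(counts, length) {
  rpk <- counts / length   # reads (or fragments) per unit of exonic length
  rpk * 1e6 / sum(rpk)     # rescale so all values sum to one million
}

# Toy data, invented for illustration only
df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  counts  = c(100, 200, 300),
  length  = c(1000, 2000, 1000)
)

df$TPM <- counts_to_tpm(df$counts, df$length)
print(df)
print(sum(df$TPM))  # the TPM column sums to one million
```

Here geneA and geneB end up with the same TPM even though geneB has twice the counts, because it is also twice as long.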

Gene TPM from transcript TPM

As TPM is transcripts per million, the gene TPM is simply the sum of the transcript TPMs for all transcripts belonging to that gene.

If our dataframe transcript_tpm has gene_id, transcript_id and TPM, then we calculate gene TPM using dplyr thus:

library(dplyr)
gene_tpm <- group_by(transcript_tpm, gene_id) %>% summarize(TPM = sum(TPM))
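
If dplyr is not available, the same sum can be done in base R with aggregate(). The sketch below uses an invented transcript table (the IDs and values are made up) and assumes every transcript maps to exactly one gene_id.

```r
# Toy transcript-level table, invented for illustration only
transcript_tpm <- data.frame(
  gene_id       = c("geneA", "geneA", "geneB", "geneC", "geneC"),
  transcript_id = c("tA1", "tA2", "tB1", "tC1", "tC2"),
  TPM           = c(100000, 150000, 250000, 200000, 300000)
)

# Base-R equivalent of group_by() + summarize(): sum TPM within each gene
gene_tpm <- aggregate(TPM ~ gene_id, data = transcript_tpm, FUN = sum)
print(gene_tpm)

# Summing within genes leaves the grand total unchanged
print(isTRUE(all.equal(sum(gene_tpm$TPM), sum(transcript_tpm$TPM))))
```

Because the grand total is preserved, the gene-level values are still valid TPMs and need no further rescaling.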

Shouldn't you only group_by gene_id? All the TPM rows with the same gene_id will then be summed.

Reply written 13 months ago by jperez93150

transcript_tpm is the name of the dataframe, not a column in it.

Reply written 13 months ago by i.sudbery6.3k
15 months ago by
Prakash1.7k wrote:

To convert gene counts to TPM, you can use this R script, and likewise to get gene-level TPM from transcript-level TPM.
