Question: How to convert transcript-level TPM to gene-level TPM?
6 months ago by
k.kathirvel93180 wrote:

Hi everyone,

I am using various quantification tools for RNA-seq analysis. My question is this: HTSeq-count and featureCounts produce gene-level counts, whereas StringTie and eXpress produce transcript-level TPM abundances. I want to compare these two kinds of output. To do this, I am thinking of converting two things first:

  1. gene counts (HTSeq-count and featureCounts) to gene-level TPM, and
  2. transcript TPM (StringTie) to gene-level TPM.

How can I do this? Thanks.

sequencing rna-seq next-gen gene • 1.0k views
6 months ago by
harish140 wrote:

For any such conversion, i.e. summing up from transcript level to gene level, you can always use tximport.

However, since you have TPMs, you will need to go back to the counts level and then rescale to gene level.

In any case please do let us know how different they are!

6 months ago by
vj390 wrote:

If I am not wrong, TPM (Transcripts Per Million) is the normalised number of transcripts (mRNA molecules) for a gene or an isoform (see the Read Mapping and abundance estimation section). So in theory your option 2 is not necessary if you can get gene-level TPMs directly. StringTie should give you gene-level abundances in TPM (using the -A flag), so you can directly compare them to TPMs calculated from the HTSeq-count counts (using @i.sudbery's equation).

6 months ago by
Sheffield, UK
i.sudbery3.8k wrote:

@harish is right that tximport should be able to do the transcript-to-gene-level conversion, although I'm not sure whether StringTie or eXpress are sources it can import from automatically. I've not heard of tximport calculating TPMs from counts, but I could be wrong. Still, these things aren't hard to calculate yourself.

TPM from counts

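To calculate TPM, first calculate the RPKM/FPKM for each gene. Actually, you only need the F/RPK, as the "per million" will come out in the wash. It also doesn't matter whether you use pairs (F) or reads (R). F/RPK is the number of reads/pairs mapping to a gene divided by the total exonic length of the gene in kilobases (the length unit doesn't matter either, since any constant factor cancels in the normalisation step).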

Some more sophisticated algorithms will use an effective length rather than real length.
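As a rough illustration of what "effective length" can mean, one commonly used simplification is to subtract the mean fragment length from the transcript length and add one. The sketch below assumes that simplification; the helper name effective_length, the toy lengths and the floor at 1 for very short features are all assumptions made here for illustration, and individual tools may compute effective length differently.

```r
# Hypothetical helper (not from any package): a simplified effective length.
# Assumes effective length = length - mean fragment length + 1.
effective_length <- function(length, mean_frag_len) {
  # Floor at 1 so that features shorter than the mean fragment length do not
  # get a zero or negative length (an assumption made for this sketch).
  pmax(length - mean_frag_len + 1, 1)
}

# Toy exonic lengths in bases (made-up values)
gene_length <- c(1000, 2000, 150)
eff_length  <- effective_length(gene_length, mean_frag_len = 200)
print(eff_length)
```

The resulting values could then be used in place of the real length when dividing the counts.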

To convert F/RPK to TPM, divide the F/RPK of each gene by the sum of F/RPK over all genes and multiply by 1 million.

If I have a data frame df with three columns (gene_id, counts and length), then TPM is calculated as:

df$RPK <- df$counts/df$length
df$TPM <- df$RPK*1000000/sum(df$RPK)
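To make the calculation concrete, here is a small worked example on made-up data. The gene IDs, counts and lengths are invented, and counts_to_tpm is a hypothetical helper name used only for this sketch. It also checks that the TPMs sum to one million and that giving the length in bases or kilobases produces the same result.

```r
# Toy data: made-up gene IDs, read counts and exonic lengths (in bases)
df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  counts  = c(100, 200, 300),
  length  = c(1000, 2000, 1000)
)

# Hypothetical helper wrapping the two lines above
counts_to_tpm <- function(counts, length) {
  rpk <- counts / length   # reads per unit length
  rpk * 1e6 / sum(rpk)     # rescale so that the values sum to one million
}

df$TPM <- counts_to_tpm(df$counts, df$length)
print(df)

# TPMs always sum to one million
stopifnot(isTRUE(all.equal(sum(df$TPM), 1e6)))

# The length unit (bases vs kilobases) cancels out
tpm_kb <- counts_to_tpm(df$counts, df$length / 1000)
stopifnot(isTRUE(all.equal(df$TPM, tpm_kb)))
```

Here geneA and geneB end up with the same TPM: geneB has twice the counts but is also twice as long.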

Gene TPM from transcript TPM

As TPM is transcripts per million, the gene TPM is simply the sum of the transcript TPMs for all transcripts belonging to that gene.

If our dataframe transcript_tpm has gene_id, transcript_id and TPM, then we calculate gene TPM using dplyr thus:

library(dplyr)
gene_tpm <- group_by(transcript_tpm, gene_id) %>% summarize(TPM = sum(TPM))
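For anyone who wants to try this without installing dplyr, the same per-gene sum can be sketched in base R with aggregate(). The transcript and gene IDs and the TPM values below are made up for illustration.

```r
# Toy transcript-level table (invented IDs and TPM values)
transcript_tpm <- data.frame(
  gene_id       = c("geneA", "geneA", "geneB"),
  transcript_id = c("tx1", "tx2", "tx3"),
  TPM           = c(10, 30, 60)
)

# Sum the TPM of all transcripts belonging to each gene
gene_tpm <- aggregate(TPM ~ gene_id, data = transcript_tpm, FUN = sum)
print(gene_tpm)

# Summing within genes does not change the overall total
stopifnot(isTRUE(all.equal(sum(gene_tpm$TPM), sum(transcript_tpm$TPM))))
```

The result has one row per gene_id, with geneA getting the sum of its two transcripts.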

Shouldn't you only group_by gene_id? All the tpm rows with the same gene_id then will be summed.

written 4 months ago by jperez93150

transcript_tpm is the name of the dataframe, not a column in it.

written 3 months ago by i.sudbery3.8k
6 months ago by
Prakash730 wrote:

To convert gene counts to TPM, you can use this R script, and also to get gene-level TPM from transcript-level TPM.
