Question: How to convert transcript-level TPM to gene-level TPM?
k.kathirvel93 (India) wrote (2.2 years ago):

Hi everyone,

I am using various quantification tools for RNA-seq analysis. HTSeq-count and featureCounts produce gene-level counts, while StringTie and eXpress produce transcript-level TPM abundances. I want to compare these two kinds of output. To do this, I am thinking of first converting:

  1. gene counts (HTSeq-count and featureCounts) to gene-level TPM, and
  2. transcript TPMs (StringTie) to gene-level TPM.

How can I do this? Thanks.


Can you please tell me how you managed to get transcript-level TPM using StringTie?

I need transcript-level TPM, but the StringTie output I get has TPM at gene level only.

(Reply by kousi3130, 16 months ago)
harish wrote (2.2 years ago):

For any such conversion, i.e. summing up to gene level from transcript level, you can always use tximport.

However, since you have TPMs, you will definitely need to go back to counts and then rescale to gene level.
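A minimal sketch of that route in R, on made-up numbers. It assumes you know each transcript's length and its gene, and it takes a gene's length as the TPM-weighted average of its transcripts' lengths, which I believe is close to what tximport does when summarising to gene level. The names `tx` and `total_reads` are hypothetical.

```r
# Hypothetical toy inputs: transcript TPMs, transcript lengths (bp)
# and a transcript-to-gene mapping; the values are invented.
tx <- data.frame(
  transcript_id = c("t1", "t2", "t3", "t4"),
  gene_id       = c("gA", "gA", "gB", "gB"),
  TPM           = c(100000, 300000, 450000, 150000),
  length        = c(1000, 2000, 1500, 500)
)
total_reads <- 2e6  # assumed library size, only puts counts on a read scale

# 1. TPM -> approximate counts: expected reads are proportional to TPM * length
rate <- tx$TPM * tx$length
tx$counts <- total_reads * rate / sum(rate)

# 2. Sum counts per gene; gene length = TPM-weighted mean transcript length
#    (an assumption, similar in spirit to tximport's gene-level lengths)
gene_counts <- tapply(tx$counts, tx$gene_id, sum)
gene_length <- tapply(tx$TPM * tx$length, tx$gene_id, sum) /
               tapply(tx$TPM, tx$gene_id, sum)

# 3. Rescale the gene counts back to TPM
gene_rpk <- gene_counts / gene_length
gene_tpm <- gene_rpk * 1e6 / sum(gene_rpk)
gene_tpm  # gA: 400000, gB: 600000
```

With that choice of gene length, the round trip gives the same gene TPMs as simply adding up the transcript TPMs of each gene.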

In any case, please let us know how different they are!

i.sudbery (Sheffield, UK) wrote (2.2 years ago):

@harish is right that tximport should be able to do transcript-to-gene-level conversion, although I'm not sure whether StringTie or eXpress are among the sources it can import automatically. I've not heard of tximport calculating TPMs from counts, but I could be wrong. Still, these things aren't hard to calculate yourself.

TPM from counts

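To calculate TPM, first calculate the RPKM/FPKM for each gene. Actually, you only need the F/RPK (fragments or reads per kilobase), as the per-million factor will come out in the wash. It also doesn't matter whether you count pairs (F) or reads (R). F/RPK is the number of reads or pairs mapping to a gene divided by the total exonic length of the gene. Dividing by the length in bases rather than kilobases only changes a constant factor, which also cancels.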

Some more sophisticated tools use an effective length rather than the real length.

To convert this to TPM, divide the FPKM of each gene by the sum of the FPKMs of all genes and multiply by 1 million.
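The same recipe written as formulas. The symbols are my own labels, not from the post: c_i is the read (or pair) count of gene i, l_i its exonic (or effective) length in bases, and N the total number of mapped reads.

```latex
\mathrm{RPK}_i = \frac{c_i}{l_i}, \qquad
\mathrm{FPKM}_i = \frac{c_i}{(l_i/10^3)\,(N/10^6)} = \frac{10^9}{N}\,\mathrm{RPK}_i,
\qquad
\mathrm{TPM}_i = 10^6\,\frac{\mathrm{RPK}_i}{\sum_j \mathrm{RPK}_j}
              = 10^6\,\frac{\mathrm{FPKM}_i}{\sum_j \mathrm{FPKM}_j}
```

Because 10^9/N is the same for every gene, it cancels between numerator and denominator, which is why RPK alone is enough.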

If I have a data frame df with three columns, gene_id, counts and length, then TPM is calculated as:

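# reads per base of exonic length (the per-kb and per-million factors cancel)
df$RPK <- df$counts / df$length
# scale so that the TPMs of all genes sum to one million
df$TPM <- df$RPK * 1e6 / sum(df$RPK)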
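To see the normalisation at work, here is the same calculation applied to made-up counts and lengths for three hypothetical genes, with a check that starting from FPKM instead gives identical TPMs.

```r
# Hypothetical toy data: counts and exonic lengths (bp) for three genes
df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  counts  = c(500, 1000, 250),
  length  = c(1000, 4000, 500)
)

# TPM from counts, as above
df$RPK <- df$counts / df$length
df$TPM <- df$RPK * 1e6 / sum(df$RPK)

# FPKM uses per-kilobase and per-million scaling; both are constants
# across genes, so normalising FPKM gives exactly the same TPM
N <- sum(df$counts)
df$FPKM <- df$counts / (df$length / 1e3) / (N / 1e6)
tpm_from_fpkm <- df$FPKM * 1e6 / sum(df$FPKM)

stopifnot(isTRUE(all.equal(df$TPM, tpm_from_fpkm)))
df$TPM       # 400000, 200000, 400000
sum(df$TPM)  # one million by construction
```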

Gene TPM from transcript TPM

As TPM is transcripts per million, the gene TPM is simply the sum of the TPMs of all transcripts belonging to that gene.

If our data frame transcript_tpm has columns gene_id, transcript_id and TPM, then we can calculate gene TPM using dplyr like this:

library(dplyr)
gene_tpm <- group_by(transcript_tpm, gene_id) %>% summarize(TPM = sum(TPM))  # one row per gene
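If you'd rather not depend on dplyr, base R's aggregate() with a formula should give the same per-gene sums. The toy transcript_tpm below is invented for illustration, and it assumes each transcript maps to exactly one gene, so the gene TPMs still add up to the same total as the transcript TPMs.

```r
# Invented example: three transcripts from two genes
transcript_tpm <- data.frame(
  gene_id       = c("g1", "g1", "g2"),
  transcript_id = c("t1", "t2", "t3"),
  TPM           = c(150000, 250000, 600000)
)

# Sum TPM over all transcripts that share a gene_id (output sorted by gene_id)
gene_tpm <- aggregate(TPM ~ gene_id, data = transcript_tpm, FUN = sum)
gene_tpm

# The total is unchanged because every transcript belongs to one gene
stopifnot(isTRUE(all.equal(sum(gene_tpm$TPM), sum(transcript_tpm$TPM))))
```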

Shouldn't you group_by gene_id only? All the TPM rows with the same gene_id would then be summed.

(Reply by jperez93150, 2.0 years ago)

transcript_tpm is the name of the dataframe, not a column in it.

(Reply by i.sudbery, 2.0 years ago)

In my data I have transcript ID, gene ID and 6 sample columns (3 control and 3 experiment). How would you write the summarize() code for that? When I tried it, I ended up with only one column of total TPM. Thanks.

(Reply by fabucklain10, 8 months ago)

Yes. TPM = sum(TPM) calculates the per-group sum of the TPM column only. If you want the sums of more columns, you will need to sum each of them as well.

(Reply by i.sudbery, 8 months ago)
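For several sample columns at once, one option in base R is rowsum(), which adds up the rows of a matrix that share a group label, column by column. The column names (ctrl1 to exp3) and values below are made up to mimic the layout described above; they are not real TPMs. With dplyr 1.0 or later, summarize(across(...)) over the sample columns should do the same job.

```r
# Hypothetical layout: transcript_id, gene_id, then one TPM column per sample
tx <- data.frame(
  transcript_id = c("t1", "t2", "t3", "t4"),
  gene_id       = c("g1", "g1", "g2", "g2"),
  ctrl1 = c(10, 20, 30, 40), ctrl2 = c(5, 5, 5, 5), ctrl3 = c(1, 2, 3, 4),
  exp1  = c(0, 10, 20, 30),  exp2  = c(7, 3, 1, 9), exp3  = c(2, 2, 2, 2)
)

# every column except the two ID columns is a sample
sample_cols <- setdiff(names(tx), c("transcript_id", "gene_id"))

# sum each sample column within each gene (rows come out sorted by gene)
gene_mat <- rowsum(as.matrix(tx[, sample_cols]), group = tx$gene_id)

# back to a data frame with an explicit gene_id column
gene_tpm <- data.frame(gene_id = rownames(gene_mat), gene_mat, row.names = NULL)
gene_tpm
```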
vj (UK) wrote (2.2 years ago):

If I am not wrong, TPM (transcripts per million) is the normalised number of transcripts (mRNA molecules) for a gene or an isoform (see the Read Mapping and abundance estimation section). So in theory your option 2 is not necessary if you can get gene-level TPMs directly. StringTie should give you gene-level abundances in TPM (using the -A flag), so you can compare them directly to TPMs computed from the HTSeq-count counts (using @i.sudbery's equation).
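A rough sketch of that comparison in R, under several assumptions: the HTSeq-count output has been read into a data frame with hypothetical columns gene_id and count, you have a named vector of exonic gene lengths, and the StringTie gene TPMs are a named vector keyed by the same gene IDs. HTSeq-count adds summary rows whose names start with "__" (such as __no_feature), so the helper drops them first. htseq_to_tpm() and compare_tpm() are my own hypothetical helpers, and Spearman correlation is just one reasonable way to compare the two.

```r
# Hypothetical helper: TPM from an HTSeq-count table with columns gene_id, count.
# Rows such as __no_feature or __ambiguous are HTSeq-count summaries, not genes.
htseq_to_tpm <- function(counts, gene_length) {
  ids  <- as.character(counts$gene_id)
  keep <- !startsWith(ids, "__")
  ids  <- ids[keep]
  # genes missing from gene_length give NA here, so check your annotation
  rpk <- counts$count[keep] / gene_length[ids]
  setNames(rpk * 1e6 / sum(rpk), ids)
}

# Hypothetical helper: rank correlation on the genes present in both vectors
compare_tpm <- function(tpm_a, tpm_b) {
  shared <- intersect(names(tpm_a), names(tpm_b))
  cor(tpm_a[shared], tpm_b[shared], method = "spearman")
}

# Made-up inputs for illustration
htseq <- data.frame(
  gene_id = c("geneA", "geneB", "geneC", "__no_feature", "__ambiguous"),
  count   = c(500, 1000, 250, 80, 12)
)
gene_length <- c(geneA = 1000, geneB = 4000, geneC = 500)
# stand-in for gene-level TPMs taken from StringTie's -A table (values invented)
tpm_stringtie <- c(geneA = 380000, geneB = 210000, geneC = 410000)

tpm_htseq <- htseq_to_tpm(htseq, gene_length)
tpm_htseq
compare_tpm(tpm_htseq, tpm_stringtie)
```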

Prakash (India) wrote (2.2 years ago):

To convert gene counts to TPM, and to get gene-level TPM from transcript-level TPM, you can use this R script.

Powered by Biostar version 2.3.0