Question: TPM from StringTie
2.6 years ago by
gozrom80 wrote:

I have extracted all the TPM values from the GTF files generated by StringTie for all replicates. However, those TPM values are per transcript, not per gene.

I now have one huge CSV file with 12 replicates and their corresponding TPM values, and I want to convert these to per-gene TPM values for use in a subsequent visualization.

File looks like this:

 X1    TPM transcript_id ref_gene_name  TPM.1 transcript_id.1 ref_gene_name.1  TPM.2 transcript_id.2
   <int>  <dbl> <chr>         <chr>          <dbl> <chr>           <chr>            <dbl> <chr>          
 1     1  2.60  MSTRG.1.1     <NA>           3.78  MSTRG.1.1       <NA>             4.22  MSTRG.1.1      
 2     2 NA     MSTRG.1.1     <NA>          NA     MSTRG.1.1       <NA>            NA     MSTRG.1.1      
 3     3  2.01  MSTRG.2.1     <NA>           1.17  MSTRG.2.1       <NA>             1.48  MSTRG.2.1      
 4     4 NA     MSTRG.2.1     <NA>          NA     MSTRG.2.1       <NA>            NA     MSTRG.2.1      
 5     5  0.402 ENSMUST00000~ Gm10568        0.316 ENSMUST0000019~ Gm10568          0.183 ENSMUST0000019~
 6     6 NA     ENSMUST00000~ Gm10568       NA     ENSMUST0000019~ Gm10568         NA     ENSMUST0000019~
 7     7  0.253 ENSMUST00000~ Gm7357         0.    ENSMUST0000020~ Rp1              2.66  ENSMUST0000018~
 8     8 NA     ENSMUST00000~ Gm7357        NA     ENSMUST0000020~ Rp1             NA     ENSMUST0000018~
 9     9 NA     ENSMUST00000~ Gm7357        NA     ENSMUST0000020~ Rp1              0.    ENSMUST0000019~
10    10  0.182 ENSMUST00000~ Gm6119        NA     ENSMUST0000020~ Rp1             NA     ENSMUST0000019~
 ... with 1,135,291 more rows,

I'm not sure how to do that, or whether it's possible at all...

I guess it could be a for loop that goes through each ref_gene_name and sums up all the TPM values from the matching TPM column. It would need to run on all the ref_gene_name columns, create the appropriate columns in a new data frame, and then export that data frame to a CSV file. The code below is only meant to illustrate the idea; it is not meant to be correct code.

df <-
for i  to i=file$ref_gene_name$end
if ref_gene_name$i == ref_gene_name$(i+1)
df$gene$i <- file$ref_gene_name$i
df$condition1.TPM <- file$TPM$i + file$TPM$(i+1)
if df$gene$i == file$TPM$(i+1)
df$condition1.TPM <- df$condition1.TPM + file$TPM$(i+1)
df$gene$i <- file$ref_gene_name$i

Any help is appreciated, thank you.

rna-seq • 2.7k views
modified 17 months ago by ATpoint40k • written 2.6 years ago by gozrom80

It is hard for me to fully understand the data format and what you tried, but I can give some generic advice. Have a look at the R function aggregate. If you have a simple structure with all the transcripts, genes and TPM values in a single data frame and want to sum the TPM of the same gene, just try something like this:
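A minimal sketch of the kind of aggregate call this advice points to; it is not the answerer's original code. The toy data frame `tx` and its values are assumptions made up for illustration, with one row per transcript:

```r
# Toy data frame: one row per transcript (values are made up).
tx <- data.frame(
  ref_gene_name = c("Gm10568", "Gm7357", "Gm7357", "Rp1"),
  TPM           = c(0.402, 0.253, 0.100, 2.66)
)

# Sum transcript-level TPM within each gene.
# The formula interface drops rows where TPM or the gene name is NA.
gene_tpm <- aggregate(TPM ~ ref_gene_name, data = tx, FUN = sum)
print(gene_tpm)
```

The result has one row per gene, with the gene name in the first column and the summed TPM in the second.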

written 2.5 years ago by Fabio Marroni2.6k

Thanks, that seems simpler than what I wrote.

I tried aggregate but got an error:

Error in : 'length(genes_list$ref_gene_name)' is not a function, character or symbol

If I run length(genes_list$ref_gene_name) on its own, it returns the length of that column,

but when I do it through aggregate

TEST <- aggregate(gene_list$TPM,by=list(gene_list$ref_gene_name), FUN = length(gene_list$ref_gene_name))

I get an error.
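The message suggests that FUN was given the result of a call (a number) rather than a function. A small sketch of the difference, assuming a toy `gene_list` with made-up values:

```r
# Toy table standing in for gene_list (values are made up).
gene_list <- data.frame(
  ref_gene_name = c("geneA", "geneA", "geneB"),
  TPM           = c(1.5, 2.5, 4.0)
)

# FUN must be a function (or its name), not the result of calling one.
# length(gene_list$ref_gene_name) evaluates to a number, so this call fails;
# tryCatch captures the error instead of stopping the script.
bad <- tryCatch(
  aggregate(gene_list$TPM, by = list(gene_list$ref_gene_name),
            FUN = length(gene_list$ref_gene_name)),
  error = function(e) e
)

# Passing the function itself works: sum of TPM per gene ...
per_gene <- aggregate(gene_list$TPM,
                      by = list(gene = gene_list$ref_gene_name),
                      FUN = sum)

# ... or the number of transcripts per gene.
n_tx <- aggregate(gene_list$TPM,
                  by = list(gene = gene_list$ref_gene_name),
                  FUN = length)
```

With this interface the grouping column takes the name given in `by` (here `gene`) and the aggregated values land in a column called `x`.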

written 2.5 years ago by gozrom80

I figured out the error.

When I set the FUN argument to an actual function it works, but the result only shows the aggregated gene names, without the TPM values...

I need both: the sum of the TPM values of all the transcripts belonging to each gene, and the gene list.
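One possible way to get both the gene list and the summed TPM for every replicate from the wide table shown in the question is to aggregate each replicate's column pair separately and then merge the results by gene. This is only a sketch under assumptions: the toy `wide` table below has two replicates with made-up values, the column suffixes follow the `TPM`, `TPM.1`, ... pattern from the question, and the output file name is hypothetical.

```r
# Toy stand-in for the wide CSV: each replicate has its own TPM and
# ref_gene_name columns, and rows are not aligned across replicates.
wide <- data.frame(
  TPM             = c(0.402, NA, 0.253, NA),
  ref_gene_name   = c("Gm10568", "Gm10568", "Gm7357", "Gm7357"),
  TPM.1           = c(0.316, NA, 0.50, 0.25),
  ref_gene_name.1 = c("Gm10568", "Gm10568", "Rp1", "Rp1"),
  stringsAsFactors = FALSE
)

# Column suffixes, one per replicate; for 12 replicates this would be
# c("", paste0(".", 1:11)) under the naming pattern assumed here.
suffixes <- c("", ".1")

per_rep <- lapply(suffixes, function(s) {
  rep_df <- data.frame(
    gene = wide[[paste0("ref_gene_name", s)]],
    TPM  = wide[[paste0("TPM", s)]]
  )
  # Rows with NA in TPM or gene are dropped by the formula interface.
  out <- aggregate(TPM ~ gene, data = rep_df, FUN = sum)
  names(out)[2] <- paste0("TPM", s)
  out
})

# Full outer join on gene, so a gene missing in one replicate gets NA there.
gene_table <- Reduce(function(a, b) merge(a, b, by = "gene", all = TRUE),
                     per_rep)

# Export; the file name and location are placeholders.
out_file <- file.path(tempdir(), "gene_tpm.csv")
write.csv(gene_table, out_file, row.names = FALSE)
```

Note that transcripts whose ref_gene_name is NA (the MSTRG rows in the question) are dropped by this approach; grouping on a gene identifier that is present for every transcript would keep them.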

modified 2.5 years ago • written 2.5 years ago by gozrom80

Can you please tell me how you filtered out the TPM values from the StringTie output?
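The thread does not say how the original poster did this. One way it might be done in base R is to read the GTF as a tab-delimited table and pull the attributes out of the ninth column with a regular expression. The two example lines below are made up in the style of a StringTie GTF, and `get_attr` is a hypothetical helper, not part of any package:

```r
# Two made-up lines in the style of a StringTie GTF.
gtf_lines <- c(
  'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; cov "5.1"; FPKM "1.9"; TPM "2.60";',
  'chr1\tStringTie\texon\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; exon_number "1"; cov "5.1";'
)

# For a real file, replace text = gtf_lines with the path to the GTF.
# quote = "" is needed because the attribute column contains double quotes.
gtf <- read.delim(text = gtf_lines, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE)

# Hypothetical helper: value of one key in the attribute string, or NA.
get_attr <- function(attr, key) {
  pattern <- paste0("\\b", key, ' "([^"]*)"')
  m <- regmatches(attr, regexec(pattern, attr, perl = TRUE))
  vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_,
         character(1))
}

# Keep transcript rows only; in this example the exon row has no TPM.
transcripts <- gtf[gtf$V3 == "transcript", ]
tx <- data.frame(
  transcript_id = get_attr(transcripts$V9, "transcript_id"),
  ref_gene_name = get_attr(transcripts$V9, "ref_gene_name"),
  TPM           = as.numeric(get_attr(transcripts$V9, "TPM")),
  stringsAsFactors = FALSE
)
```

Filtering to transcript rows before extracting may also avoid the alternating NA rows visible in the table in the question, if those came from exon lines.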

modified 18 months ago • written 18 months ago by waqaskhokhar99990

Hi, I would also be interested in the same thing (but at the transcript level). Is there a convenient way to extract the TPM values for all transcripts in all samples to feed into a Ballgown DE analysis? Thank you very much.

written 18 months ago by rnaseql0

I think you can use the -A flag when you do the counting.
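StringTie's -A option takes an output file name and writes a tab-delimited table of gene abundances, which would make the per-transcript summing unnecessary. A sketch of reading such a table in R follows. The column names used here (Gene ID, Gene Name, ..., TPM) are an assumption about the file's header and should be checked against the actual file; the rows are made up.

```r
# Made-up content in the assumed layout of a StringTie -A gene abundance file.
abund_text <- paste(
  "Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM",
  "MSTRG.1\t-\tchr1\t+\t100\t900\t5.1\t1.9\t2.60",
  "GENE_B\tgeneB\tchr3\t-\t1000\t5000\t20.0\t7.5\t10.25",
  sep = "\n"
)

# For a real file, replace text = abund_text with the path passed to -A.
# check.names = FALSE keeps the spaces in the header names.
abund <- read.delim(text = abund_text, check.names = FALSE,
                    stringsAsFactors = FALSE)

# Per-gene TPM for one sample: identifier, name and TPM columns only.
gene_tpm <- abund[, c("Gene ID", "Gene Name", "TPM")]
```

Reading one such file per replicate and merging on the gene identifier would give a gene-by-sample TPM table.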

written 17 months ago by yichangyuycy10