Question: I have aligned reads and gene counts from STAR. How should I calculate TPM from these?
akibio (5 weeks ago) wrote:

Hi all,

I have aligned reads with STAR using --quantMode TranscriptomeSAM GeneCounts, which outputs the files Aligned.out.sam and Aligned.toTranscriptome.out.bam (plus the gene counts in ReadsPerGene.out.tab). I'd ideally like a reputable tool to calculate TPMs from these files (plus any necessary annotation file). Can anyone recommend the right tool and briefly describe what I need to get it working?

A further, possibly naive, question: if I want to do gene expression analysis (i.e. differential expression or some other modeling of the normalized counts), do I even need the Aligned.out.sam file? I don't know what it is used for.

EDIT: I keep coming across RSEM, a tool that takes Aligned.toTranscriptome.out.bam as input and outputs normalized expression values (TPM/FPKM). I will look into this.

h.mon (4 weeks ago) wrote:

You can use Salmon in alignment-based mode to estimate transcript counts and TPMs from Aligned.toTranscriptome.out.bam. The result would be similar to RSEM's, but Salmon runs much faster. If you estimate TPMs with Salmon, be consistent and also use the counts estimated by Salmon instead of the STAR gene counts. You could also get nearly the same count estimates and TPMs by running Salmon's quasi-mapping directly against the transcriptome.
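As a minimal sketch, this is roughly what the alignment-based Salmon call looks like, built as an argument list in Python. The file names (transcripts.fa, salmon_quant) are hypothetical placeholders. The flags used (-t transcriptome FASTA, -l A for automatic library-type detection, -a alignments, -p threads, -o output directory) are Salmon's alignment-mode options, but check `salmon quant --help` for your installed version. The transcriptome FASTA should contain the same transcripts (and IDs) that STAR projected the reads onto.

```python
import shlex
from typing import List


def salmon_alignment_mode_cmd(transcripts_fa: str, bam: str, out_dir: str,
                              lib_type: str = "A", threads: int = 4) -> List[str]:
    """Build (but do not run) a `salmon quant` command in alignment-based mode."""
    return [
        "salmon", "quant",
        "-t", transcripts_fa,   # transcriptome FASTA matching the STAR annotation
        "-l", lib_type,         # "A" = let Salmon infer the library type
        "-a", bam,              # STAR's Aligned.toTranscriptome.out.bam
        "-p", str(threads),     # number of threads
        "-o", out_dir,          # quant.sf (counts + TPM) is written here
    ]


cmd = salmon_alignment_mode_cmd("transcripts.fa",
                                "Aligned.toTranscriptome.out.bam",
                                "salmon_quant")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems with unusual file names.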

The Aligned.out.sam could be used with featureCounts or HTSeq to obtain gene counts similar to STAR's GeneCounts output (ReadsPerGene.out.tab), which means it is redundant and not needed in your context.
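Since GeneCounts already wrote per-gene counts to ReadsPerGene.out.tab, here is a small sketch of loading them in Python. Per the STAR manual, column 2 holds counts for unstranded data, column 3 for the 1st read strand aligned with the RNA (htseq-count -s yes), and column 4 for the 2nd read strand aligned with the RNA (htseq-count -s reverse). The first four rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) are summary rows and are skipped. The "unstranded"/"forward"/"reverse" labels and the toy input below are made up for illustration.

```python
import io
from typing import Dict, TextIO

# 0-based field index of each count column in ReadsPerGene.out.tab
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_star_gene_counts(handle: TextIO, strandedness: str = "unstranded") -> Dict[str, int]:
    """Return {gene_id: count} from STAR's ReadsPerGene.out.tab."""
    col = STRAND_COLUMN[strandedness]
    counts: Dict[str, int] = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # skip blank lines and the N_* summary rows at the top of the file
        if not fields[0] or fields[0].startswith("N_"):
            continue
        counts[fields[0]] = int(fields[col])
    return counts


# Toy file contents (invented numbers), as if from a reverse-stranded library
example = io.StringIO(
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t40\t8\n"
    "N_ambiguous\t2\t0\t1\n"
    "geneA\t100\t4\t96\n"
    "geneB\t50\t48\t2\n"
)
counts = read_star_gene_counts(example, "reverse")
print(counts)  # {'geneA': 96, 'geneB': 2}
```

Picking the wrong column for a stranded library silently gives very low counts, so comparing the N_noFeature row across columns is a quick sanity check.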


Thanks! Since I don't have a fasta file of the transcriptome of my organism, I don't think I can use Salmon. Is that correct?

(reply by akibio, 4 weeks ago)

If you have the genome and a GTF annotation, you can extract the transcriptome with a number of different tools, such as gffread. RSEM also needs a transcriptome FASTA, so you would face the same problem, although RSEM provides a script (rsem-prepare-reference) that extracts the transcriptome FASTA from a genome FASTA and a GTF annotation.
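A minimal sketch of both routes, again only building the command lines. The input names genome.fa and annotation.gtf and the output names are hypothetical. For gffread, -g gives the genome FASTA and -w writes the spliced exon sequence of each transcript; rsem-prepare-reference takes --gtf followed by the genome FASTA and a reference name prefix. Use the same genome and GTF that were used to build the STAR index, so the transcript IDs match those in Aligned.toTranscriptome.out.bam.

```python
import shlex
from typing import List


def gffread_transcripts_cmd(genome_fa: str, gtf: str, out_fa: str) -> List[str]:
    # -w: write spliced transcript sequences; -g: genome FASTA to extract from
    return ["gffread", "-w", out_fa, "-g", genome_fa, gtf]


def rsem_prepare_reference_cmd(genome_fa: str, gtf: str, ref_name: str) -> List[str]:
    # RSEM's own route: extracts transcripts from genome + GTF and builds its reference
    return ["rsem-prepare-reference", "--gtf", gtf, genome_fa, ref_name]


gff_cmd = gffread_transcripts_cmd("genome.fa", "annotation.gtf", "transcripts.fa")
rsem_cmd = rsem_prepare_reference_cmd("genome.fa", "annotation.gtf", "rsem_ref/ref")
for cmd in (gff_cmd, rsem_cmd):
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```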

Alternatively, see this answer for a TPM formula (along with some comments on why a TPM calculated from gene counts and gene lengths is not truly a TPM), and this post for methods of calculating gene length from an annotation.
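For reference, the usual formula is TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 10^6, where c_i is the read count and l_i the length of feature i. Below is a minimal Python sketch applying it to gene counts. It assumes gene length is taken as the number of bases covered by the union of the gene's exons, which is only one of several possible definitions. It does no effective-length correction, so it is exactly the "not truly a TPM" approximation mentioned above. The GTF attribute parsing is deliberately naive, and the toy records and counts are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def union_exon_lengths(gtf_lines: Iterable[str]) -> Dict[str, int]:
    """Gene length = number of bases covered by the union of its exons."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])  # GTF: 1-based, inclusive
        # naive attribute parsing: take the value of gene_id "..."
        gene_id = fields[8].split('gene_id "')[1].split('"')[0]
        exons[gene_id].append((start, end))

    lengths: Dict[str, int] = {}
    for gene_id, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end + 1:      # disjoint: close the current block
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
            else:                        # overlapping or adjacent: extend it
                cur_end = max(cur_end, end)
        total += cur_end - cur_start + 1
        lengths[gene_id] = total
    return lengths


def tpm(counts: Dict[str, float], lengths: Dict[str, int]) -> Dict[str, float]:
    """TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}  # reads per base
    scale = 1e6 / sum(rates.values())
    return {g: r * scale for g, r in rates.items()}


gtf = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "gA"; transcript_id "tA1";',
    'chr1\tsrc\texon\t51\t150\t.\t+\t.\tgene_id "gA"; transcript_id "tA2";',
    'chr1\tsrc\texon\t1001\t1200\t.\t-\t.\tgene_id "gB"; transcript_id "tB1";',
]
lengths = union_exon_lengths(gtf)          # gA: 150 bp (1-150), gB: 200 bp
values = tpm({"gA": 300, "gB": 200}, lengths)
print(lengths, {g: round(v, 1) for g, v in values.items()})
```

With these toy numbers gA has twice the per-base rate of gB, so it gets two thirds of the one million TPM units.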

(reply by h.mon, 4 weeks ago)

This is very helpful. So by using gffread, I can obtain a transcriptome fasta and using this I can essentially align my reads to the transcriptome using Kallisto or Salmon? And both can estimate raw counts and normalized counts? This would reduce computation time by a lot!
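Both tools report an estimated count and a TPM per transcript. Salmon writes them to quant.sf (columns Name, Length, EffectiveLength, TPM, NumReads), while kallisto's abundance.tsv uses target_id, est_counts and tpm. A minimal sketch of summarizing them to gene level is shown below. It assumes you have a transcript-to-gene mapping (tx2gene, e.g. derived from the GTF); the toy quant.sf values are invented. Summing is valid for TPMs because they are additive across a gene's transcripts. The estimated counts are usually non-integer.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO, Tuple


def gene_level_from_quant(handle: TextIO, tx2gene: Dict[str, str],
                          id_col: str = "Name", count_col: str = "NumReads",
                          tpm_col: str = "TPM") -> Dict[str, Tuple[float, float]]:
    """Sum transcript-level estimated counts and TPMs per gene.

    Defaults match Salmon's quant.sf; for kallisto's abundance.tsv pass
    id_col="target_id", count_col="est_counts", tpm_col="tpm".
    """
    counts: Dict[str, float] = defaultdict(float)
    tpms: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = tx2gene[row[id_col]]       # KeyError if a transcript is unmapped
        counts[gene] += float(row[count_col])
        tpms[gene] += float(row[tpm_col])  # TPMs add up across transcripts
    return {g: (counts[g], tpms[g]) for g in counts}


# Toy quant.sf (invented values)
quant_sf = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tA1\t1150\t1000.0\t375000.0\t300.0\n"
    "tA2\t1150\t1000.0\t125000.0\t100.0\n"
    "tB1\t650\t500.0\t500000.0\t200.0\n"
)
tx2gene = {"tA1": "gA", "tA2": "gA", "tB1": "gB"}
gene_level = gene_level_from_quant(quant_sf, tx2gene)
print(gene_level)  # {'gA': (400.0, 500000.0), 'gB': (200.0, 500000.0)}
```

For differential expression, the gene-level estimated counts (not the TPMs) are the values to carry forward.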

(reply by akibio, 4 weeks ago)
Powered by Biostar version 2.3.0