I have aligned reads and gene counts from STAR. How should I calculate TPM from these?
3.5 years ago
akibio • 0

Hi all,

I have aligned reads with STAR using --quantMode TranscriptomeSAM GeneCounts, which outputs the files Aligned.out.sam, Aligned.toTranscriptome.out.bam and ReadsPerGene.out.tab. Ideally I'd like a reputable piece of software to calculate TPMs from these files (plus any necessary annotation file). Can anyone recommend the right tool and briefly describe what I need to get it working?

A further, perhaps naive, question: if I want to do gene expression analysis, i.e. differential expression or some other modeling of the normalized counts, do I ever need the Aligned.out.sam file? I don't know what it is used for.

EDIT: I keep coming across RSEM as a tool that takes Aligned.toTranscriptome.out.bam as input and outputs estimated counts and TPMs. I will look into this.

RNA-Seq • 3.2k views
3.5 years ago
h.mon 35k

You can use Salmon to estimate transcript counts and TPMs from Aligned.toTranscriptome.out.bam. The results would be similar to RSEM's, but Salmon runs much faster. If you estimate TPMs with Salmon, be consistent and also use the counts estimated by Salmon instead of the STAR counts. You could also get almost the same count estimates and TPMs by running Salmon in quasi-mapping mode directly against the transcriptome.
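As a rough sketch, Salmon's alignment-based mode is run along the lines of `salmon quant -t transcripts.fa -l A -a Aligned.toTranscriptome.out.bam -o salmon_quant`, where `transcripts.fa` is assumed to hold the same transcripts (same IDs) as the annotation STAR used, and `-l A` asks Salmon to infer the library type. Its `quant.sf` output is per transcript (columns Name, Length, EffectiveLength, TPM, NumReads). The Python below is a minimal sketch, assuming a hypothetical `tx2gene` dictionary (transcript ID to gene ID, e.g. built from your GTF), of summing TPM and NumReads per gene so that gene-level TPMs and counts both come from the same Salmon run. For differential expression in R, tximport does this aggregation more carefully.

```python
import csv
import io
from collections import defaultdict


def gene_level_from_quant_sf(quant_sf_text, tx2gene):
    """Sum Salmon transcript-level TPM and NumReads per gene.

    quant_sf_text: contents of a Salmon quant.sf file (tab-separated, header row).
    tx2gene: hypothetical dict mapping transcript ID -> gene ID.
    """
    tpm = defaultdict(float)
    counts = defaultdict(float)
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    for row in reader:
        gene = tx2gene[row["Name"]]
        # TPM is additive over transcripts, so a gene's TPM is the sum of its transcripts'.
        tpm[gene] += float(row["TPM"])
        counts[gene] += float(row["NumReads"])
    return dict(tpm), dict(counts)


# Toy quant.sf content (made-up values, internally consistent TPMs).
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800\t300000\t120\n"
    "tx2\t2000\t1800\t200000\t180\n"
    "tx3\t500\t300\t500000\t75\n"
)
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
gene_tpm, gene_counts = gene_level_from_quant_sf(example, tx2gene)
print(gene_tpm)     # {'geneA': 500000.0, 'geneB': 500000.0}
print(gene_counts)  # {'geneA': 300.0, 'geneB': 75.0}
```

For a real run you would pass `open("salmon_quant/quant.sf").read()` instead of the toy string.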

Aligned.out.sam could be used with featureCounts or HTSeq to produce gene counts similar to those in ReadsPerGene.out.tab, so it is redundant and not needed in your context.
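For the gene counts STAR already produced, ReadsPerGene.out.tab has four tab-separated columns: gene ID, unstranded counts, counts for the 1st read strand aligned with RNA (htseq-count `-s yes`) and counts for the 2nd read strand aligned with RNA (htseq-count `-s reverse`), preceded by four summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`). The Python below is a minimal sketch (the function name is hypothetical) that loads one column; which column is correct depends on your library's strandedness, which you need to check for your protocol.

```python
# 0-based column index in ReadsPerGene.out.tab, named after the htseq-count -s option.
STRAND_COLUMN = {"unstranded": 1, "yes": 2, "reverse": 3}
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_star_gene_counts(text, strand="unstranded"):
    """Parse the contents of STAR's ReadsPerGene.out.tab.

    Returns (gene_counts, summary): per-gene counts from the chosen column,
    and the four N_* summary rows from the same column.
    """
    col = STRAND_COLUMN[strand]
    gene_counts, summary = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        value = int(fields[col])
        if fields[0] in SUMMARY_ROWS:
            summary[fields[0]] = value
        else:
            gene_counts[fields[0]] = value
    return gene_counts, summary


# Toy file content (made-up numbers).
example = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t7\t50\t9\n"
    "N_ambiguous\t3\t0\t1\n"
    "geneA\t100\t4\t96\n"
    "geneB\t40\t38\t2\n"
)
counts, summary = read_star_gene_counts(example, strand="reverse")
print(counts)  # {'geneA': 96, 'geneB': 2}
```

With a real file, pass `open("ReadsPerGene.out.tab").read()`. For a stranded library, one of the two stranded columns typically carries far more counts than the other, which is a quick sanity check on the choice.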


Thanks! Since I don't have a fasta file of the transcriptome of my organism, I don't think I can use Salmon. Is that correct?


If you have the genome and a GTF annotation, you can extract the transcriptome with a number of tools, such as gffread. RSEM also needs a transcriptome FASTA, so you would have the same problem, although RSEM provides a script (rsem-prepare-reference) that extracts the transcriptome FASTA from a genome FASTA and a GTF annotation.
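With gffread this is typically `gffread -w transcripts.fa -g genome.fa annotation.gtf`, where `-w` writes the spliced exon sequence of each transcript. To show what that extraction does, here is a simplified pure-Python sketch: group GTF `exon` lines by `transcript_id`, concatenate the exon sequences in genomic order, and reverse-complement minus-strand transcripts. It is an illustration under simplifying assumptions (whole genome in memory, well-formed GTF, one strand per transcript), not a replacement for gffread; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Translation table for reverse-complementing minus-strand transcripts.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_fasta(text):
    """Return {sequence_name: sequence}; the name is the first word of the header."""
    seqs, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_transcripts(genome_fasta, gtf_text):
    """Splice exon sequences per transcript_id, roughly what `gffread -w` produces."""
    genome = parse_fasta(genome_fasta)
    exons = defaultdict(list)  # transcript_id -> [(chrom, start, end, strand)]
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        m = re.search(r'transcript_id "([^"]+)"', f[8])
        if m:
            exons[m.group(1)].append((f[0], int(f[3]), int(f[4]), f[6]))
    transcripts = {}
    for tx_id, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # genomic order
        # GTF coordinates are 1-based and inclusive -> Python slice [start - 1:end]
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        transcripts[tx_id] = seq
    return transcripts


# Toy genome (chr1 = ATGCCCAAAGGGTTT) and a two-transcript GTF.
genome = ">chr1 toy\nATGCCCAAA\nGGGTTT\n"
gtf = "\n".join("\t".join(row) for row in [
    ["chr1", "toy", "exon", "4", "6", ".", "+", ".", 'gene_id "g1"; transcript_id "tx1";'],
    ["chr1", "toy", "exon", "10", "12", ".", "+", ".", 'gene_id "g1"; transcript_id "tx1";'],
    ["chr1", "toy", "exon", "1", "3", ".", "-", ".", 'gene_id "g2"; transcript_id "tx2";'],
    ["chr1", "toy", "exon", "7", "9", ".", "-", ".", 'gene_id "g2"; transcript_id "tx2";'],
])
print(extract_transcripts(genome, gtf))  # {'tx1': 'CCCGGG', 'tx2': 'TTTCAT'}
```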

Alternatively, see this answer for a TPM formula (along with some comments on why a TPM calculated from gene counts and gene lengths is not truly a TPM), and this post for methods of calculating gene length from an annotation.
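For the gene-count route, the usual formula is TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6, where c_i is the read count and l_i the length of gene i; the length unit cancels out. The Python below is a minimal sketch of that formula together with one common gene-length definition, the union of a gene's exons (bases covered by at least one exon). Other definitions (longest or median transcript) are equally possible, and, as noted above, the result ignores effective length and isoform usage, so treat it as an approximation. Function names are hypothetical, and exons are assumed to be given per gene on a single chromosome.

```python
from collections import defaultdict


def union_exon_lengths(exons):
    """Gene length = number of bases covered by at least one exon of the gene.

    exons: iterable of (gene_id, start, end) in 1-based inclusive (GTF) coordinates.
    """
    by_gene = defaultdict(list)
    for gene, start, end in exons:
        by_gene[gene].append((start, end))
    lengths = {}
    for gene, intervals in by_gene.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:  # overlapping or touching: merge
                cur_end = max(cur_end, end)
            else:  # gap: close the current merged block
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene] = total
    return lengths


def tpm(counts, lengths):
    """TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 1e6; assumes at least one nonzero count."""
    rates = {g: c / lengths[g] for g, c in counts.items()}
    total = sum(rates.values())
    return {g: r / total * 1e6 for g, r in rates.items()}


exons = [("geneA", 1, 100), ("geneA", 51, 150), ("geneA", 301, 400), ("geneB", 1, 500)]
lengths = union_exon_lengths(exons)  # {'geneA': 250, 'geneB': 500}
print(tpm({"geneA": 250, "geneB": 1500}, lengths))  # {'geneA': 250000.0, 'geneB': 750000.0}
```

Here the counts would come from one column of ReadsPerGene.out.tab and the exon coordinates from the GTF's `exon` lines.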


This is very helpful. So by using gffread I can obtain a transcriptome FASTA, and with it I can essentially map my reads to the transcriptome using Kallisto or Salmon? And both can estimate raw counts and normalized counts? This would reduce computation time a lot!
