Question

Is StringTie a suitable tool for both gene and transcript quantification?

0

3.9 years ago

tianshenbio ▴ 170

I am thinking about generating read count matrices at both the gene level and the transcript (isoform) level.

According to these previous posts:

Why run FeatureCounts after Stringtie? (Galaxy recommends!)

How to get read counts on transcript level using featurecounts?

It seems that I can use featureCounts for gene quantification and StringTie for transcript/isoform quantification. Am I right?

Since transcripts overlap heavily, featureCounts cannot properly assign reads that map to an exon shared by several transcripts, so it is not suitable for counting transcripts/isoforms. How does StringTie overcome this? Does it assign shared reads properly?

Many people have suggested an alignment-free tool, Salmon, for transcript quantification. Since I want to find both DE genes and DE transcripts/isoforms in my DE analysis, I assume StringTie would be the handier option, since I can get both gene and transcript counts in one run.

Therefore, my question is: is gene/transcript quantification with StringTie reliable? How does it distribute reads shared by multiple isoforms, which is the major problem in quantifying isoforms?

I have read the original papers and related posts here on Biostars but am still not sure. I would appreciate it if someone could clarify this for me.

stringtie RNA-Seq sequencing featurecounts gene • 3.6k views

ADD COMMENT • link updated 22 months ago by virajbdeshpande • 0 • written 3.9 years ago by tianshenbio ▴ 170

4

3.9 years ago

ATpoint 82k

You can easily aggregate the Salmon transcript-level abundance estimates to the gene level with the tximport package from Bioconductor. I would definitely go with Salmon. From what I understand, StringTie is mainly used to assemble reads into a transcriptome, and I would probably only use it for that.
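
To illustrate the idea of gene-level aggregation, here is a minimal Python sketch that sums transcript-level estimates per gene. It is not tximport itself (tximport is an R package and can also account for transcript lengths); the function name `aggregate_to_gene`, the transcript and gene IDs, and the values are all hypothetical.

```python
from collections import defaultdict


def aggregate_to_gene(tx_values, tx2gene):
    """Sum transcript-level values (e.g. estimated counts) per gene.

    tx_values: dict mapping transcript ID -> estimated abundance
    tx2gene:   dict mapping transcript ID -> gene ID
    """
    gene_values = defaultdict(float)
    for tx_id, value in tx_values.items():
        gene_values[tx2gene[tx_id]] += value
    return dict(gene_values)


# Hypothetical transcript-level estimates and transcript-to-gene map.
tx_values = {"tx1": 100.0, "tx2": 50.0, "tx3": 7.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_values = aggregate_to_gene(tx_values, tx2gene)
print(gene_values)
```

The transcript-level table stays available for isoform-level DE analysis, while the summed table can feed gene-level DE analysis.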

0

Thank you for your reply. Yes, since I will not perform transcriptome assembly, Salmon might be a better choice.

ADD REPLY • link 3.9 years ago by tianshenbio ▴ 170

score 6 · Accepted Answer · 2020-06-02

6

3.9 years ago

i.sudbery 19k

Every benchmark I have seen (as well as my own experience) shows that StringTie is less accurate than Salmon/kallisto/RSEM. I don't actually know what model is at the heart of StringTie's quantification. Salmon/kallisto/RSEM all use some variation of expectation-maximization (EM) to distribute reads between transcripts.
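
As a rough illustration of how EM can split shared reads between isoforms, here is a toy Python sketch. It is a simplified textbook version, not the actual implementation in Salmon, kallisto, or RSEM; the function name `em_read_assignment`, the compatibility lists, and the effective lengths are made up for the example.

```python
def em_read_assignment(compat, eff_len, n_iter=200):
    """Toy EM for distributing reads among transcripts.

    compat:  one list per read, holding the indices of the transcripts
             that read is compatible with
    eff_len: effective length of each transcript
    Returns the expected number of reads assigned to each transcript.
    """
    n_tx = len(eff_len)
    n_reads = len(compat)
    # theta[t] = current estimate of the fraction of reads from transcript t
    theta = [1.0 / n_tx] * n_tx
    counts = [0.0] * n_tx
    for _ in range(n_iter):
        counts = [0.0] * n_tx
        # E-step: split each read among its compatible transcripts,
        # weighting by current abundance per unit of effective length.
        for txs in compat:
            weights = [theta[t] / eff_len[t] for t in txs]
            total = sum(weights)
            for t, w in zip(txs, weights):
                counts[t] += w / total
        # M-step: re-estimate read fractions from the expected counts.
        theta = [c / n_reads for c in counts]
    return counts


# Two isoforms (index 0 and 1) of equal effective length:
# 30 reads unique to isoform 0, 10 unique to isoform 1, 10 shared.
compat = [[0]] * 30 + [[1]] * 10 + [[0, 1]] * 10
expected_counts = em_read_assignment(compat, eff_len=[1000.0, 1000.0])
print(expected_counts)
```

In this toy case the shared reads end up split in proportion to the evidence from the uniquely mapping reads, rather than being discarded or counted twice.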

As @ATpoint points out, it's fairly easy to calculate gene expression from transcript expression.

0

Thank you for your response. Will give it a try!

ADD REPLY • link 3.9 years ago by tianshenbio ▴ 170

0

Thanks for this comment. Could you point to some examples of the benchmarks you are referring to?

ADD REPLY • link 22 months ago by virajbdeshpande • 0