Best transcriptome assembler
5 months ago

Hello. I am looking for benchmark papers that assess reference-guided transcriptome assemblers on RNA-Seq data and include StringTie in the comparison. Any guidance would be greatly appreciated.

transcriptome assembly

If you're looking for a comparison of de novo transcriptome assemblers, take a look at this paper: https://doi.org/10.1093/gigascience/giz039.

5 months ago
Shred

As always, no assembler can be considered universally better than the others. I'd suggest StringTie and Scallop2 (https://www.nature.com/articles/s43588-022-00216-1): the former for specificity (with a bit of parameter tuning) and the latter for sensitivity. Keep in mind that this is based on my personal experience with both: each tool's authors claim better performance than the competition, and no recent, extensive benchmark exists in the literature.
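Since the recommendation above hinges on specificity versus sensitivity, here is a minimal Python sketch of how these are typically scored at the transcript level when comparing an assembly against a reference annotation. It assumes a transcript counts as correct only when its full intron chain (chromosome, strand and every intron's coordinates) exactly matches a reference transcript; in this context "specificity" is usually computed as precision, TP / (TP + FP). The function name `transcript_level_metrics` and the toy coordinates are hypothetical; dedicated comparison tools such as gffcompare also handle single-exon transcripts, partial matches and locus-level statistics, which this sketch ignores.

```python
from typing import Set, Tuple

# Hypothetical representation: a multi-exon transcript is identified by its
# chromosome, strand and ordered intron chain ((start, end) pairs).
IntronChain = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def transcript_level_metrics(
    reference: Set[IntronChain], assembled: Set[IntronChain]
) -> Tuple[float, float]:
    """Return (sensitivity, precision) based on exact intron-chain matches."""
    true_positives = len(reference & assembled)
    # Sensitivity: fraction of reference transcripts recovered by the assembler.
    sensitivity = true_positives / len(reference) if reference else 0.0
    # Precision ("specificity" in many benchmarks): fraction of assembled
    # transcripts that exactly match a reference transcript.
    precision = true_positives / len(assembled) if assembled else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Toy data, not taken from any real annotation.
    reference = {
        ("chr1", "+", ((100, 200), (300, 400))),
        ("chr1", "+", ((100, 200),)),
        ("chr2", "-", ((50, 80), (120, 160))),
    }
    assembled = {
        ("chr1", "+", ((100, 200), (300, 400))),  # exact match
        ("chr2", "-", ((50, 80), (120, 170))),    # second intron differs
    }
    s, p = transcript_level_metrics(reference, assembled)
    print(f"sensitivity={s:.2f} precision={p:.2f}")  # sensitivity=0.33 precision=0.50
```

An assembler that reports many extra isoforms tends to raise sensitivity at the cost of precision, and a conservative one does the opposite, which is why the two tools can be preferred for different goals.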

Thanks for the Scallop2 paper; I hadn't seen it before. Definitely worth checking out.

An assembler written in Python? It looks like a proof of concept rather than a widely usable tool.

Trinity is written in Perl; some of the earlier assembly software from the Broad Institute was written in Java.

From the paper: "RefShannon is overall faster than guided Trinity, Cufflinks, and Ryuto (for large dataset and more processes). RefShannon consumes more memory compared to other assemblers (except guided Trinity which essentially conducts de novo assembly). This could be because RefShannon is written in Python and memory sharing is less efficient especially for multiprocessing. Currently, a typical lab server with at least 20 CPU cores and over 200GB memory would be sufficient to run RefShannon on large real datasets. One of our future direction is to further improve its computational efficiency."

Both Perl and Java have better memory management than Python (assuming well-written code). The quote from the paper suggests that the required computational resources are a serious limitation for the tool:

a typical lab server with at least 20 CPU cores and over 200GB memory would be sufficient to run RefShannon on large real datasets

That may be true for US institutions. That said, the implemented algorithm seems interesting, and having L. Pachter among the authors is always a guarantee of quality.
