Question

Best transcriptome assembler

1

2.0 years ago

anonymous5982 ▴ 10

Hello. I am looking for benchmark papers that assess reference-guided transcriptome assemblers on RNA-Seq data and that include an assessment of StringTie. Any guidance would be greatly appreciated.

transcriptome assembly • 1.3k views

ADD COMMENT • link updated 2.0 years ago by Dunois ★ 2.5k • written 2.0 years ago by anonymous5982 ▴ 10

0

If you're looking for a comparison of de novo transcriptome assemblers, take a look at this paper: https://doi.org/10.1093/gigascience/giz039.

ADD REPLY • link 2.0 years ago by Dunois ★ 2.5k

score 1 · Answer 1 · 2022-04-07

1

2.0 years ago

Shred ★ 1.4k

As always, no assembler can be considered universally better than another. I'd suggest StringTie and Scallop2 (https://www.nature.com/articles/s43588-022-00216-1): the former for specificity (with a bit of parameter tuning) and the latter for sensitivity. Keep in mind that this comes from my personal experience with both, since the respective authors each claim better performance and no recent extensive benchmark exists in the literature.
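
To make the sensitivity/specificity trade-off concrete, here is a minimal sketch of how the two numbers could be computed at the transcript level by comparing intron chains against a reference annotation. It is only an illustration under simplifying assumptions: all transcripts are taken to lie on one chromosome and strand, single-exon transcripts are ignored, and the function names (`intron_chain`, `sensitivity_precision`) and the toy coordinates are hypothetical. "Specificity" is computed here as precision, which is how comparison tools usually report it.

```python
from typing import List, Tuple

Exon = Tuple[int, int]      # (start, end), 1-based inclusive
Transcript = List[Exon]     # exons of one transcript, same chromosome and strand


def intron_chain(exons: Transcript) -> Tuple[Tuple[int, int], ...]:
    """Return the introns lying between consecutive exons, sorted by position."""
    ordered = sorted(exons)
    return tuple(
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    )


def sensitivity_precision(reference: List[Transcript],
                          assembled: List[Transcript]) -> Tuple[float, float]:
    """Transcript-level sensitivity and precision based on exact intron-chain matches."""
    # Single-exon transcripts have no introns and are skipped in this sketch.
    ref = {intron_chain(t) for t in reference if len(t) > 1}
    asm = {intron_chain(t) for t in assembled if len(t) > 1}
    matched = ref & asm
    sensitivity = len(matched) / len(ref) if ref else 0.0
    precision = len(matched) / len(asm) if asm else 0.0
    return sensitivity, precision


# Toy data (made up): two reference isoforms, three assembled transcripts.
reference = [
    [(1, 100), (201, 300)],
    [(1, 100), (401, 500)],
]
assembled = [
    [(1, 100), (201, 300)],    # matches the first reference isoform
    [(1, 100), (251, 300)],    # wrong splice site, no match
    [(10, 100), (201, 320)],   # same intron chain as the first, counted once
]

sens, prec = sensitivity_precision(reference, assembled)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

An assembler tuned for sensitivity tends to raise the first number by reporting more candidate isoforms, usually at the cost of the second; running both assemblers on the same alignments and comparing these two values against a trusted annotation is one way to check the trade-off on your own data.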

ADD COMMENT • link 2.0 years ago by Shred ★ 1.4k

0

Thanks for the Scallop2 paper -- I hadn't seen it before; definitely worth checking out.

ADD REPLY • link 2.0 years ago by dsull ★ 5.8k

score 0 · Answer 2 · 2022-04-07

0

2.0 years ago

dsull ★ 5.8k

https://doi.org/10.1371/journal.pone.0232946

ADD COMMENT • link 2.0 years ago by dsull ★ 5.8k

0

An assembler built in Python? Looks like a PoC rather than a widely usable tool.

ADD REPLY • link 2.0 years ago by Shred ★ 1.4k

2

Trinity is written in Perl; some of the earlier assembly software from the Broad was written in Java.

From the paper: "RefShannon is overall faster than guided Trinity, Cufflinks, and Ryuto (for large dataset and more processes). RefShannon consumes more memory compared to other assemblers (except guided Trinity which essentially conducts de novo assembly). This could be because RefShannon is written in Python and memory sharing is less efficient especially for multiprocessing. Currently, a typical lab server with at least 20 CPU cores and over 200GB memory would be sufficient to run RefShannon on large real datasets. One of our future direction is to further improve its computational efficiency."

ADD REPLY • link 2.0 years ago by dsull ★ 5.8k

0

Both Perl and Java have better memory management than Python (assuming well-written code). The quotes from the paper suggest that the required computational resources are actually a strong limitation for the tool.

"a typical lab server with at least 20 CPU cores and over 200GB memory would be sufficient to run RefShannon on large real datasets"

This may be true for US institutions. Besides that, the implemented algorithm seems interesting: having L. Pachter among the authors is always a guarantee.

ADD REPLY • link 2.0 years ago by Shred ★ 1.4k