4.0 years ago
Dunois
I've been sweating over how best to reduce the redundancy in a de novo transcriptome assembled using Trinity. I found this tool called Compacta (here's the GitHub repository) that seems to be aimed at solving precisely this problem.
Can anyone comment/share their thoughts on this tool? Has anybody used it before? I haven't really found any papers citing it (yet).
Have you tried CD-HIT (LINK) for this application?
I'm looking at Compacta because I'd rather not use CD-HIT or MMseqs2 if I can.
You can try it out and let us know :-)
Good to know about it though. Thanks for posting the link.
Well, since you mentioned it: I did try it out.
Compacta is easy to install and run, and the paper is very well written (though some very important details are strewn across the paper, its supplement, and a bunch of READMEs for the paper's scripts). The GitHub repo no longer seems to be maintained but is still accessible.
The only problem is that samtools seems unable to parse some BAM header elements correctly. I haven't gotten past this yet.
But their idea is solid, and I'd give it a go.
That does not sound like Compacta's problem though? Unfortunately, if the software is not maintained or is tied to some specific version of samtools, then I would not use it, no matter how good it may be.
Indeed, I don't think it's Compacta itself. I haven't had the chance to debug it properly; some of the samples I ran through it ran fine, and yet others didn't. And yeah, it's a shame that it's not being actively maintained, despite there being some interest in the tool. Let's hope somebody forks it.
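For anyone trying to narrow down a header problem like the one described above, here is a minimal sketch of one way to spot malformed header lines. It is not part of Compacta or samtools, and the actual cause of the error in this thread is unknown; the function name `find_suspect_header_lines` and the sample header are made up for illustration. The two patterns follow the header-line grammar given in the SAM format specification.

```python
import re

# Header-line patterns as given in the SAM format specification:
# @HD/@SQ/@RG/@PG lines are TAB-separated TAG:VALUE fields; @CO lines are free text.
_FIELD_LINE = re.compile(r"^@(HD|SQ|RG|PG)(\t[A-Za-z][A-Za-z0-9]:[ -~]+)+$")
_COMMENT_LINE = re.compile(r"^@CO\t.*$")


def find_suspect_header_lines(header_text):
    """Return (line_number, line) pairs that do not match the SAM header grammar.

    Hypothetical helper for illustration; it only checks line syntax,
    not whether required tags are present or values are sensible.
    """
    suspects = []
    for number, line in enumerate(header_text.splitlines(), start=1):
        if not line:
            continue  # ignore blank lines, e.g. a trailing newline
        if _FIELD_LINE.match(line) or _COMMENT_LINE.match(line):
            continue
        suspects.append((number, line))
    return suspects


# Made-up header: the third line uses spaces instead of TABs.
sample_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:contig1\tLN:100\n"
    "@SQ SN:contig2 LN:250\n"
)

for number, line in find_suspect_header_lines(sample_header):
    print(f"line {number}: {line!r}")
```

To run it on a real file, the header text could be taken from `samtools view -H` on the BAM in question. Comparing the flagged lines between a sample that works and one that fails may show whether the header itself is at fault.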