Question: Why can't CAP3 efficiently find overlapping contigs?
3.9 years ago, seta wrote:

Hi all,

I did a transcriptome assembly of Illumina paired-end 100 bp reads at several different k-mer sizes and am now trying to merge the resulting assemblies. I pooled them and ran cd-hit-est to remove redundant sequences, then ran CAP3 with default settings on the non-redundant sequences. However, the CAP3 output reports many sequences as singlets rather than contigs, so it seems that CAP3 could not efficiently find the overlapping sequences. Could you please let me know what is wrong here, or whether there is any setting that would improve the result? Thanks for sharing your experience.
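For context, CAP3 reports a sequence as a singlet when it finds no other sequence whose end overlaps its end well enough. The toy sketch below makes that criterion concrete with an ungapped suffix-prefix comparison. The cutoffs `min_overlap=40` and `min_identity=0.90` are assumptions loosely modelled on CAP3's overlap-length and percent-identity cutoffs (assumed here to be its `-o` and `-p` options; check the usage message of your CAP3 build). The function names are hypothetical, and CAP3 itself uses gapped alignments and its own scoring.

```python
# Toy illustration of why a sequence ends up as a CAP3 "singlet": it has no
# partner whose end overlaps its end well enough. Assumption: ungapped
# comparison only; CAP3 itself uses gapped alignments and its own scoring.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def best_overlap(a, b, min_overlap=40, min_identity=0.90):
    """Length of the longest suffix of `a` matching a prefix of `b`, else 0.

    An overlap qualifies if it spans at least `min_overlap` bases and at
    least `min_identity` of its aligned positions are identical (no gaps).
    """
    for length in range(min(len(a), len(b)), max(min_overlap, 1) - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        matches = sum(x == y for x, y in zip(suffix, prefix))
        if matches / length >= min_identity:
            return length
    return 0


def singlets(seqs, min_overlap=40, min_identity=0.90):
    """Names in `seqs` ({name: sequence}) that overlap no other sequence."""
    names = list(seqs)
    has_partner = set()
    for i, n1 in enumerate(names):
        for n2 in names[i + 1:]:
            a, b, rb = seqs[n1], seqs[n2], revcomp(seqs[n2])
            # Check both end-to-end orders, on both strands of n2.
            if any(best_overlap(x, y, min_overlap, min_identity)
                   for x, y in ((a, b), (b, a), (a, rb), (rb, a))):
                has_partner.update((n1, n2))
    return [n for n in names if n not in has_partner]
```

Loosening the cutoffs lets more pairs qualify, but a sequence with no true partner in the pool stays a singlet whatever the settings.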

3.9 years ago, Brian Bushnell (Walnut Creek, USA) wrote:

Hi seta,

I suggest that you give Dedupe a try. It will find and remove all duplicate and fully contained contigs, like this:

in=assm1.fa,assm2.fa out=combined.fa
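To show what "remove duplicate and fully contained contigs" means, here is a naive sketch of that reduction. It is not Dedupe's implementation: it assumes exact matches only and uses a quadratic substring search, and the function name is hypothetical.

```python
# Naive sketch of duplicate/containment removal (the kind of reduction the
# answer describes Dedupe doing). Assumption: exact matches only, quadratic
# substring search; Dedupe itself is far more efficient and configurable.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def drop_duplicates_and_contained(contigs):
    """Return {name: seq} without contigs that duplicate, or lie entirely
    inside, another contig on either strand.

    Contigs are visited longest first, so a container is always kept before
    anything it contains; among identical copies the first one listed wins.
    """
    ordered = sorted(contigs.items(), key=lambda item: len(item[1]), reverse=True)
    kept = {}
    kept_upper = []  # case-normalised copies of kept sequences for searching
    for name, seq in ordered:
        fwd = seq.upper()
        rev = revcomp(fwd)
        if any(fwd in other or rev in other for other in kept_upper):
            continue  # duplicate or fully contained: drop it
        kept[name] = seq
        kept_upper.append(fwd)
    return kept
```

Visiting longest-first is what makes a single pass sufficient: by the time a short contig is examined, every sequence that could contain it has already been kept.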

It can also find and report all overlaps in DOT format, but it won't remove or merge overlapping contigs.
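As an illustration of reporting overlaps as a graph, the toy sketch below emits one Graphviz DOT edge per exact suffix-prefix overlap. It assumes forward-strand, exact matches only, and the DOT layout and function names are hypothetical; they are not Dedupe's actual output format or code.

```python
# Toy overlap finder that writes a Graphviz DOT graph: one edge per exact
# suffix-prefix overlap. Assumptions: forward strand only, exact matches,
# and this DOT layout is illustrative, not Dedupe's actual output format.


def exact_overlap(a, b, min_overlap):
    """Length of the longest suffix of `a` equal to a prefix of `b`, else 0."""
    for length in range(min(len(a), len(b)), max(min_overlap, 1) - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def overlaps_to_dot(contigs, min_overlap=20):
    """Render overlaps between contigs ({name: seq}) as a DOT digraph string."""
    lines = ["digraph overlaps {"]
    for n1, a in contigs.items():
        for n2, b in contigs.items():
            if n1 == n2:
                continue
            length = exact_overlap(a, b, min_overlap)
            if length:
                # Edge n1 -> n2 means "n1's end overlaps n2's start by `length` bp".
                lines.append(f'  "{n1}" -> "{n2}" [label="{length}"];')
    lines.append("}")
    return "\n".join(lines)
```

A graph like this only describes the overlaps; turning edges into merged sequences is a separate step.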


Hi Brian, thanks, but how can I use it for my purpose if it doesn't merge overlapping contigs?


Ah, well, it was not clear what your purpose was. If you specifically want overlapping contigs to be merged, Dedupe is not the right tool unless you post-process its output. You might try Minimus2.
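To make "merging overlapping contigs" concrete, here is a toy greedy merger that repeatedly joins the pair with the longest exact suffix-prefix overlap. This is NOT Minimus2's algorithm; it assumes forward-strand, exact overlaps only, and the function names are hypothetical.

```python
# Toy greedy merger: repeatedly join the pair of sequences with the longest
# exact suffix-prefix overlap. Assumption: this is NOT Minimus2's algorithm;
# it only shows what "merging overlapping contigs" means. Forward strand only.


def exact_overlap(a, b, min_overlap):
    """Length of the longest suffix of `a` equal to a prefix of `b`, else 0."""
    for length in range(min(len(a), len(b)), max(min_overlap, 1) - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_merge(seqs, min_overlap=20):
    """Merge a list of sequences until no pair overlaps by >= min_overlap."""
    seqs = list(seqs)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    length = exact_overlap(a, b, min_overlap)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return seqs
        # Keep a in full and append only the part of b beyond the overlap.
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)
```

Greedy joining is easy to follow but can make wrong joins in repetitive sequence, which is one reason real tools use more careful layout and consensus steps.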

That said, when we run Minimus2, we always run Dedupe first because it greatly reduces the input volume, which makes Minimus2 take less time and be less likely to crash.


Thanks, I'll try it.

3.9 years ago, h.mon wrote:

Some contigs will be assembled at only one particular k-mer size; low k-mer sizes, for example, assemble lots of small contigs. These sequences may be missing entirely from the assemblies built with other k-mer sizes, so they have nothing to overlap with, and CAP3 reports them as singlets.
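One way to check this explanation on your own data is to flag contigs that share no k-mer with any other assembly. The sketch below is a hypothetical diagnostic, not part of CAP3: it assumes exact, canonical (strand-neutral) k-mer matching, and `k=31` is an arbitrary default.

```python
# Hypothetical diagnostic for the explanation above: flag contigs that share
# no k-mer with any *other* assembly. Such contigs have nothing to overlap
# with, so an overlap-based merger like CAP3 can only report them as
# singlets. Assumptions: exact k-mer matching, canonical (strand-neutral)
# k-mers, and k=31 as an arbitrary default.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def contig_kmers(seq, k):
    """Set of canonical k-mers in `seq` (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def unshared_contigs(assemblies, k=31):
    """For {assembly: {contig: seq}}, list contigs per assembly whose
    k-mers appear in no other assembly."""
    per_asm = {asm: {name: contig_kmers(s, k) for name, s in contigs.items()}
               for asm, contigs in assemblies.items()}
    pooled = {asm: set().union(*kms.values()) for asm, kms in per_asm.items()}
    result = {}
    for asm, kms in per_asm.items():
        others = set().union(*(pooled[o] for o in pooled if o != asm))
        result[asm] = [name for name, ks in kms.items() if not ks & others]
    return result
```

Contigs flagged here are expected singlets; contigs that do share k-mers may still end up as singlets if the shared stretch is shorter than the merger's overlap cutoff.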
