Question

Why can't the CAP3 tool efficiently find overlapping contigs?

9.9 years ago

seta ★ 1.9k

Hi all,

I assembled a transcriptome from Illumina paired-end 100 bp reads at several k-mer sizes, and I am now trying to merge the resulting assemblies. I pooled them and ran cd-hit-est to remove redundant sequences, then ran cap3 on the non-redundant sequences with default settings. According to the cap3 output, however, many of the sequences end up as singlets rather than contigs. It seems that cap3 could not efficiently find the overlapping sequences. Could you please let me know what is wrong here, or whether there is a setting that would improve the result? Thanks for sharing your experience.

Assembly sequencing RNA-Seq genome • 3.0k views

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 9.9 years ago by seta ★ 1.9k

Ram · Answer 1 · 2015-07-30

9.9 years ago

Brian Bushnell 20k

Hi seta,

I suggest that you give Dedupe a try. It will find and remove all duplicate contigs and fully-contained contigs, like this:

dedupe.sh in=assm1.fa,assm2.fa out=combined.fa

It can also find and report all overlaps, in dot format. It won't remove or merge overlapping contigs, though.
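To make "duplicate contigs and fully-contained contigs" concrete, here is a minimal Python sketch of the idea. It is not Dedupe's actual algorithm (which is far more efficient and also tolerates mismatches); the function names and the toy sequences are assumptions made for illustration only. A contig is dropped if it, or its reverse complement, occurs exactly inside a contig that has already been kept.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def drop_contained(contigs):
    """Remove exact duplicates and fully contained contigs.

    contigs: dict mapping contig name -> sequence.
    Returns a dict holding only the contigs that are kept.
    Naive O(n^2) exact-match scan, for illustration only.
    """
    # Process longest first, so a contig can only be contained
    # in a contig that has already been kept.
    ordered = sorted(contigs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    kept = {}
    for name, seq in ordered:
        seq = seq.upper()
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for other in kept.values()):
            continue  # duplicate of, or contained in, a kept contig
        kept[name] = seq
    return kept


if __name__ == "__main__":
    # Hypothetical toy contigs pooled from two assemblies.
    pooled = {
        "a": "ACGTACGGTTCA",
        "b": "ACGTACGGTTCA",  # exact duplicate of a
        "c": "TACGG",         # contained in a
        "d": "AACCG",         # reverse complement is contained in a
        "e": "GTTCAGGGA",     # overlaps the end of a, but is not contained
    }
    print(sorted(drop_contained(pooled)))
```

Note that contig "e" survives: it only overlaps the end of "a", which is exactly the case this kind of deduplication reports but does not merge.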

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 9.9 years ago by Brian Bushnell 20k

Hi Brian, thanks, but how can I use it for my purpose if it doesn't merge overlapping contigs?

ADD REPLY • link 9.9 years ago by seta ★ 1.9k

Ah, well, it was not clear what your purpose was. If you specifically want overlapping contigs to get merged, Dedupe is not the correct tool unless you post-process the output. You might try Minimus2.

That said, when we run Minimus2, we always run Dedupe first because it greatly reduces the input volume, which makes Minimus2 take less time and be less likely to crash.

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 9.9 years ago by Brian Bushnell 20k

Thanks, I'll try it.

ADD REPLY • link 9.9 years ago by seta ★ 1.9k

Ram · Answer 2 · 2015-07-30

9.9 years ago

h.mon 35k

Some contigs will be assembled at only one particular k-mer size; low k-mer values, for example, assemble lots of small contigs. These sequences may be missing entirely from the assemblies made with other k-mer sizes, so there is nothing to merge them with and CAP3 reports them as singlets.
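As a rough illustration of this point, the Python sketch below flags contigs that have no suffix-prefix overlap with any other contig in the pool. It is a deliberate simplification: it uses exact matching on one strand only, whereas CAP3 works with alignment-based overlaps and identity cutoffs. The function names, the `min_overlap` parameter, and the toy sequences are assumptions for illustration.

```python
def overlap_len(a, b, min_overlap):
    """Length of the longest suffix of a that exactly equals a prefix of b.

    Returns 0 if no such overlap of at least min_overlap bases exists.
    """
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def find_singlets(contigs, min_overlap=5):
    """Return the sorted names of contigs that overlap no other contig.

    contigs: dict mapping contig name -> sequence.
    Exact-match, forward-strand only; a toy stand-in for a real overlapper.
    """
    joined = set()
    for x in contigs:
        for y in contigs:
            if x != y and overlap_len(contigs[x], contigs[y], min_overlap) > 0:
                joined.update((x, y))
    return sorted(name for name in contigs if name not in joined)


if __name__ == "__main__":
    # Hypothetical contigs: two share a 6-base end overlap,
    # the third was assembled at only one k-mer size.
    pooled = {
        "k25_1": "ACGTACGGTTCA",
        "k31_1": "GGTTCAGGGATC",
        "k21_only": "TTTTGCGCGCAA",
    }
    print(find_singlets(pooled))
```

In this toy pool, the contig that exists in only one assembly has no partner to overlap with, so it is reported as a singlet regardless of how the overlap threshold is tuned.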

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 9.9 years ago by h.mon 35k