Question: Assembling long similar contigs
dcdanko wrote (3.8 years ago):

I have a program that outputs about 50 assembled transcripts, each roughly 10 kb long.

My program already filters out exact duplicates, but many of the assembled transcripts are still very similar to one another.

Is there an existing assembly program that can connect sequences that are identical over 90% of their length?

Tags: rna-seq, assembly

Have you tried CD-HIT? It clusters protein or nucleotide sequences by similarity.

Comment written 3.8 years ago by arnstrm
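
As a rough illustration of the CD-HIT suggestion, the package's nucleotide clustering program, cd-hit-est, could be run at a 90% identity threshold along these lines. The file names are placeholders, and the `-aS 0.9` coverage requirement is only one reading of "identical over 90% of their length", so treat this as a sketch rather than a tested recipe.

```shell
#!/bin/sh
# Placeholder file names (assumptions, not from the thread).
in=transcripts.fa
out=clustered.fa
identity=0.90   # sequence identity threshold (-c)
coverage=0.9    # assumed: alignment must cover 90% of the shorter sequence (-aS)

# -n 8: word size; the CD-HIT user guide suggests 8 or 9 for
#       identity thresholds between 0.90 and 0.95.
if command -v cd-hit-est >/dev/null 2>&1 && [ -f "$in" ]; then
    cd-hit-est -i "$in" -o "$out" -c "$identity" -n 8 -aS "$coverage"
else
    echo "cd-hit-est or $in not found; nothing was run" >&2
fi
```

If it runs, `clustered.fa` holds one representative sequence per cluster and `clustered.fa.clstr` lists which input sequences were grouped together.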
Brian Bushnell (Walnut Creek, USA) wrote (3.8 years ago):

Dedupe from the BBMap package can remove similar sequences to leave only a single copy.

dedupe.sh in=transcripts.fa out=deduped.fa minidentity=90 maxedits=20

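
To check how much a deduplication run actually collapsed the set, one simple option is to count FASTA records before and after. This is a minimal sketch; `count_seqs` is a hypothetical helper name, and the file names are the ones used in the dedupe.sh command above.

```shell
#!/bin/sh
# Count FASTA records: every record starts with a '>' header line.
count_seqs() {
    grep -c '^>' "$1"
}

# Report counts for the input and output files, skipping any that are missing.
for f in transcripts.fa deduped.fa; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_seqs "$f")"
    fi
done
```

A large drop in the count suggests many near-duplicates were absorbed; no change suggests the similarity settings were too strict for these transcripts.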
arnstrm (Ames, IA) wrote (3.8 years ago):

Another alternative is TACO.
