Question

What is a good algorithm for assembling long reads when the intended assembly length is approximately equal to the length of the reads themselves?

17 months ago

mickalideh • 0

Hi all, here is the issue:

I have a set of PacBio CCS long reads, each about 7 kb long, that for the most part overlap each other almost completely. If one were to assemble them, the assembly would therefore not be much longer than the reads themselves.

I believe Canu has some trouble in this case.

Does anyone know a good assembly algorithm for this case?

The question could be reframed as: What is a good algorithm for getting a consensus long read based on an input of mostly overlapping long reads?

pacbio assembly longread • 1.2k views

ADD COMMENT • link 16 months ago by mickalideh • 0

If the reads are almost as long as the assembly, then why not try a multiple/progressive sequence alignment?

ADD REPLY • link 17 months ago by GenoMax 141k

I am not familiar with this technique, but I do not see how it would lead to the desired result, which is a contig that is more accurate than any of the individual reads.

ADD REPLY • link 16 months ago by mickalideh • 0

So basically you want a consensus sequence of consensus sequences? That seems redundant; are you sure you really need an assembly at this point?

What kind of data is this: amplicon? Plasmid? Is it circular?

If you find you still want to assemble, I would start out with hifiasm: https://github.com/chhylp123/hifiasm

ADD REPLY • link 17 months ago by gconcepcion ▴ 410

These are PacBio HiFi reads from human. I am reasonably confident that they come from the same place on the same chromosomal copy.

Each read has a few random errors, so the purpose of the assembly is to take the consensus of these reads and thereby eliminate the random errors.
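
To illustrate the idea of a consensus cancelling out random per-read errors, here is a minimal sketch of a column-wise majority vote. It is not what any particular assembler does. It assumes the reads have already been aligned to each other by some external multiple-alignment step and are supplied as equal-length, gap-padded strings; the function name `majority_consensus` and the toy sequences are made up for this example.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over equal-length, gap-padded rows.

    Assumption: the rows come from a multiple sequence alignment that was
    computed elsewhere; this function does not align anything itself.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(row) != width for row in aligned_reads):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Most frequent symbol in this column; on a tie, the symbol that
        # appears first in the column wins.
        symbol, _count = Counter(column).most_common(1)[0]
        # A column where the gap wins is treated as a spurious insertion
        # in a minority of reads and is dropped from the consensus.
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Toy data: five "reads" of the same locus, each row with at most one error.
aligned = [
    "ACGTTAGC-A",
    "ACGTAAGC-A",  # substitution at column 5
    "ACCTTAGC-A",  # substitution at column 3
    "ACGTTAGCTA",  # single-base insertion
    "ACGTTAGC-A",
]
print(majority_consensus(aligned))
```

Because each error in the toy data occurs in only one of the five rows, the vote at every column recovers the shared base. A simple vote like this only helps with errors that are independent between reads; systematic errors that recur at the same position would survive it.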

Thank you for the recommendation of hifiasm.

ADD REPLY • link 16 months ago by mickalideh • 0