Question

Finding repeats in a de novo assembled contig from PacBio reads

1

7.3 years ago

aindap ▴ 120

Dear BioStars Community:

I performed a de novo assembly of PacBio reads for a viral genome using Canu, and I have the resulting unitig from the Canu pipeline. I would now like to characterize repeats in the assembly, but I'm new to assembly and repeat identification. One simple approach I considered is to take the reads used to form the assembly, align them against the assembly with MUMmer, and look at the resulting dot plot. Are there more sophisticated approaches that would yield better results?

Assembly PacBio repeat • 2.6k views

updated 7.2 years ago by arnstrm ★ 1.8k • written 7.3 years ago by aindap ▴ 120

2

Why not use Tandem Repeats Finder or RepeatMasker for this?

7.3 years ago by reza ▴ 300

1

Assuming the assembly is correct, it seems to make more sense to align the assembly to itself rather than aligning reads to the assembly.

7.2 years ago by Brian Bushnell 20k
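
To illustrate the idea of comparing the assembly with itself, here is a minimal Python sketch. It is a toy stand-in for a real aligner such as MUMmer, not a replacement for it: it only finds exact, forward-strand repeats by indexing k-mers, so diverged copies and inverted repeats would be missed. The function names and the default `k` are assumptions made for this example.

```python
from collections import defaultdict


def self_matches(seq, k=12):
    """Return sorted (i, j) pairs with i < j where the k-mer at i recurs at j."""
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    pairs = []
    for positions in index.values():
        # Positions are appended in increasing order, so a < b implies i < j.
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                pairs.append((positions[a], positions[b]))
    return sorted(pairs)


def merge_diagonals(pairs, k):
    """Merge consecutive k-mer hits on one diagonal into (start_i, start_j, length)."""
    by_diag = defaultdict(list)
    for i, j in pairs:
        by_diag[j - i].append(i)
    segments = []
    for diag, starts in by_diag.items():
        starts.sort()
        run_start = prev = starts[0]
        for s in starts[1:]:
            if s != prev + 1:
                # The run of adjacent hits ended; record it and start a new one.
                segments.append((run_start, run_start + diag, prev - run_start + k))
                run_start = s
            prev = s
        segments.append((run_start, run_start + diag, prev - run_start + k))
    return sorted(segments)


if __name__ == "__main__":
    # Hypothetical sequence: a 14 bp unit, a 10 bp spacer, then the unit again.
    unit = "ACGTTGCAAGGCTA"
    contig = unit + "CCGATATCGG" + unit
    for start_i, start_j, length in merge_diagonals(self_matches(contig, k=8), k=8):
        print(f"repeat of {length} bp at {start_i} and {start_j}")
```

Each reported segment corresponds to an off-diagonal line in a self-versus-self dot plot. For a real contig, a proper aligner is the better choice because it tolerates mismatches and handles both strands.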

score 1 · Answer 1 · 2017-03-03

Yes, you should probably try RepeatModeler, which can detect repeat families de novo and classify them. It has worked well for both model and non-model species and is very easy to run, although it does have a few dependencies to install (TRF, RECON, RepeatScout, NSEG).

EDIT: you can find my sample run script here!
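
Independently of the script mentioned above, a basic RepeatModeler run generally comes down to two steps: build a database from the assembly with `BuildDatabase`, then run `RepeatModeler` on that database. The Python sketch below only assembles and optionally launches those two commands. It is not the author's script; the file name `assembly.fasta` and the database name `viral_db` are placeholders, and options such as thread count are left out because they differ between RepeatModeler versions.

```python
import subprocess


def repeatmodeler_commands(fasta, db_name):
    """Return the two command lines for a basic RepeatModeler run."""
    return [
        # Step 1: build the sequence database from the assembly FASTA.
        ["BuildDatabase", "-name", db_name, str(fasta)],
        # Step 2: run de novo repeat family discovery on that database.
        ["RepeatModeler", "-database", db_name],
    ]


def run_repeatmodeler(fasta, db_name, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    cmds = repeatmodeler_commands(fasta, db_name)
    for cmd in cmds:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # Requires BuildDatabase and RepeatModeler to be on PATH.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Placeholder names; switch dry_run to False once the tools are installed.
    run_repeatmodeler("assembly.fasta", "viral_db", dry_run=True)
```

With `dry_run=True` nothing is executed, which makes it easy to check the command lines before starting a run.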