Question

454 Gap Closures


13.4 years ago

Lee Katz ★ 3.1k

Is there a way to estimate how large a gap might be after 454 pyrosequencing followed by Newbler assembly? I have several closed reference genomes, and I know that my read length is about 400 bp with about 40x coverage. I therefore have high confidence that these gaps are due to repeat regions.

Edit: I guess this might be answerable by knowing the repeat regions in a genome. How would I identify repeat regions from the sequence alone? If I knew the length L of a repeat region, I would calculate the gap length as gapLength = L - (2 * 400).
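The formula in the edit above can be sketched in a few lines of Python. This is only an illustration of the arithmetic proposed in the question, assuming a read can extend at most one full read length into the repeat from each flank. The function name, the default read length, and the example repeat lengths are made up for illustration; reading a negative result as "the repeat could in principle be spanned by reads" is my interpretation, not something the formula states.

```python
def estimate_gap_length(repeat_length, read_length=400):
    """Gap estimate from the question: gapLength = L - 2 * read_length.

    Assumes reads reach one full read length into the repeat from each
    side. The result is not clamped, so it can be negative.
    """
    return repeat_length - 2 * read_length


# Hypothetical repeat lengths in bp (not real data).
for repeat_length in (5000, 1500, 600):
    gap = estimate_gap_length(repeat_length)
    if gap > 0:
        print(f"repeat of {repeat_length} bp: estimated gap of {gap} bp")
    else:
        print(f"repeat of {repeat_length} bp: no gap expected from this formula")
```

In practice the usable overlap into the repeat is probably shorter than a full read, so this likely underestimates the gap.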

assembly • 3.1k views

updated 6.6 years ago by Biostar 20 • written 13.4 years ago by Lee Katz ★ 3.1k


Is it a eukaryotic or bacterial genome?

13.4 years ago by Darked89 4.6k


bacterial genome

13.4 years ago by Lee Katz ★ 3.1k

Ram · Answer 1 · 2010-11-19


13.4 years ago

Darked89 4.6k

Repeat identification:

http://openwetware.org/wiki/Wikiomics:Repeat_finding

Newbler does not exclude all repetitive sequences from the assembly, at least not with all settings. I found repeats in a Newbler-assembled plant genome.

updated 4.6 years ago by Ram 43k • written 13.4 years ago by Darked89 4.6k


For the fire ant genome, we noticed dramatic improvements in Newbler performance when we increased parameter stringency to a minimum overlap of 100 bp between reads and 98 or 99% identity. The best explanation I see is that those parameters helped resolve repeats: under the more stringent settings, many sequences that were previously treated as repeats became unique sequences.

13.0 years ago by Yannick Wurm ★ 2.5k


Thanks for the info!

13.4 years ago by Lee Katz ★ 3.1k

score 1 · Answer 2 · 2010-11-20


13.4 years ago

Ketil 4.1k

I think that even with 40x coverage, you're not guaranteed to have reads covering all gaps, and the theoretical models don't work so well in practice. I don't know any exact numbers for this (and it probably varies from run to run and lab to lab), but coverage tends to be uneven, and there could be features of the sequence that make some parts rare or unsequenceable. It's well known that you get duplicated clones (the same clone on multiple beads), which is one form of unevenness.
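To put a number on what the theoretical models predict, here is a rough sketch using the Lander-Waterman (Poisson) model, which assumes reads land uniformly at random. The 5 Mbp genome size and the 40 bp minimum overlap are assumptions picked for illustration, not values from this thread; the 400 bp read length and 40x coverage come from the question.

```python
import math


def uncovered_fraction(coverage):
    """Poisson probability that a given base is covered by no read."""
    return math.exp(-coverage)


def expected_gaps(genome_size, read_length, coverage, min_overlap=0):
    """Lander-Waterman expected number of contigs (roughly, of gaps).

    n_reads * exp(-coverage * sigma), with sigma = 1 - min_overlap / read_length.
    """
    n_reads = coverage * genome_size / read_length
    sigma = 1 - min_overlap / read_length
    return n_reads * math.exp(-coverage * sigma)


# Assumed values: 5 Mbp bacterial genome, 40 bp minimum overlap.
gaps = expected_gaps(genome_size=5_000_000, read_length=400,
                     coverage=40, min_overlap=40)
print(f"fraction of bases with no read: {uncovered_fraction(40):.2e}")
print(f"expected gaps under the model:  {gaps:.2e}")
```

Under these idealized assumptions the expected number of gaps is essentially zero, so any gaps actually seen in a 40x assembly have to come from things the model leaves out, such as uneven coverage or repeats.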

13.4 years ago by Ketil 4.1k


Newbler takes care of these duplicates: it identifies them and does not remove them, but when it determines the consensus bases, for example, the duplicates count as one (the same goes for average read depth).

13.4 years ago by lexnederbragt ★ 1.3k

score 1 · Answer 3 · 2010-11-22

I assume you have shotgun reads only? For Newbler assemblies, you can actually find the repeats among the contigs by looking at the per-contig read depth. With apologies for the self-promotion, here is a paper describing just that: http://www.hindawi.com/journals/seq/2010/782465.html. Contigs with higher-than-normal read depth are collapsed repeats, and the depth is proportional to the copy number.

This will at least tell you which contigs are the repeats. Looking at the 454ContigGraph file could tell you which contigs are the 'neighbours' of the repeats.
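A minimal sketch of the read-depth idea, assuming you have already pulled a per-contig average depth out of the assembly output into a dictionary. The contig names, the depth values, the use of the median as the "normal" single-copy depth, and the 1.5x cutoff are all assumptions for illustration, not something Newbler or the paper prescribes.

```python
from statistics import median


def flag_collapsed_repeats(contig_depths, min_ratio=1.5):
    """Return {contig: estimated copy number} for high-depth contigs.

    The median depth over all contigs is taken as the single-copy
    baseline (an assumption); copy number is depth / baseline, rounded.
    """
    baseline = median(contig_depths.values())
    repeats = {}
    for name, depth in contig_depths.items():
        ratio = depth / baseline
        if ratio >= min_ratio:
            repeats[name] = round(ratio)
    return repeats


# Hypothetical per-contig average read depths (not real data).
depths = {
    "contig00001": 41.0,
    "contig00002": 39.5,
    "contig00003": 40.2,
    "contig00004": 122.0,
    "contig00005": 79.0,
}
repeats = flag_collapsed_repeats(depths)
print(repeats)
```

If an assembly contains many short repeat contigs, a plain median may be pulled upwards, so a length-weighted baseline might be a safer choice there.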