Question: 454 Gap Closures
1
6.9 years ago by
Lee Katz2.8k
Atlanta, GA
Lee Katz2.8k wrote:

Is there a way to estimate how large a gap might be after performing 454 pyrosequencing followed by Newbler? I have several closed reference genomes, and I know that my read length is about 400bp with about 40x coverage. Therefore I have high confidence that these gaps are due to repeat regions.

Edit: I guess this might be answerable by knowing the repeat regions in a genome. How would I identify repeat regions from the sequence alone? If I knew the length L of a repeat region, I would estimate the gap as gapLength = L - (2 * 400).
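The estimate in the edit above can be written as a small function. This is only a sketch of the formula as posted, assuming reads of about 400bp extend into the repeat from both flanks. The function name and the clamp to zero for short repeats are assumptions added for illustration, not part of the original question.

```python
def estimate_gap_length(repeat_length, read_length=400):
    """Estimate the gap left by a repeat, following gapLength = L - (2 * 400).

    repeat_length: length L of the repeat region in bp.
    read_length: approximate read length in bp (400 in the question).
    """
    # Assumption: a repeat shorter than two read lengths leaves no gap
    # under this estimate, so the result is clamped at zero.
    return max(0, repeat_length - 2 * read_length)


# Example with a hypothetical 5 kb repeat.
print(estimate_gap_length(5000))
```

For a bacterial genome, the repeat lengths fed into this would typically come from the closed reference genomes mentioned in the question.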

assembly • 1.6k views
ADD COMMENTlink modified 15 days ago by Biostar ♦♦ 20 • written 6.9 years ago by Lee Katz2.8k

Is it a eukaryotic or bacterial genome?

ADD REPLYlink written 6.9 years ago by Darked894.1k

bacterial genome

ADD REPLYlink written 6.9 years ago by Lee Katz2.8k
3
6.9 years ago by
Darked894.1k
Barcelona, Spain
Darked894.1k wrote:

Repeat identification:

http://openwetware.org/wiki/Wikiomics:Repeat_finding

Newbler does not kick out all repetitive sequences from the assembly, at least not in all settings. I got repeats in a Newbler-assembled plant genome.

ADD COMMENTlink modified 6.9 years ago • written 6.9 years ago by Darked894.1k
1

For the fire ant genome, we noticed dramatic improvements in Newbler performance when we increased parameter stringency to a minimum overlap of 100bp between reads and 98 or 99% identity. The best explanation I see is that those parameters helped resolve repeats (with the more stringent parameters, many former repeats became unique sequences).
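To illustrate why stricter overlap settings can separate repeat copies, here is a minimal sketch of an overlap filter. It is not Newbler's implementation: the function name is hypothetical, the overlap is assumed to be already aligned without gaps, and the looser thresholds in the example are arbitrary illustrative values.

```python
def overlap_passes(seq_a, seq_b, min_length=100, min_identity=0.98):
    """Check an ungapped, pre-aligned overlap against two thresholds.

    seq_a, seq_b: the overlapping portions of two reads, equal length.
    min_length: minimum overlap length in bp.
    min_identity: minimum fraction of matching positions (0 to 1).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("overlapping portions must have equal length")
    if len(seq_a) < min_length:
        return False
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a) >= min_identity


# Two hypothetical reads from slightly diverged repeat copies:
# 120bp overlap with 5 mismatches (about 95.8% identity).
copy_1 = "ACGT" * 30
copy_2 = list(copy_1)
for position in (0, 24, 48, 72, 96):
    copy_2[position] = "T"  # each of these positions is "A" in copy_1
copy_2 = "".join(copy_2)

print(overlap_passes(copy_1, copy_2, min_length=40, min_identity=0.90))
print(overlap_passes(copy_1, copy_2, min_length=100, min_identity=0.98))
```

Under the looser thresholds the two copies would be joined as if they came from one locus; under the stricter ones they are kept apart, which matches the explanation given above.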

ADD REPLYlink written 6.4 years ago by Yannick Wurm2.2k

Thanks for the info!

ADD REPLYlink written 6.9 years ago by Lee Katz2.8k
1
6.8 years ago by
Ketil3.8k
Germany
Ketil3.8k wrote:

I think that even with 40x coverage, you're not guaranteed to have reads covering all gaps, and the theoretical models don't work so well in practice. I don't know any exact numbers for this (and it probably varies from run to run and lab to lab), but coverage tends to be uneven, and there could be features of the sequence that make some parts rare or unsequenceable. It's well known that you get duplicated clones (the same clone on multiple beads), which is one form of unevenness.
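For reference, one common theoretical model is the Lander-Waterman model, which assumes reads land uniformly at random. The sketch below shows what it predicts at 40x with 400bp reads. The 5 Mb genome size and the 40bp minimum overlap are assumed values for illustration, and the function name is made up.

```python
import math


def lander_waterman(genome_size, read_length, coverage, min_overlap=0):
    """Idealized Lander-Waterman predictions for uniformly random reads.

    Returns (probability that a base is uncovered,
             expected number of contigs, i.e. islands).
    """
    n_reads = coverage * genome_size / read_length
    # Fraction of a read that must overlap another read to be joined.
    theta = min_overlap / read_length
    p_uncovered = math.exp(-coverage)
    expected_contigs = n_reads * math.exp(-coverage * (1 - theta))
    return p_uncovered, expected_contigs


# Assumed values: 5 Mb bacterial genome, 400bp reads, 40x, 40bp overlap.
p_uncovered, expected_contigs = lander_waterman(5_000_000, 400, 40, 40)
print(f"P(base uncovered) = {p_uncovered:.3g}")
print(f"Expected contigs  = {expected_contigs:.3g}")
```

Under these idealized assumptions the model predicts essentially no coverage gaps at 40x, so any gaps seen in practice come from things the model leaves out, such as uneven coverage and repeats.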

ADD COMMENTlink written 6.8 years ago by Ketil3.8k

Newbler takes care of these duplicates: it identifies them and does not remove them, but when it determines the consensus bases, for example, the duplicates count as one read (the same goes for average read depth).

ADD REPLYlink written 6.8 years ago by lexnederbragt1.2k
1
6.8 years ago by
lexnederbragt1.2k
Oslo, Norway
lexnederbragt1.2k wrote:

I assume you have shotgun reads only? For newbler assemblies, you can actually find the repeats among the contigs by looking at the per-contig read depth. With apologies for the self-promotion, here is a paper describing just that: http://www.hindawi.com/journals/seq/2010/782465.html. Contigs with higher-than-normal read depth are collapsed repeats, and the depth is proportional to the copy number.

This will at least tell you what (contigs) the repeats are. Looking at the 454ContigGraph file could tell you which contigs the 'neighbours' of the repeats are.
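The read-depth idea above can be sketched in a few lines. This is a rough illustration, not the method from the paper: it assumes you have already extracted a contig-name-to-average-depth mapping from the assembly output, uses the median depth as the single-copy baseline (reasonable only if most contigs are unique sequence), and uses an arbitrary 1.5x cutoff. The contig names and depth values are made up.

```python
from statistics import median


def flag_collapsed_repeats(contig_depths, min_ratio=1.5):
    """Flag contigs whose read depth suggests a collapsed repeat.

    contig_depths: dict mapping contig name to average read depth.
    min_ratio: depth relative to the baseline above which a contig is
        flagged (assumed cutoff).
    Returns a dict mapping flagged contig name to estimated copy number.
    """
    # Assumption: the median depth represents single-copy sequence.
    baseline = median(contig_depths.values())
    flagged = {}
    for name, depth in contig_depths.items():
        ratio = depth / baseline
        if ratio >= min_ratio:
            flagged[name] = round(ratio)
    return flagged


# Hypothetical per-contig depths from a roughly 40x assembly.
depths = {
    "contig00001": 39.5,
    "contig00002": 41.0,
    "contig00003": 40.2,
    "contig00004": 121.7,
    "contig00005": 80.9,
}
print(flag_collapsed_repeats(depths))
```

The estimated copy number times the contig length gives a rough total repeat length, which could then be compared against the gap sizes seen in the closed reference genomes.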

ADD COMMENTlink written 6.8 years ago by lexnederbragt1.2k