Question: Circular Genome??
6
7.6 years ago by
Bdv290
Bdv290 wrote:

Hi, we have sequenced a viral genome and assembled it with 454 Newbler. How can I tell whether the genome is circular or linear? Should this be a feature of the assembly software (I could not find one), or should I use external software? Thanks a lot!

genome assembly • 12k views
ADD COMMENTlink modified 3.3 years ago by 5heikki7.7k • written 7.6 years ago by Bdv290
9
7.6 years ago by
lexnederbragt1.2k
Oslo, Norway
lexnederbragt1.2k wrote:

Almost added this as a comment to Pierre's answer...

Newbler does not report circularity, but it looks like you can find out whether a contig is circular from its output:

We assembled a bacterial genome using newbler (shotgun and paired end reads), and it showed a small plasmid. I checked the 454ReadStatus.txt file, and it showed a number of shotgun (!) reads that were aligned with the start in the first few hundred bases, and the end in the last few (orientation '-' and '+', respectively). We also found the reverse.

I guess you can take this as an indication for circularity.

A second option would be to 'make the contig circular' and cut the contig sequence in two at another position. Then map the reads back to the rearranged contig and look for reads mapping perfectly across the original 'breakpoint' (hope this makes sense).
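A minimal sketch of that second option, assuming the contig is held as a plain string. The function names `rotate_contig` and `junction_position` are made up for this illustration and are not part of Newbler or any other tool. The mapping step itself is left to whichever read mapper you normally use.

```python
def rotate_contig(seq: str, cut: int) -> str:
    """Treat seq as circular and re-linearize it so it starts at index cut."""
    if not seq:
        raise ValueError("empty contig")
    cut %= len(seq)
    return seq[cut:] + seq[:cut]


def junction_position(seq_len: int, cut: int) -> int:
    """0-based index, in the rotated contig, of the original first base.

    Reads that map cleanly across this index join the old end to the old start.
    """
    return (seq_len - cut % seq_len) % seq_len


contig = "AAACCCGGGTTT"
rotated = rotate_contig(contig, 6)
print(rotated)                               # GGGTTTAAACCC
print(junction_position(len(contig), 6))     # 6
```

Cutting near the middle of the contig keeps the old breakpoint far from the new ends, so reads spanning it have room to map on both sides.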

ADD COMMENTlink written 7.6 years ago by lexnederbragt1.2k
1

If you like my answer, vote it up :-) + orientation means the read is located on the positive (forward) strand, - on the negative (reverse) strand. Looks like your two different assemblies indeed happen to 'break' the circle at a different position.

ADD REPLYlink written 7.6 years ago by lexnederbragt1.2k

Hi Lex, I tried it and indeed there are reads aligned to both the start and the end of the contig! Moreover, with the new version of the software I got a different contig (only the order of the two parts differs). Does that mean that Newbler just breaks the circular contig somewhere in the middle? Also, what does it mean that the orientation is - and + respectively, and how did you find the reverse? Thanks!

ADD REPLYlink written 7.6 years ago by Bdv290

I have 20 such "circular reads" and all of them are 5' - and 3' +. Does it make sense? Why do they begin on the positive strand and end on the negative one? And I voted your answer up, it was very helpful!

ADD REPLYlink written 7.6 years ago by Bdv290
5
7.6 years ago by
iw9oel_ad6.0k
iw9oel_ad6.0k wrote:

If your assembler is not aware of circularity, it will probably split the genome arbitrarily at some point, in order to present a report as if it were linear. (I haven't had the opportunity to use Newbler, so I don't know how it treats its results.)

If the genome is circular, you should see some apparently badly oriented, but otherwise well mapped, read pairs pointing "away" from each other at the ends of the linear assembly. For each such pair, the sum of the two reads' distances to their nearest contig ends should be similar to the expected insert size.

You may also find the "joining part" of the same circular sequence represented in a separate contig, so it's worth checking specifically for that.
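As a rough illustration of the read-pair check, here is a sketch that assumes a forward/reverse ("innie") paired library and 0-based, end-exclusive mapping coordinates. The `Mate` class and both function names are hypothetical; in practice the coordinates and strands would come from your mapper's output.

```python
from dataclasses import dataclass


@dataclass
class Mate:
    start: int    # 0-based leftmost mapped position on the contig
    end: int      # end position, exclusive
    strand: str   # '+' or '-'


def implied_circular_insert(a: Mate, b: Mate, contig_len: int):
    """Insert size implied if the contig ends were joined.

    Returns None unless the pair points outward, i.e. the left mate is on
    '-' (facing the contig start) and the right mate is on '+' (facing the end).
    """
    left, right = sorted((a, b), key=lambda m: m.start)
    if left.strand != "-" or right.strand != "+":
        return None
    # Distance from contig start to the left mate's far edge, plus distance
    # from the right mate's start to the contig end.
    return left.end + (contig_len - right.start)


def supports_circularity(a, b, contig_len, expected_insert, tolerance):
    """True if the outward-facing pair implies roughly the expected insert."""
    size = implied_circular_insert(a, b, contig_len)
    return size is not None and abs(size - expected_insert) <= tolerance


near_start = Mate(100, 350, "-")
near_end = Mate(9500, 9750, "+")
print(implied_circular_insert(near_start, near_end, 10000))       # 850
print(supports_circularity(near_start, near_end, 10000, 800, 200))  # True
```

A single pair proves little; the signal is many outward-facing pairs whose implied insert sizes cluster around the library's expected value.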

ADD COMMENTlink written 7.6 years ago by iw9oel_ad6.0k
5
7.6 years ago by
Peter5.6k
Scotland, UK
Peter5.6k wrote:

As far as I am aware, no assembly programs produce explicitly labeled circular contigs on output, even though this would be useful for some viruses, many bacteria, plasmids, mitochondria, chloroplasts etc. In practice this is not usually an issue - you are unlikely to get enough nice data for a whole circular bacterial genome to come out as one contig. For those of us interested in viral genomes or mitochondria etc it is annoying, but the sequences are small enough to manually finish.

The other posters have suggested several ideas to help you manually stitch the ends together. I would add that I once had an apparently circular 40kb viral genome, assembled from 454 reads, come out as a linear contig of about 50kb - it had actually started repeating! Something else to check.

Finally, and probably most crucially - talk to some virologists! Some viruses can form a circular form for replication, but a linear form for bundling up into viral particles. In this situation you may have to do some lab work to work out where the ends really are.

ADD COMMENTlink written 7.6 years ago by Peter5.6k

It is a novel viral genome, so the virologists don't know (but want to know) whether it is circular. We do have only one contig. I don't understand how the assembler works if the genome is circular: does it cut the contig in the middle?

ADD REPLYlink written 7.6 years ago by Bdv290

Different assemblers will do it differently, and it will depend a bit on your data and how variable it is. I think the best you can hope for is a linear contig which is the full length of the circle, perhaps with some overlap as I described (e.g. 50kb contig for a 40kb circle). The circle break point will probably be random (assuming you have good coverage - otherwise I would expect it to break at a region of low coverage). You will need to do some manual finishing, and should try Sanger/capillary sequencing over the end gap to confirm the ends do meet.
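The "50kb contig for a 40kb circle" case can be screened for by looking for a stretch at the end of the contig that repeats its beginning. A minimal sketch, assuming exact matches only (real data with sequencing errors would need a proper alignment of the two ends instead); `terminal_overlap` and `trim_to_circle` are names invented here:

```python
def terminal_overlap(seq: str, min_len: int = 20) -> int:
    """Length of the longest prefix of seq that also ends seq (exact match).

    Returns 0 if no overlap of at least min_len bases is found.
    """
    for k in range(len(seq) - 1, min_len - 1, -1):
        if seq.endswith(seq[:k]):
            return k
    return 0


def trim_to_circle(seq: str, min_len: int = 20) -> str:
    """Drop the repeated tail so the sequence covers the circle once."""
    k = terminal_overlap(seq, min_len)
    return seq[: len(seq) - k] if k else seq


unit = "ACGTTGCAAGGCTTAACCGG"      # toy 20-base "circle"
contig = unit + unit[:8]           # assembler ran 8 bases past the start
print(terminal_overlap(contig, min_len=5))         # 8
print(trim_to_circle(contig, min_len=5) == unit)   # True
```

An overlap is still only a hint: a linear genome with terminal repeats would look the same, which is why confirming the join in the lab is worth doing.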

ADD REPLYlink modified 4.1 years ago • written 7.6 years ago by Peter5.6k

The repeat of some 10 kbp is certainly a sign that something is up, perhaps a circular genome. So, I'd run the assembly through Miropeats to identify those repeats, duplications or inversions. See http://genome.wustl.edu/software/miropeats

ADD REPLYlink written 7.5 years ago by Larry_Parnell16k
3
7.5 years ago by
Ketil3.9k
Germany
Ketil3.9k wrote:

Very simple approach to check if a contig is circular:

  1. align all reads to the contig, pick only the best hit for each read
  2. concatenate the contig with itself, then redo the alignment

If some reads now align with a (much) better score, it is likely that the contig is circular.
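A toy version of this idea, with exact substring matching standing in for a real aligner (that substitution is an assumption made to keep the sketch short; with real reads you would compare alignment scores from your mapper). The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_only_reads(contig: str, reads):
    """Reads that match the doubled contig but not the single contig.

    Such reads can only be placed across the end-to-start junction.
    """
    doubled = contig + contig
    hits = []
    for read in reads:
        forms = (read, revcomp(read))
        in_single = any(f in contig for f in forms)
        in_doubled = any(f in doubled for f in forms)
        if in_doubled and not in_single:
            hits.append(read)
    return hits


contig = "ACGTTGCAAGGCTTAACCGG"
reads = ["GCAAGG", "CCGGACGT", "ACGTCCGG", "TTTTTTTT"]
print(junction_only_reads(contig, reads))   # ['CCGGACGT', 'ACGTCCGG']
```

Here `GCAAGG` fits inside the linear contig, `TTTTTTTT` fits nowhere, and the other two reads (one per strand) only fit once the contig is joined to itself.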

ADD COMMENTlink written 7.5 years ago by Ketil3.9k
2
7.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum112k wrote:

If your program produced a full linear sequence, I would simply look (a simple grep?) for some reads starting with the end of the assembly and ending with the beginning of the assembly.
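One way to do that grep-style search, sketched in Python: build a short probe from the last `k` bases of the assembly followed by its first `k` bases, then look for it (or its reverse complement) in the reads. The probe length `k` and the function names are choices made for this example, and exact matching will miss reads with sequencing errors in the probe region.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_probe(assembly: str, k: int) -> str:
    """Last k bases of the assembly followed by its first k bases."""
    if k <= 0 or 2 * k > len(assembly):
        raise ValueError("k must be positive and at most half the assembly length")
    return assembly[-k:] + assembly[:k]


def reads_spanning_junction(assembly: str, reads, k: int = 15):
    """Reads containing the junction probe on either strand."""
    probe = junction_probe(assembly, k)
    probe_rc = revcomp(probe)
    return [r for r in reads if probe in r or probe_rc in r]


assembly = "ACGTTGCAAGGCTTAACCGG"
reads = ["AACCGGACGTTG", "CAACGTCCGGTT", "TGCAAGGCTT"]
print(junction_probe(assembly, 4))                    # CCGGACGT
print(reads_spanning_junction(assembly, reads, k=4))  # ['AACCGGACGTTG', 'CAACGTCCGGTT']
```

With real data a longer probe (the default here is 15 bases per side) reduces chance matches, at the cost of missing reads that cross the junction near their own ends.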

ADD COMMENTlink written 7.6 years ago by Pierre Lindenbaum112k

Correct me if I'm wrong, but in order for this to work you need to be sure to use paired end reads only. In the case of 454 the 2 ends would be saved in one read, but with e.g. Illumina you would have to use /1 /2 sequences explicitly.

ADD REPLYlink written 7.6 years ago by Michael Schubert6.8k

it is not a paired end but a shotgun sequencing...

ADD REPLYlink written 7.6 years ago by Bdv290

Then it's only going to work if there are no sequencing gaps.

ADD REPLYlink written 7.6 years ago by Michael Schubert6.8k
2
7.6 years ago by
Cambridge, UK
Michael Schubert6.8k wrote:

Here are my thoughts on this:

As you mentioned, you used shotgun-only 454 sequencing. Assuming that your viral genome is (almost) entirely sequenced:

  1. There are no sequencing gaps, i.e. your genome did not contain segments where 454 sequencing failed and thus there are reads covering the entire genome. In this case, it would be best if you had only one contig the ends of which you could try to join by finding a read that spans the start and the end (taking orientations and complementarity into account). The more contigs you have the more impractical this approach gets and the less likely the assumption (no gaps) is.

  2. There are 1 or 2 sequencing gaps. Find primers to try and close the gap with another sequencing method (Sanger comes to mind, as long reads are an advantage here). Again, take orientations into account to join contigs to a circular genome (or not). Again, with more contigs/gaps this gets impractical. You might want to tweak your assembler's options here.

The important thing to note is that it is not possible to distinguish between "no gaps-linear genome" and "one gap-circular genome". In order to be sure, I would try joining the ends together by either sequencing or simple PCR products.

ADD COMMENTlink written 7.6 years ago by Michael Schubert6.8k
1
7.1 years ago by
ALchEmiXt1.9k
The Netherlands
ALchEmiXt1.9k wrote:

Maybe a way too simple thought on this, but it could work:

As some earlier answers mentioned, the assembly will probably break at some point of low coverage. However, you could slightly change your input dataset (for instance, delete the reads within one segment of the genome assembly). A reassembly will then be pushed towards a defined break at the point where you deleted the reads. You can then check whether the ends of the first assembly are joined in the new assembly or are still present as ends. Fiddling a bit with which reads to delete might give you a good answer on circularity (in addition to, for instance, the functional annotation, which could also give a clue as to whether the assembly was broken arbitrarily or has real (ragged) ends).

My 2ct.

ADD COMMENTlink written 7.1 years ago by ALchEmiXt1.9k
1
4.1 years ago by
hurfdurf460
United States
hurfdurf460 wrote:

Geneious has a circular-capable assembler built in now:

http://blog.geneious.com/post/84370864944/building-a-circular-de-novo-assembler

ADD COMMENTlink written 4.1 years ago by hurfdurf460
0
3.3 years ago by
5heikki7.7k
Finland
5heikki7.7k wrote:

I wrote a small script for screening circular contigs from multi-FASTA files.

ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by 5heikki7.7k