Circular Genome??
9
6
11.2 years ago
Bdv ▴ 320

Hi, we have sequenced a viral genome and assembled it with 454 Newbler. How can I tell whether the genome is circular or linear? Should this be a feature of the assembly software (Newbler has no such feature), or should I use external software? Thanks a lot!

genome assembly • 17k views
9
11.2 years ago
lexnederbragt ★ 1.3k

Newbler does not report circularity, but it looks like you can tell whether a contig is circular from its output:

We assembled a bacterial genome using Newbler (shotgun and paired-end reads), and it showed a small plasmid. I checked the 454ReadStatus.txt file, and it showed a number of shotgun (!) reads that aligned with their start in the first few hundred bases of the contig and their end in the last few hundred (orientation '-' and '+', respectively). We also found the reverse.

I guess you can take this as an indication for circularity.

A second option would be to 'make the contig circular' and cut the contig sequence in two at another position. Then map the reads back to the new contig and look for reads mapping perfectly over the original 'breakpoint' (hope this makes sense).
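The second option can be sketched in a few lines of Python. This is only an illustration, not something Newbler provides: the function names `rotate_contig` and `spans_junction` and the `flank` parameter are made up here, and the alignment coordinates are assumed to come from whatever mapper you use, as 0-based half-open intervals on the rotated contig.

```python
def rotate_contig(seq: str, cut: int) -> tuple[str, int]:
    """Treat seq as circular and re-linearise it starting at index `cut`.

    Returns the rotated sequence and the 0-based index in the rotated
    sequence where the original end/start junction now sits.
    """
    if not 0 < cut < len(seq):
        raise ValueError("cut must fall strictly inside the contig")
    rotated = seq[cut:] + seq[:cut]
    junction = len(seq) - cut  # original end now meets original start here
    return rotated, junction


def spans_junction(aln_start: int, aln_end: int, junction: int, flank: int = 20) -> bool:
    """True if the alignment [aln_start, aln_end) covers the junction
    with at least `flank` aligned bases on each side (hypothetical cutoff)."""
    return aln_start <= junction - flank and aln_end >= junction + flank
```

Cutting near the middle of the contig (`cut = len(seq) // 2`) puts the old breakpoint far from the new ends, so reads crossing it can align in one piece.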

1

If you like my answer, vote it up :-) A '+' orientation means the read is located on the positive (forward) strand, a '-' on the negative (reverse) strand. It looks like your two different assemblies indeed happen to 'break' the circle at different positions.

0

Hi Lex, I tried it, and indeed there are reads aligned with both the start and the end of the contig! Moreover, with the new version of the software I got a different contig (only the order of the two parts differs). Does that mean Newbler just breaks the circular contig somewhere in the middle? Also, what does it mean that the orientation is '-' and '+', respectively, and how did you find the reverse? Thanks!

0

I have 20 such "circular reads" and all of them are 5' - and 3' +. Does it make sense? Why do they begin on the positive strand and end on the negative one? And I voted your answer up, it was very helpful!

5
11.2 years ago

If your assembler is not aware of circularity, it will probably split the genome arbitrarily at some point, in order to present a report as if it were linear. (I haven't had the opportunity to use Newbler, so I don't know how it treats its results.)

If the genome is circular, you should see some apparently badly oriented, but otherwise well mapped, read pairs pointing "away" from each other at the ends of the linear assembly. For each such pair, the sum of the two reads' distances to their nearest contig ends should be similar to the expected insert size.
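As a rough sketch of that check, assuming a library in which proper pairs face each other (forward read on the left, reverse read on the right; 454 paired-end libraries may be oriented differently) and assuming you have already extracted coordinates from your mapper's output. The function names, the coordinate convention and the 25% `tolerance` are all hypothetical choices for illustration.

```python
def implied_circular_insert(contig_len: int, fwd_start: int, rev_end: int) -> int:
    """Insert size implied if the pair wraps around the contig ends.

    fwd_start: 0-based leftmost coordinate of the forward-strand read
    rev_end:   0-based exclusive rightmost coordinate of the reverse-strand read
    """
    # distance from the forward read to the contig end,
    # plus distance from the contig start to the reverse read
    return (contig_len - fwd_start) + rev_end


def supports_circularity(contig_len: int, fwd_start: int, rev_end: int,
                         expected_insert: int, tolerance: float = 0.25) -> bool:
    """True if the pair points outward and its wrap-around insert size
    is within `tolerance` (a fraction) of the expected insert size."""
    if rev_end > fwd_start:
        # reverse read lies to the right of the forward read: a normal inward pair
        return False
    implied = implied_circular_insert(contig_len, fwd_start, rev_end)
    return abs(implied - expected_insert) <= tolerance * expected_insert
```

A handful of such pairs is only a hint; many pairs agreeing on the same insert size is a much stronger indication.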

You may also find the "joining part" of the same circular sequence represented in a separate contig, so it's worth checking specifically for that.

5
11.2 years ago
Peter 6.0k

As far as I am aware, no assembly programs produce explicitly labeled circular contigs on output, even though this would be useful for some viruses, many bacteria, plasmids, mitochondria, chloroplasts, etc. In practice this is not usually an issue: you are unlikely to get enough good data for a whole circular bacterial genome to come out as one contig. For those of us interested in viral genomes, mitochondria, etc., it is annoying, but the sequences are small enough to finish manually.

The other posters have suggested several ideas to help you manually stitch the ends together. I would add that I once had an apparently circular 40 kb viral genome assembly of 454 reads come out as a linear contig of about 50 kb: it had actually started repeating! Something else to check.
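One quick way to check for a contig that "started repeating" is to look for the longest stretch shared by its start and its end. The sketch below uses the standard prefix-function (KMP) trick and only finds exact matches, so a single sequencing error in the repeated part would hide the overlap; a real check would more likely use a pairwise aligner or a dot plot. The function names and the `min_len` cutoff are made up for this example.

```python
def terminal_overlap(seq: str, min_len: int = 20) -> int:
    """Length of the longest proper prefix of seq that is also a suffix
    (exact match). Returns 0 if it is shorter than min_len."""
    n = len(seq)
    if n == 0:
        return 0
    pi = [0] * n  # pi[i]: longest border of seq[:i + 1]
    for i in range(1, n):
        k = pi[i - 1]
        while k and seq[i] != seq[k]:
            k = pi[k - 1]
        if seq[i] == seq[k]:
            k += 1
        pi[i] = k
    return pi[-1] if pi[-1] >= min_len else 0


def trim_terminal_overlap(seq: str, min_len: int = 20) -> str:
    """Drop the repeated tail so that one copy of the putative circle remains."""
    k = terminal_overlap(seq, min_len)
    return seq[:len(seq) - k]
```

For a 50 kb contig from a 40 kb circle, `terminal_overlap` would be expected to report roughly 10 kb, provided the two copies agree base for base.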

Finally, and probably most crucially: talk to some virologists! Some viruses adopt a circular form for replication but a linear form for packaging into viral particles. In this situation you may have to do some lab work to work out where the ends really are.

0

It is a novel viral genome, so the virologists don't know (but want to know) whether it is circular. We do have only one contig. I don't understand how the assembler works if the genome is circular: does it cut the contig in the middle?

0

Different assemblers will do it differently, and it will depend a bit on your data and how variable it is. I think the best you can hope for is a linear contig which is the full length of the circle, perhaps with some overlap as I described (e.g. 50kb contig for a 40kb circle). The circle break point will probably be random (assuming you have good coverage - otherwise I would expect it to break at a region of low coverage). You will need to do some manual finishing, and should try Sanger/capillary sequencing over the end gap to confirm the ends do meet.

0

The repeat of some 10 kbp is certainly a sign that something is up, perhaps a circular genome. So, I'd run the assembly through Miropeats to identify those repeats, duplications or inversions. See http://genome.wustl.edu/software/miropeats

4
11.1 years ago
Ketil 4.1k

Very simple approach to check if a contig is circular:

1. align all reads to the contig, pick only the best hit for each read
2. concatenate the contig with itself, then redo the alignment

If some reads now align with a (much) better score, it is likely that the contig is circular.
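A toy version of this idea, with exact substring matching standing in for a real aligner (so it ignores mismatches and indels, which real 454 reads will have). It lists reads that do not fit anywhere in the contig but do fit in the contig concatenated with itself, in either orientation. The names `revcomp` and `wraparound_reads` are made up for the example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def wraparound_reads(contig: str, reads: list[str]) -> list[str]:
    """Reads that match contig + contig exactly but not the contig itself."""
    doubled = contig + contig
    hits = []
    for read in reads:
        if len(read) > len(contig):
            continue  # longer than the contig: cannot be judged this way
        orientations = (read, revcomp(read))
        if any(o in contig for o in orientations):
            continue  # already placed on the linear contig
        if any(o in doubled for o in orientations):
            hits.append(read)  # only fits across the end/start junction
    return hits
```

With a real aligner the same comparison would be done on alignment scores: map against the contig, map against the doubled contig, and list the reads whose best score improves.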

0

Sorry, but can you tell me how to do this? Thanks a lot

2
11.2 years ago

If your program produced a full linear sequence, I would simply look (a simple grep?) for some reads that start with the end of the assembly and end with the beginning of the assembly.
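The grep idea might look roughly like this in Python: build a short probe from the last and first bases of the assembly and search the reads for it in both orientations. It is an exact-match sketch, so a sequencing error inside the probe region would make a read go unnoticed; the function names and the `flank` length are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_probe(contig: str, flank: int = 15) -> str:
    """Last `flank` bases of the contig followed by its first `flank` bases."""
    if 2 * flank > len(contig):
        raise ValueError("flank is too long for this contig")
    return contig[-flank:] + contig[:flank]


def reads_spanning_ends(contig: str, reads: list[str], flank: int = 15) -> list[str]:
    """Reads containing the end-to-start junction in either orientation."""
    probe = junction_probe(contig, flank)
    probe_rc = revcomp(probe)
    return [r for r in reads if probe in r or probe_rc in r]
```

The probe from `junction_probe` can also be pasted straight into `grep` against a FASTA file of reads, as long as each read sits on a single line.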

0

Correct me if I'm wrong, but in order for this to work you need to be sure to use paired end reads only. In the case of 454 the 2 ends would be saved in one read, but with e.g. Illumina you would have to use /1 /2 sequences explicitly.

0

It is not paired-end but shotgun sequencing...

0

Then it's only going to work if there are no sequencing gaps.

2
11.2 years ago

Here are my thoughts on this:

As you mentioned, you used shotgun-only 454 sequencing. Assuming that your viral genome is (almost) entirely sequenced:

1. There are no sequencing gaps, i.e. your genome did not contain segments where 454 sequencing failed and thus there are reads covering the entire genome. In this case, it would be best if you had only one contig the ends of which you could try to join by finding a read that spans the start and the end (taking orientations and complementarity into account). The more contigs you have the more impractical this approach gets and the less likely the assumption (no gaps) is.

2. There are 1 or 2 sequencing gaps. Find primers to try and close the gap with another sequencing method (Sanger comes to mind, as long reads are an advantage here). Again, take orientations into account to join contigs to a circular genome (or not). Again, with more contigs/gaps this gets impractical. You might want to tweak your assembler's options here.

The important thing to note is that it is not possible to distinguish between "no gaps-linear genome" and "one gap-circular genome". In order to be sure, I would try joining the ends together by either sequencing or simple PCR products.

1
10.7 years ago
ALchEmiXt ★ 1.9k

Maybe way too simple a thought on this, but it could work:

As mentioned in some earlier answers, the assembly will probably break at some point of low coverage. However, you could slightly change your input dataset, for instance by deleting the reads that fall within one segment of the genome assembly. A reassembly will then be pushed towards a defined break at the point where you deleted the reads. You can now check whether the ends of the first assembly are joined in the new assembly or are still present as ends. Fiddling a bit with which reads to delete might give you a good answer on circularity. The functional annotation could also give a clue as to whether the assembly was broken arbitrarily or the ends are real (ragged) ends.

My 2ct.

1
7.7 years ago
hurfdurf ▴ 490

Geneious has a circular-capable assembler built in now:

http://blog.geneious.com/post/84370864944/building-a-circular-de-novo-assembler

0
6.9 years ago
5heikki 10k

I wrote a small script for screening circular contigs from multi fastas.