Question: E. coli Sequencing & Analysis
Alex Gibbs (Cardiff University) wrote, 15 months ago:


I have been given the task of assembling a 'new' E. coli genome and analysing the genes it contains.

The E. coli is a new strain and was sequenced on a NextSeq 500 in high-output mode with 150 bp paired-end reads. The 'raw' files that I have are the forward and reverse reads.

I first ran QC on the 'raw' files, then ran them through Trim Galore and checked the QC again afterwards.

For the next step, I need to assemble my genome. I have been told that SPAdes will run a 'de novo' assembly for me, and that I can then put that assembly into Prokka for gene annotation.

Is this the best way to assemble and annotate the genome, or should I use another method? I am thinking that I should use a 'mapping' technique to assemble the genome using the E. coli O157:H7 genome as a reference, but I have no idea how to do this. I would say that I am at an intermediate level with Unix, but I am by no means a bioinformatician. Some help and guidance would be greatly appreciated!




SPAdes + Prokka is pretty much the de facto standard these days. There's little need to deviate unless you have very specific reasons.

You might gain some improvement from a reference-guided assembler such as MIRA if your strains are very close, but don't map your reads first: you'd just be discarding data for no reason. Instead, let MIRA (or whichever tool you use) decide that for you. E. coli in particular is known for how much its strains diverge, so a de novo assembly via SPAdes or similar is probably best, at least for a first pass.
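As a rough illustration of the SPAdes + Prokka route, here is a minimal Python sketch that only builds and prints the two command lines rather than running them. The read file names, the `work` directory, and the sample name `ecoli_new` are hypothetical placeholders. The options shown (`-1`, `-2`, `-o`, `-t` for SPAdes; `--outdir`, `--prefix`, `--genus`, `--species`, `--cpus` for Prokka) are taken from each tool's documented options, but check `--help` for your installed versions before relying on them.

```python
import shlex
from pathlib import Path


def spades_cmd(r1, r2, outdir, threads=8):
    """Build a SPAdes de novo assembly command for one paired-end library."""
    return ["spades.py", "-1", str(r1), "-2", str(r2),
            "-o", str(outdir), "-t", str(threads)]


def prokka_cmd(contigs, outdir, prefix, threads=8):
    """Build a Prokka annotation command for an assembled E. coli genome."""
    return ["prokka", "--outdir", str(outdir), "--prefix", prefix,
            "--genus", "Escherichia", "--species", "coli",
            "--cpus", str(threads), str(contigs)]


def pipeline(r1, r2, workdir, sample, threads=8):
    """Return the two commands in order: assemble, then annotate."""
    workdir = Path(workdir)
    asm_dir = workdir / "spades"
    ann_dir = workdir / "prokka"
    # SPAdes writes its contigs to <outdir>/contigs.fasta
    contigs = asm_dir / "contigs.fasta"
    return [spades_cmd(r1, r2, asm_dir, threads),
            prokka_cmd(contigs, ann_dir, sample, threads)]


if __name__ == "__main__":
    # Hypothetical trimmed read files; substitute your own paths.
    cmds = pipeline("reads_R1_trimmed.fq.gz", "reads_R2_trimmed.fq.gz",
                    "work", "ecoli_new")
    for cmd in cmds:
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right. Building them as lists avoids shell-quoting problems with unusual file names.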

The most compelling alternative assembly/annotation pipeline I can think of would be SKESA and PGAP, which are NCBI's tools. If you uploaded the data to NCBI, that is the assembly you would get back, so it can be useful to have.

written 15 months ago by Joe

Thank you very much for your reply! I think I will try both methods (SPAdes & Prokka, and the NCBI tools) and see which one I get on with best.

written 15 months ago by Alex Gibbs
rebeccarenberg (United States) wrote, 15 months ago:

I would also suggest using Unicycler, which uses SPAdes but is specifically designed for bacterial genomes and is superior to using SPAdes alone (more info on the GitHub page).

I have found it VERY user friendly (I am also NOT a bioinformatician), and it gives really good results.
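For a short-read-only run like the one described in the question, a Unicycler call can be sketched the same way. This is a minimal Python sketch that builds the command without executing it. The read file names and the output directory are hypothetical. The `-1`, `-2`, `-o` and `-t` options and the `assembly.fasta` output name follow Unicycler's README, so confirm them against your installed version.

```python
import shlex
from pathlib import Path


def unicycler_cmd(r1, r2, outdir, threads=8):
    """Build a Unicycler command for an Illumina-only (short-read) assembly."""
    return ["unicycler", "-1", str(r1), "-2", str(r2),
            "-o", str(outdir), "-t", str(threads)]


def unicycler_assembly_path(outdir):
    """Path where Unicycler is expected to write the final assembly."""
    return Path(outdir) / "assembly.fasta"


if __name__ == "__main__":
    # Hypothetical trimmed read files; substitute your own paths.
    cmd = unicycler_cmd("reads_R1_trimmed.fq.gz", "reads_R2_trimmed.fq.gz",
                        "unicycler_out")
    print(shlex.join(cmd))
    print("Annotate this file next:", unicycler_assembly_path("unicycler_out"))
```

The resulting `assembly.fasta` can then be given to Prokka in place of the SPAdes contigs.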
