Question: Assembling a genome
3.0 years ago by
glady260 wrote:

Hello, I have a bacterial genome sequenced on an Illumina platform, with an average read length of 100 bp, and I want to assemble it. I would appreciate help with a few questions: Will SPAdes be a useful assembler for such short reads? How can I select a reference genome for this bacterium? How can I calculate the coverage of the genome?

Thank you.

sequencing • 893 views
modified 3.0 years ago by h.mon29k • written 3.0 years ago by glady260

Please have a look here first: Best software to assemble bacterial genomes

written 3.0 years ago by Medhat8.6k

SPAdes doesn't take a reference because it's a de novo assembler, though you could provide a file of trusted contigs if you wanted to. You might, however, want to check read quality and coverage, in which case you may well need a reference genome.

As for how you choose it, you simply download the genome sequence you expect to be closest to your bacterial strain. E.g. if you had sequenced the common lab E. coli strain DH5α, you could just download the genome from NCBI and align your reads against it to find out where your E. coli sequence differs. If your bacterium has never been sequenced before, though, you can't get a reference genome for it (obviously).

Qualimap is my favourite tool for estimating genome coverage, assembly stats etc., but it requires a .bam file, so you'll first need to map your reads with an aligner such as bwa or bowtie2.
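For anyone wanting to see the alignment route spelled out, here is a minimal sketch that only builds and prints the command lines; it does not run them. The file names, the "sample" prefix and the thread count are placeholders I made up, and the choice of bwa mem with samtools is one option among the aligners mentioned above, not the only way to get a .bam file for Qualimap.

```python
import shlex


def mapping_commands(reference, reads_1, reads_2, prefix="sample", threads=4):
    """Return, in order, the shell commands for a bwa + samtools + Qualimap run.

    All paths and the prefix are hypothetical placeholders.
    """
    bam = f"{prefix}.sorted.bam"
    # Align paired-end reads; the SAM output is piped straight into samtools sort.
    align = ["bwa", "mem", "-t", str(threads), reference, reads_1, reads_2]
    sort = ["samtools", "sort", "-o", bam, "-"]
    return [
        shlex.join(["bwa", "index", reference]),        # index the reference once
        shlex.join(align) + " | " + shlex.join(sort),   # map reads and sort by position
        shlex.join(["samtools", "index", bam]),         # index the sorted BAM
        shlex.join(["qualimap", "bamqc", "-bam", bam,
                    "-outdir", f"{prefix}_qualimap"]),  # coverage and mapping report
    ]


for command in mapping_commands("reference.fasta", "reads_R1.fastq.gz", "reads_R2.fastq.gz"):
    print(command)
```

Printing the commands first makes it easy to check paths before pasting them into a terminal or a job script.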

written 3.0 years ago by Joe16k

I have assembled it de novo with MIRA, but I have run into a problem. After running the command "mira manifest.conf >&log_assembly.txt", I do not get any result or contig files in projectname_d_results.

Where am I going wrong? Please help.

written 3.0 years ago by glady260

Is MIRA giving you any error messages? Those can usually help in figuring out what may be going wrong.

modified 3.0 years ago • written 3.0 years ago by mastal5112.0k

It is not giving any error messages.

written 3.0 years ago by glady260
3.0 years ago by
h.mon29k wrote:

With MiSeq data (2x250 or 2x300), A5_MiSeq will probably give you the best assembly. I believe, though I have not tested it, that it will also do a good job with shorter reads. Its log output is rich in information, including the average coverage of the final assembly.

SPAdes will do a fine job as well.

A google search on "genome coverage calculator" would lead you to this page...
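As a rough illustration of what such a calculator does, expected average coverage is usually estimated as (number of reads × read length) / genome size. The sketch below assumes that formula; the read count and the 5 Mb genome size are made-up example values, not figures from this thread.

```python
def estimate_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical example: 2,500,000 read pairs of 2x100 bp on a 5 Mb genome.
pairs = 2_500_000
depth = estimate_coverage(num_reads=2 * pairs, read_length=100, genome_size=5_000_000)
print(f"Estimated coverage: {depth:.0f}x")  # prints "Estimated coverage: 100x"
```

This is only the expected depth before mapping; the coverage reported from an actual alignment or by an assembler's log will differ, since some reads are low quality, duplicated, or do not map.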

written 3.0 years ago by h.mon29k

My data was sequenced on a HiSeq (2x100). I tried SPAdes, but I am getting more than 4000 contigs (with the default k-mer settings), so I am trying MIRA instead.

written 3.0 years ago by glady260