Assembling a genome
7.1 years ago
glady ▴ 320

Hello, I have a bacterial genome sequenced on an Illumina platform, with an average read length of 100 bp, and I want to assemble it. I would really appreciate help with a few questions. Will SPAdes be a useful assembler for reads this short? How can I select a reference genome for this bacterial genome? How can I calculate the coverage of the genome?

Thank you.

sequencing • 1.7k views

Please have a look here first: Best software to assemble bacterial genomes

SPAdes doesn't take a reference, as it's a de novo assembler, though you could provide a file of trusted contigs if you wanted to. You might, however, want to check read quality and coverage, in which case you may well need a reference genome. As for how you choose one, you simply download the genome sequence you expect to be closest to your strain. E.g. if you had sequenced the common lab E. coli strain DH5a, you could just download the genome from NCBI and align your reads against it to find out where your E. coli sequence differs. If your bacterium has never been sequenced before, though, you can't get a reference genome for it (obviously). Qualimap is my favourite tool for estimating genome coverage, assembly stats, etc., but it requires a .bam file, so you'll first need to run a read aligner such as bwa or bowtie2.
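If you just want a quick sanity check on depth once you have an alignment, here is a minimal Python sketch. It is not what Qualimap does internally; it assumes you have swapped in `samtools depth` and saved its output, which is tab-separated with three columns (reference name, position, depth). The function name `mean_depth` and the optional `genome_size` argument are my own, purely for illustration.

```python
def mean_depth(depth_lines, genome_size=None):
    """Average depth from lines shaped like `samtools depth` output.

    Each line is expected to be: reference<TAB>position<TAB>depth.
    If genome_size is given, it is used as the denominator, so positions
    missing from the input count as zero depth. Otherwise the mean is
    taken over the reported positions only.
    """
    total = 0
    positions = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += int(fields[2])
        positions += 1
    denominator = genome_size if genome_size is not None else positions
    if denominator == 0:
        raise ValueError("no positions to average over")
    return total / denominator


# Tiny made-up example: position 3 has no reads and is absent from the input.
example = ["chr\t1\t10\n", "chr\t2\t20\n", "chr\t4\t30\n"]
print(mean_depth(example))                 # mean over reported positions
print(mean_depth(example, genome_size=4))  # mean over the whole 4 bp "genome"
```

Without `-a`, `samtools depth` skips zero-depth positions, which is why passing the genome size can matter for a patchy alignment.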

I ran a de novo assembly with MIRA, but I am having a problem there. After running the command "mira manifest.conf >&log_assembly.txt", I do not get any results/contig files in projectname_d_results.

Where am I going wrong? Please help.

Is MIRA giving you any error messages? Those can usually help in figuring out what may be going wrong.

It is not giving any error messages.

7.1 years ago
h.mon 35k

With MiSeq data (2x250 or 2x300), A5_MiSeq will probably give you the best assembly. I believe, though I have not tested it, that it will also do a good job with shorter reads. Its log output is really rich in information, including the average coverage of the final assembly.

SPAdes will do a fine job as well.

A Google search on "genome coverage calculator" would lead you to this page...
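For a rough expected coverage before any assembly or alignment, the usual back-of-the-envelope estimate is total sequenced bases divided by genome size. A small Python sketch follows; the read count and the 5 Mb genome size are made-up numbers, and `estimate_coverage` is just an illustrative name.

```python
def estimate_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average depth: (number of reads * read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical run: 2,000,000 read pairs of 2x100 bp on a 5 Mb genome.
# Paired-end data contributes two reads per pair.
coverage = estimate_coverage(num_reads=2 * 2_000_000,
                             read_length=100,
                             genome_size=5_000_000)
print(f"{coverage:.0f}x")  # prints "80x"
```

If you don't know the genome size, the size of the closest sequenced relative is usually a reasonable stand-in for this estimate.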

My data was sequenced on a HiSeq (2x100). I tried SPAdes, but the number of contigs I get is beyond 4000 (with the default k-mer settings). Hence, I am trying MIRA.
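When comparing assemblers, the raw contig count alone can be misleading, since many of those contigs may be tiny. A minimal Python sketch for counting contigs and computing N50 from FASTA lines is below. The function names are my own, and the in-memory example stands in for a real contigs file, whose path will depend on your assembler's output directory.

```python
def read_contig_lengths(fasta_lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = 0
    in_record = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if in_record:
                lengths.append(current)  # close the previous record
            current = 0
            in_record = True
        elif line and in_record:
            current += len(line)
    if in_record:
        lengths.append(current)  # close the last record
    return lengths


def n50(lengths):
    """Length of the contig at which the running total of the sorted
    (longest first) contig lengths reaches half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Made-up three-contig assembly; in practice, pass an open file handle instead.
example = [">a", "ACGT", "AC", ">b", "ACG", ">c", "A"]
lengths = read_contig_lengths(example)
print(len(lengths), "contigs, N50 =", n50(lengths))
```

Filtering out very short contigs before counting (for example, with a length cutoff of your choosing) often gives a fairer picture of how fragmented the assembly really is.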
