Question: E. coli K12 MG1655 datasets from Illumina
2.8 years ago by
midox190 wrote:


I would like to obtain E. coli K12 MG1655 reads generated by an Illumina sequencer.
Can anyone give me a link to download the data?
I found a dataset on the Illumina website, but I do not know how to download it.

Thank you

9 months ago by
Casey Bergman17k
Athens, GA, USA
Casey Bergman17k wrote:

If you are looking for an Illumina dataset for E. coli K12 strain MG1655 that can be downloaded from a public server (i.e. without a BaseSpace account) and has stable INSDC accession numbers, I recommend using ERX008638. ERX008638 is a GAIIx 2x100bp run derived from the ATCC MG1655 type strain and was generated by Illumina in late 2010. The NCBI SRA abstract for ERX008638 states:

E. coli K-12 holds a key position as a model organism in studies of molecular biology, biochemistry, genetics and biotechnology. A high-quality reference sequence of the genome of E. coli K-12 strain MG1655 has previously been generated using the Sanger dideoxy sequencing method (Reference sequence accession: NC_000913). We have re-sequenced the genome of the MG1655 strain using the Illumina Genome Analyzer IIx to provide a paired-end, 100 base sequence dataset for development of de novo sequence assembly algorithms or other downstream analysis tools. We generated a sequencing library with a median insert size of 500 bp following random fragmentation and gel fractionation of genomic DNA. Sequence reads were aligned to the reference genome using ELANDv2e. This study supersedes experiment ERX002508 and run ERR008613 under study ERP000092.
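For readers who want to locate the fastq files for ERX008638 programmatically, here is a minimal sketch that is not part of the original answer. It assumes the ENA Portal API `filereport` endpoint and its `fastq_ftp` and `fastq_md5` fields work as described in ENA's documentation, so check there before relying on it. The helper names `filereport_url` and `parse_filereport` are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint: the ENA Portal API file report (verify against ENA's docs).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession, fields=("run_accession", "fastq_ftp", "fastq_md5")):
    """Build a file-report query URL for an INSDC accession such as ERX008638."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Turn a tab-separated report into a list of dicts, one per run."""
    lines = [ln for ln in tsv_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split("\t")
    return [dict(zip(header, ln.split("\t"))) for ln in lines[1:]]


if __name__ == "__main__":
    # Paste this URL into a browser, or fetch it with any HTTP client.
    print(filereport_url("ERX008638"))
```

For paired-end runs the report typically lists two files in `fastq_ftp`, separated by a semicolon, with the checksums in `fastq_md5` in the same order.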

The MG1655 dataset referred to by Adrian Pelin in his answer, which is available from the SPAdes developers' website, was originally created by Illumina and submitted to EBI in early 2010. The tar.gz file provided by the submitter is corrupted, so the ENA submission is incomplete (i.e. fastq files are not available). This is probably why the SPAdes developers provide their own version of the files. Downloads from the SPAdes site are slow, and the lack of stable IDs and checksums makes the files from the SPAdes website less than ideal. The NCBI SRA submission for ERX002508 appears to be valid (I have not thoroughly checked this), but downloads from SRA have the problem of requiring conversion from SRA format to fastq. Furthermore, as noted in the NCBI SRA abstract for ERX008638 above, Illumina say that ERX008638 supersedes ERX002508.
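To illustrate why published checksums matter, here is a minimal sketch of checking a downloaded file against an MD5 value, using only the Python standard library. The function names `md5_of_file` and `matches_checksum` are illustrative, and the expected checksum would have to come from the archive record for the file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 equals the published checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte fastq.gz files. A corrupted or truncated download, like the tar.gz described above, would fail this check.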

It also appears that ERX008638 is preferred by most people in the community: 11 Open Access papers have been published citing ERX008638, in contrast to 3 for ERX002508.

2.8 years ago by
Adrian Pelin2.1k
Adrian Pelin2.1k wrote:

I know the SPAdes guys offer it here:

6.2Gb, 28M reads, 2x100bp, insert size ~ 215bp (Illumina Genome Analyzer IIx). E. coli K-12 MG1655 reference length is 4639675 bp with 4324 annotated genes.
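As a rough sanity check on those numbers, the expected sequencing depth can be estimated as total sequenced bases divided by genome length. The sketch below assumes that "28M reads" counts individual 100 bp reads; if it counts read pairs instead, the result doubles.

```python
def expected_coverage(n_reads, read_length, genome_length):
    """Average depth = total sequenced bases / genome length."""
    return n_reads * read_length / genome_length


GENOME_LENGTH = 4_639_675  # E. coli K-12 MG1655 reference length quoted above
READ_LENGTH = 100          # from "2x100bp"
N_READS = 28_000_000       # "28M reads"; assumed to be individual reads, not pairs

coverage = expected_coverage(N_READS, READ_LENGTH, GENOME_LENGTH)
print(f"Estimated coverage: about {coverage:.0f}x")
```

Under that assumption the dataset comes out at roughly 600x depth. That is far more than most assemblers need, so subsampling is a reasonable option.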

2.8 years ago by
midox190 wrote:

I found this on BaseSpace.
Are they the same or not?

