genome sequence
3.2 years ago
Rob ▴ 170

Hi friends,

I found 7 genome sequences for one strain of E. coli in IMG/M. These sequences differ in genome size and gene count. What could be the reason for these differences, given that they are all from one strain?

bacterial genome sequence • 1.4k views

Which strain would that be? They may have different levels of completeness. Some of them may include plasmids as well.


Thanks, Mensur, for responding. This is E. coli K12 MG1655.


A couple of these assemblies are incomplete, so it is easy to understand why they are different. For the rest, it would have to be something in the annotation procedures, because they shouldn't be that different. I would download them and annotate them in a uniform way; I suspect they wouldn't be so different then. If they still are, it is a valid question whether these are really the same strain. One presumes that they are, but strain differences would be the easiest biological explanation for why the assemblies differ.
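Before re-annotating, a quick way to see which assemblies are incomplete is to compare contig count, total length and N50 for each downloaded FASTA file. Here is a minimal sketch in plain Python; the function names are my own, and it assumes uncompressed, standard FASTA input:

```python
def read_fasta_lengths(handle):
    """Return a list of sequence lengths, one per FASTA record."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the cumulative sum of contigs,
    sorted longest first, reaches half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_stats(lengths):
    """Summarise one assembly from its contig lengths."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "largest": max(lengths, default=0),
        "n50": n50(lengths),
    }


def stats_for_file(path):
    """Convenience wrapper: open a FASTA file and summarise it."""
    with open(path) as handle:
        return assembly_stats(read_fasta_lengths(handle))
```

Calling `stats_for_file` on each download gives numbers you can put side by side. A finished assembly of a single strain should be a handful of records at most, while a draft will show many contigs and a much smaller N50.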


Literally anything: library preparation, sequencing technology, and the assembly strategy. Do you have a link for each of the six?


Thank you, Andres. What link do you mean? I have this screenshot of my result from IMG/M: https://ibb.co/Tb0YK3m

  1. Escherichia coli str. K-12 substr. MG1655star (E. coli) (University of Oklahoma): Sequencing technology 454, Assembler Newbler, cov 75X link

  2. Escherichia coli K-12 subMG1655 (Pacific Biosciences): Sequencing technology PacBio, Assembler Celera, cov 99X link

  3. Escherichia coli K-12 subMG1655 (The Genome Institute at Washington University): Sequencing technology Illumina, Assembler Velvet, cov 70X link

  4. Escherichia coli K12- MG1655 (Broad Institute): Sequencing technology Sanger-Illumina, Assembler manual processing(?????), cov NA link

  5. Escherichia coli K12- MG1655 (Broad Institute): Sequencing technology PacBio-Illumina, Assembler AllPath, cov 92 link

  6. Escherichia coli K12- MG1655 (Weill Cornell Medical College): Sequencing technology PacBio RS, Assembler NA, cov NA

I would say that the differences in genome size are caused by the different combinations of assemblers and sequencing technologies. The same goes for the gene count: different annotation algorithms give you different results.
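To check the gene-count part, you could run all the assemblies through the same annotation tool and then tally the features in each output. Below is a rough sketch that counts feature types in a GFF3 file. It assumes your annotation tool writes GFF3, and `count_features` is just a name I made up:

```python
from collections import Counter


def count_features(gff_lines):
    """Count GFF3 features by type (column 3), e.g. CDS, tRNA, rRNA."""
    counts = Counter()
    for line in gff_lines:
        # Some tools append the sequences after a ##FASTA directive; stop there.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a well-formed feature line
        counts[columns[2]] += 1
    return counts
```

With `open(path)` passed in for each annotation file, comparing `count_features(...)["CDS"]` across the assemblies shows whether the gene counts converge once the annotation method is held constant.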


Thanks for the great answer. What biological reasons could drive the differences?
