3.2 years ago
Rob
Hi friends,
I found 7 genome sequences for one strain of E. coli in IMG/M. These sequences differ in genome size and gene count. What could be the reason for these differences, given that they are all from one strain?
Which strain would that be? They may have different levels of completeness. Some of them may include plasmids as well.
Thanks, Mensur, for responding. This is E. coli K12 MG1655.
A couple of these assemblies are incomplete, so it is easy to understand why they are different. For the rest, it would have to be something in the annotation procedures, because they shouldn't be that different. I would download them and annotate them in a uniform way, and I suspect they wouldn't be so different. If they still are, it is a valid question whether these are in reality the same strain. One presumes that they are, but strain differences would be the easiest biological explanation for why the assemblies are different.
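Before re-annotating, it can help to check how complete each downloaded assembly is. Below is a minimal sketch, assuming each assembly has been saved as a plain FASTA file; the function names are made up for illustration and are not part of IMG/M or any particular tool. It reports the number of sequences, the total length, and the N50, so an incomplete assembly (many contigs, low N50) or one that includes plasmids (extra sequences) stands out.

```python
def fasta_lengths(handle):
    """Yield (record_id, sequence_length) for each record in a FASTA handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first word of the header as the record ID.
            name = (line[1:].split() or [""])[0]
            length = 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def assembly_stats(handle):
    """Return sequence count, total bases, and N50 for a FASTA handle."""
    lengths = sorted((n for _, n in fasta_lengths(handle)), reverse=True)
    total = sum(lengths)
    running, n50 = 0, 0
    for n in lengths:
        running += n
        # N50: length of the sequence at which the running sum of the
        # longest-first lengths reaches half of the total assembly length.
        if running * 2 >= total:
            n50 = n
            break
    return {"sequences": len(lengths), "total_bp": total, "n50": n50}


# Usage (the file name is hypothetical):
# with open("assembly_1.fasta") as fh:
#     print(assembly_stats(fh))
```

A finished genome should show up as one long sequence (plus any plasmids), while a draft shows up as many contigs that add up to a different total.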
Literally anything: library preparation, sequencing technology, and the assembly strategy. Do you have a link for each of the six?
Thank you, Andres. What link do you mean? I have this screenshot of my result from IMG/M:
https://ibb.co/Tb0YK3m
Escherichia coli str. K-12 substr. MG1655star (E. coli) (University of Oklahoma): Sequencing technology 454, Assembler Newbler, cov 75X link
Escherichia coli K-12 subMG1655 (Pacific Biosciences): Sequencing technology PacBio, Assembler Celera, cov 99X link
Escherichia coli K-12 subMG1655 (The Genome Institute at Washington University): Sequencing technology Illumina, Assembler Velvet, cov 70X link
Escherichia coli K12- MG1655 (Broad Institute): Sequencing technology Sanger-Illumina, Assembler manual processing(?????), cov NA link
Escherichia coli K12- MG1655 (Broad Institute): Sequencing technology PacBio-Illumina, Assembler AllPath, cov 92 link
Escherichia coli K12- MG1655 (Weill Cornell Medical College): Sequencing technology PacBio RS, Assembler NA, cov NA
I would say that the differences in genome size are caused by the different combinations of assemblers and sequencing technologies. The same goes for the gene count: different annotation algorithms give you different results.
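To see how much of the gene-count difference comes from annotation alone, one option is to tally feature types in each annotation file. This is a rough sketch, assuming the annotations are available as GFF3 (tab-separated, feature type in the third column); the function name and file name are hypothetical.

```python
from collections import Counter


def count_feature_types(handle):
    """Count GFF3 feature types (column 3), skipping comments and directives."""
    counts = Counter()
    for line in handle:
        # An embedded ##FASTA section marks the end of the annotation lines.
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Usage (the file name is hypothetical):
# with open("assembly_1.gff3") as fh:
#     print(count_feature_types(fh).most_common())
```

If two assemblies have nearly the same total length but different counts of "gene" or "CDS" features, the annotation pipeline is the more likely cause than the sequence itself.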
Thanks for the great answer. What biological reasons could drive these differences?