For which species have the genomes of the most individuals been sequenced?
7.1 years ago
shafferpr • 0

I want to look at the genomes from many individual members of a species (or closely related species), and I don't particularly care which species. I'm wondering if there is a database somewhere where I could find a large collection of genomes from many individuals. Thanks for your help.

genome sequence • 1.3k views

You should probably care a little about species.

Single/multi-cellular?

Haploid/diploid/polyploid?

Clonal/inbred/outbred?

Genome size? (Smallest 580 kb, largest 150 000 000 kb)

etc

7.1 years ago

Absolutely some bacteria!

Data source: ftp://ftp.ncbi.nih.gov/genomes/genbank/bacteria/assembly_summary.txt (the file also lists the FTP path of each assembly).

Let's count the top 20 (the header row counts as one of the 20 lines, so 19 species are listed):

$  cut -f 8 assembly_summary.txt | cut -d " " -f 1,2 \
        | csvtk freq -n -r | head -n 20 | csvtk pretty
organism_name                frequency
Staphylococcus aureus        7441
Streptococcus pneumoniae     7257
Salmonella enterica          6903
Escherichia coli             5385
Mycobacterium tuberculosis   5088
Pseudomonas aeruginosa       2230
Acinetobacter baumannii      1947
Klebsiella pneumoniae        1798
Mycobacterium abscessus      1375
Listeria monocytogenes       1324
Shigella sonnei              958
Streptococcus suis           955
Clostridioides difficile     901
Campylobacter jejuni         899
Streptococcus agalactiae     867
Campylobacter coli           802
Neisseria meningitidis       790
Vibrio parahaemolyticus      685
Helicobacter pylori          659

csvtk is a command-line CSV/TSV toolkit, available at https://github.com/shenwei356/csvtk.
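If csvtk is not installed, roughly the same count can be sketched with standard Unix tools. This is only a sketch: it assumes the organism name is in column 8 (as in the command above) and that comment and header lines start with `#`. The function name `top_species` is made up for illustration.

```shell
# Count assemblies per species (first two words of column 8) with
# grep/cut/sort/uniq/awk only.
# Assumptions: tab-separated assembly_summary.txt, organism name in
# column 8, comment/header lines start with '#'.
# Usage: top_species FILE [N]   (N defaults to 20)
top_species() {
    grep -v '^#' "$1" \
        | cut -f 8 \
        | cut -d ' ' -f 1,2 \
        | sort \
        | uniq -c \
        | sort -k1,1nr \
        | head -n "${2:-20}" \
        | awk '{ count = $1; $1 = ""; sub(/^ +/, ""); print $0 "\t" count }'
}
```

For example, `top_species assembly_summary.txt 20` prints one `species<TAB>count` line per species, most frequent first, without a header row.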


Now let's count only the assemblies with a complete genome:

$ grep 'Complete Genome' assembly_summary.txt | cut -f 8 | cut -d " " -f 1,2 \
        | csvtk freq -H -n -r | head -n 20 | csvtk pretty   
Escherichia coli                     306
Bordetella pertussis                 291
Salmonella enterica                  260
Staphylococcus aureus                145
Campylobacter jejuni                 113
Klebsiella pneumoniae                108
Listeria monocytogenes               95
Helicobacter pylori                  85
Pseudomonas aeruginosa               80
Neisseria meningitidis               76
Chlamydia trachomatis                68
Legionella pneumophila               62
Acinetobacter baumannii              59
Burkholderia pseudomallei            59
Corynebacterium pseudotuberculosis   59
Mycobacterium tuberculosis           52
Bacillus subtilis                    50
Streptococcus pyogenes               50
Bacillus anthracis                   43
Bacillus cereus                      36
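Since the summary file also carries an FTP path for each assembly, the complete genomes of one species could be listed for download along these lines. This is a sketch under assumptions: it takes column 12 to be the assembly level and column 20 to be the FTP path, which should be checked against the header line of your copy of the file. The function name `complete_genome_paths` is made up for illustration.

```shell
# Print the FTP directory of every "Complete Genome" assembly whose
# organism name starts with the given species.
# Assumed tab-separated columns: 8 = organism_name,
# 12 = assembly_level, 20 = ftp_path.
# Usage: complete_genome_paths FILE "Genus species"
complete_genome_paths() {
    awk -F '\t' -v sp="$2" '
        !/^#/ && $12 == "Complete Genome" && index($8, sp) == 1 { print $20 }
    ' "$1"
}
```

For example, `complete_genome_paths assembly_summary.txt "Escherichia coli" > ecoli_paths.txt` writes one FTP directory per line, which can then be fetched with a tool such as wget. Unlike the grep above, this matches "Complete Genome" only in the assembly level column rather than anywhere on the line.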
7.1 years ago

The most sequenced genome is probably that of PhiX, since it is used as control DNA on Illumina sequencers. Second in line is probably human, for which you can find quite a lot of data, for example in the 1000 Genomes Project.
