How to get the total count of organisms with whole genomes sequenced
7.0 years ago

Hi all,

I checked the NCBI FTP site, ftp://ftp.ncbi.nih.gov/genomes/. The number of organisms listed there is approximately 389 (for eukaryotes, I guess), and there is a separate directory for viruses. However, https://www.ncbi.nlm.nih.gov/genome/browse/ shows about 7313 for prokaryotes and 35 for eukaryotes (keeping only complete genomes in both cases), and 7150 for viruses. So which figure should one report as the total number of organisms sequenced to date and submitted to NCBI? If anyone can help me with the number and its source, that would be great; a breakdown into eukaryotes, prokaryotes and viruses would be even better.

Thanks all.

Ruchika

ncbi genome organisms • 4.7k views
ADD COMMENT
7.0 years ago

See The Genomes OnLine Database (GOLD). It is a web-based resource for comprehensive information regarding genome and metagenome sequencing projects, and their associated metadata, around the world.

The database provides all the statistics for various statuses of sequencing projects, such as:

  • complete genomes
  • complete and published genomes
  • permanent drafts
  • incomplete projects
  • abandoned projects
ADD COMMENT
7.0 years ago
piet ★ 1.8k

The principal problem is how you define the term "complete genome". For most organisms it is still impossible to obtain complete sequences of all replicons in a cell. The outcome of a sequencing experiment is limited both by the method used for nucleic acid extraction and by the sequencing approach.

How to define "complete genome" appropriately will depend on the context of your research. You should write down your own definition first, and then check whether the available genomes fit it.

ADD COMMENT

By complete I mean that the whole genome has been sequenced, in the same way that sequencing projects are labelled as complete genomes, short contigs, etc. NCBI uses the term "reference genome" for the ones that have been fully sequenced and manually curated.

To avoid further confusion: by complete I mean that the genome has been fully sequenced and released for public use. Many thanks.

ADD REPLY

By complete I mean whole genome sequence has been sequenced.

The human genome has been sequenced since the early 2000s, but people are still working on refining it, and parts of it are certainly intractable to sequencing with past and current technologies.

complete I mean where the genome has been fully sequenced and reported for public usage

Then why not take the entire list from NCBI genomes?

ADD REPLY

Yes, but my question is which value to take, since the FTP site gives different values, as I mentioned in the main question. If I check the FTP site on a given date, the number is very different from the one at https://www.ncbi.nlm.nih.gov/genome/browse/

That is what I asked in the main question too. Please advise. Thanks.

ADD REPLY
7.0 years ago

I just counted the bacterial species with complete genomes listed in ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt; the count is 2517.

$ cat assembly_summary.txt | grep "Complete Genome" | cut -f 7 | sort | uniq | wc -l 
2517
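The one-liner above matches "Complete Genome" anywhere on a line and reports a single number. As a hedged sketch (not the answerer's method), the same count can be made stricter by testing only the assembly-level field and reporting the raw and de-duplicated counts side by side. The function name `count_complete` is hypothetical, and the column positions (6 = taxid, 7 = species_taxid, 12 = assembly_level) are an assumption based on the usual assembly_summary layout; check them against the `#` header line of the file you downloaded.

```shell
# Hypothetical helper: summarise "Complete Genome" rows of an
# assembly_summary file (tab-separated, '#' lines are headers).
# Assumed columns: 6 = taxid, 7 = species_taxid, 12 = assembly_level.
count_complete() {
    awk -F '\t' '
        /^#/ { next }                      # skip header/comment lines
        $12 == "Complete Genome" {
            rows++                         # every complete assembly
            taxa[$6] = 1                   # distinct taxids
            species[$7] = 1                # distinct species_taxids
        }
        END {
            nt = 0; for (k in taxa) nt++
            ns = 0; for (k in species) ns++
            printf "assemblies=%d taxids=%d species_taxids=%d\n", rows, nt, ns
        }
    ' "$1"
}

# Usage (file name as in the thread):
#   count_complete assembly_summary.txt
```

The three figures answer different questions: how many complete assemblies exist, how many distinct strain-level entries they cover, and how many distinct species they cover.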
ADD COMMENT

Along similar lines, I checked the RefSeq summary file:

$ cat assembly_summary_refseq.txt | grep "Complete Genome" | cut -f 7 | sort | uniq | wc -l
9788

Do you think reporting 9788 organisms would be accurate?

ADD REPLY

And for the GenBank file:

$ cat assembly_summary_genbank.txt | grep "Complete Genome" | cut -f 7 | sort | uniq | wc -l
2858

Which one should I report?

ADD REPLY

You realize that these numbers are subject to change (perhaps daily). New data is added each night to NCBI/GenBank.

Why not report both, with appropriate notes?
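Since the counts change over time, one way to attach "appropriate notes" is to record the retrieval date and the source next to each number. The following is only a sketch under assumptions: `dated_count` is a hypothetical helper, the label is whatever you choose to call the source, and columns 7 (species_taxid) and 12 (assembly_level) are assumed from the usual assembly_summary layout.

```shell
# Hypothetical helper: print "date<TAB>label<TAB>count" for one
# assembly_summary file, counting distinct species_taxids (column 7)
# among rows whose assembly_level (column 12) is "Complete Genome".
dated_count() {
    label=$1
    file=$2
    n=$(awk -F '\t' '!/^#/ && $12 == "Complete Genome" { print $7 }' "$file" \
        | sort -u | wc -l | tr -d ' ')
    printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%d)" "$label" "$n"
}

# Usage (file names as in the thread):
#   dated_count refseq  assembly_summary_refseq.txt
#   dated_count genbank assembly_summary_genbank.txt
```

Each output line can then be quoted with its date, so a reader can see which snapshot of the data a number refers to.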

ADD REPLY

I just need to cite one number as the total number of organisms with a whole genome sequenced. Citing two different numbers will create chaos.

ADD REPLY

If you strictly need one number, then report the 2858 that you came up with above. You will have to qualify it by indicating that you are counting only one entry per taxid. Another number that could be reported is 7412 (which does not reduce the entries to unique taxids).

Citing two different numbers will create chaos.

Where :)

ADD REPLY
