NCBI Reference Genomes
12 days ago
anna ▴ 70

I came across a list of criteria that NCBI uses to select reference genomes, which includes CheckM completeness among other factors, but I couldn't find whether there is a defined cutoff value that an assembly must meet to be considered eligible.

Does anyone know whether NCBI uses a specific CheckM completeness threshold for reference genome selection, or is the selection purely comparative across the available assemblies for a given species?

genomes ncbi
12 days ago
dthorbur ★ 3.0k

It appears there is no single fixed threshold. Instead, the cutoff depends on how many assemblies of a given species are already in RefSeq. I think their logic is that it's acceptable to include a lower-quality genome if it fills a gap in taxonomic coverage, until better ones come along.

Their prokaryotic release notes say:

Added CheckM completeness cut-offs to validate annotation. An annotated assembly will only be added to the RefSeq collection if it meets the following criteria:
For species with more than 1000 assemblies in RefSeq, the completeness is higher than the species Average Completeness - 3 times the standard deviation.
For species with 10-1000 assemblies in RefSeq, the completeness is higher than the smaller of 90% or the species Average Completeness - 3 times the standard deviation.
No CheckM cutoff is applied if there are less than 10 assemblies in the species.
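To make the rule concrete, here is a minimal sketch of how the quoted cutoff could be computed. The function names are hypothetical and this is not NCBI code. It assumes the species mean and standard deviation of CheckM completeness are already known (the notes don't say whether a population or sample SD is used), and, since the quoted ranges overlap at exactly 1000, it treats 1000 assemblies as part of the 10-1000 group.

```python
from typing import Optional


def checkm_cutoff(n_assemblies: int, mean_completeness: float,
                  sd_completeness: float) -> Optional[float]:
    """Completeness an assembly must exceed, or None if no cutoff applies.

    Hypothetical paraphrase of the RefSeq prokaryotic release-note rule.
    """
    if n_assemblies < 10:
        return None  # fewer than 10 assemblies: no CheckM cutoff
    statistical = mean_completeness - 3 * sd_completeness
    if n_assemblies > 1000:
        return statistical  # large species: mean - 3*SD only
    # 10-1000 assemblies (1000 assumed to fall here): smaller of 90% or mean - 3*SD
    return min(90.0, statistical)


def passes_checkm(completeness: float, n_assemblies: int,
                  mean_completeness: float, sd_completeness: float) -> bool:
    """True if the assembly would clear the cutoff ("higher than" taken as strict)."""
    cutoff = checkm_cutoff(n_assemblies, mean_completeness, sd_completeness)
    return cutoff is None or completeness > cutoff


# Made-up species statistics, for illustration only.
print(checkm_cutoff(500, 97.5, 1.0))           # 90.0 (90 is smaller than 94.5)
print(checkm_cutoff(2000, 99.0, 0.5))          # 97.5
print(passes_checkm(97.0, 2000, 99.0, 0.5))    # False
print(passes_checkm(60.0, 5, 99.0, 0.5))       # True (no cutoff below 10 assemblies)
```

This illustrates dthorbur's point. For well-sampled species the bar is set relative to how complete the existing assemblies are, while for poorly sampled species any completeness is accepted.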

Judging by their documentation on genome selection, CheckM does not appear to be considered for eukaryotic genomes.


Thank you very much for the clarification!

12 days ago
GenoMax 152k

RefSeq prokaryotic genomes are selected based on the criteria mentioned on this page: https://www.ncbi.nlm.nih.gov/refseq/about/prokaryotes/

For CheckM, that page lists the following criterion:

In order, assemblies with the highest quantized level of completeness (98 to 100) are preferred over assemblies in the 95-98, 90-95, 85-90, 70-85, 50-70, and under 50 percent level of completeness, as determined by CheckM.
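The quoted preference order amounts to binning completeness into levels and favoring better levels. Below is a minimal sketch of that idea. It assumes that each level's lower bound is inclusive (the page doesn't say, for example, whether exactly 98.0 counts as "98 to 100"), and it uses made-up assembly names. It models only this one criterion. Within a level, assemblies are treated as tied and keep their input order, whereas RefSeq would break ties using its other selection criteria.

```python
from typing import List, Tuple

# Lower bounds of the quantized completeness levels, best level first.
# Assumption: each lower bound is inclusive (98.0 falls in the 98-100 level).
LEVEL_LOWER_BOUNDS = [98.0, 95.0, 90.0, 85.0, 70.0, 50.0, 0.0]


def completeness_level(completeness: float) -> int:
    """Return 0 for the best level (98-100) up to 6 for under 50 percent."""
    for level, lower in enumerate(LEVEL_LOWER_BOUNDS):
        if completeness >= lower:
            return level
    return len(LEVEL_LOWER_BOUNDS) - 1  # guard for out-of-range (negative) input


def rank_by_completeness_level(assemblies: List[Tuple[str, float]]) -> List[str]:
    """Order assembly names by level only; sorted() is stable, so ties keep input order."""
    ranked = sorted(assemblies, key=lambda item: completeness_level(item[1]))
    return [name for name, _ in ranked]


# Hypothetical assemblies: (name, CheckM completeness in percent).
candidates = [("asm_A", 96.1), ("asm_B", 99.2), ("asm_C", 97.9), ("asm_D", 88.0)]
print(rank_by_completeness_level(candidates))  # ['asm_B', 'asm_A', 'asm_C', 'asm_D']
```

Because the values are quantized, asm_A (96.1) and asm_C (97.9) count as equally good here, even though their raw scores differ.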

RefSeq genome selection for eukaryotes is based on: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/policies-annotation/genome-processing/refseq-selection/
