Meaning behind the path organization in NCBI ftp site
6.9 years ago
Jacob ▴ 10

I am getting some genomes from the NCBI FTP site. One of the genomes (Mus musculus) is at

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_000001635.25_GRCm38.p5/GCF_000001635.25_GRCm38.p5_genomic.fna.gz

I'm wondering what the GCF/000/001/635 in the path name means. What do GCF, 000, 001 and 635 mean and why are only certain organisms within some of the folders?

I've noticed only certain organisms have their genomes within certain folders, for example Mus spretus is in

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/624/865/GCA_001624865.1_SPRET_EiJ_v1/GCA_001624865.1_SPRET_EiJ_v1_genomic.fna.gz

(Under GCA)

And Meleagris gallopavo is in

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/605/GCF_000146605.2_Turkey_5.0/GCF_000146605.2_Turkey_5.0_genomic.fna.gz

(Still in GCF, but within the folder genomes/all/GCF/000/146 instead of genomes/all/GCF/000/001)

NCBI genome ftp • 2.1k views

Are you getting the paths from the assembly summary files that are in this folder? It would be best to parse the paths out of that file instead of trying to understand the directory organization.
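A minimal sketch of that approach, assuming the summary is a tab-separated file (e.g. assembly_summary.txt) whose column header is a comment line naming at least `assembly_accession` and `ftp_path`. The function names `ftp_paths` and `genomic_fna` are hypothetical, and the `<dir name>_genomic.fna.gz` file-name pattern is taken from the URLs in the question.

```python
from typing import Dict, Iterable, List, Optional


def ftp_paths(lines: Iterable[str]) -> Dict[str, str]:
    """Map assembly_accession -> ftp_path from assembly summary lines.

    Assumption: tab-separated data, with the column names given on a
    comment line that starts with "# assembly_accession" (or without the space).
    """
    header: Optional[List[str]] = None
    paths: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # Strip the leading "#"/spaces and check whether this is the header.
            candidate = line.lstrip("# ")
            if candidate.startswith("assembly_accession"):
                header = candidate.split("\t")
            continue  # any other comment line is ignored
        if not line.strip():
            continue  # skip blank lines
        if header is None:
            raise ValueError("data row found before the column header")
        row = dict(zip(header, line.split("\t")))
        paths[row["assembly_accession"]] = row["ftp_path"]
    return paths


def genomic_fna(ftp_path: str) -> str:
    """Build the *_genomic.fna.gz URL from a directory path read from the file."""
    directory = ftp_path.rstrip("/")
    name = directory.rsplit("/", 1)[-1]  # e.g. GCF_000001635.25_GRCm38.p5
    return f"{directory}/{name}_genomic.fna.gz"
```

Usage would be something like `with open("assembly_summary.txt") as fh: paths = ftp_paths(fh)`, followed by `genomic_fna(paths["GCF_000001635.25"])`. Looking columns up by header name rather than by position keeps the sketch working even if the column order differs between versions of the file.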

6.9 years ago

This is mostly covered in the NCBI documentation, though it's not the easiest thing in the world to find. GCA and GCF are GenBank and RefSeq assemblies, respectively. Numbers like 000001635 are 9-digit assembly IDs, which are generally helpful in finding things. The path splits those 9 digits into three groups of three (GCF_000001635 becomes GCF/000/001/635), so the assemblies are nested in subdirectories and you don't end up with a million assemblies inside one directory (you quickly run into performance issues with absurdly high numbers of files/folders in a directory).
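A minimal sketch of that path construction, reproducing the three URLs from the question. It assumes accessions of the form `GCA_`/`GCF_` + 9 digits + `.version`, and that the assembly name is passed exactly as it appears in the directory name (no normalisation is attempted). The function names are hypothetical.

```python
import re

# Root of the "genomes/all" tree, taken from the URLs in the question.
BASE = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all"

# Prefix (GCA = GenBank, GCF = RefSeq), the 9-digit ID split into three
# groups of three digits, and a version suffix such as ".25".
ACCESSION_RE = re.compile(r"(GC[AF])_(\d{3})(\d{3})(\d{3})\.\d+")


def assembly_dir(accession: str, asm_name: str) -> str:
    """Return the directory URL holding one assembly's files."""
    match = ACCESSION_RE.fullmatch(accession)
    if match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")
    prefix, first, second, third = match.groups()
    # e.g. GCF_000001635.25 -> GCF/000/001/635
    return f"{BASE}/{prefix}/{first}/{second}/{third}/{accession}_{asm_name}"


def genomic_fna_url(accession: str, asm_name: str) -> str:
    """Return the URL of the *_genomic.fna.gz file inside that directory."""
    return f"{assembly_dir(accession, asm_name)}/{accession}_{asm_name}_genomic.fna.gz"
```

For example, `assembly_dir("GCF_000146605.2", "Turkey_5.0")` gives `.../GCF/000/146/605/GCF_000146605.2_Turkey_5.0`. Under this scheme the folder depends only on the accession number, so GCF/000/001/ can only contain accessions GCF_000001000 to GCF_000001999.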


Do you know the reasoning behind the numbering? It looks like within genomes/all/GCF/000/001/ there are only humans, chimpanzees and some similar rodents (all genomes that have likely been sequenced a lot). Is there some kind of pattern or technique behind the path organization?


Quite possibly, but I've never seen that documented anywhere.
