Understanding the refseq ftp file
Asked 9.1 years ago by jeremy.cox.2

I set out to download and compile the complete refseq bacteria database.

I downloaded the `*.genomic.fna.gz` files from ftp://ftp.ncbi.nlm.nih.gov/refseq/release/bacteria/
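Before pulling ~100 GB, it can help to see exactly which files that glob matches and how large they are on the server. Here is a minimal sketch using Python's standard `ftplib`. It assumes anonymous FTP access to the host and directory from the question still works. The helper names are hypothetical, and the filenames in the check below are illustrative rather than a real listing.

```python
import fnmatch
from ftplib import FTP

# Host, directory and glob are taken from the question above.
HOST = "ftp.ncbi.nlm.nih.gov"
DIRECTORY = "/refseq/release/bacteria"
PATTERN = "*.genomic.fna.gz"


def select_files(names, pattern=PATTERN):
    """Return the names matching the glob (case-sensitive), sorted."""
    return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))


def list_remote_sizes(host=HOST, directory=DIRECTORY, pattern=PATTERN):
    """Map each matching remote file to its compressed size in bytes.

    Needs network access. FTP.size() returns None if the server
    does not answer the (non-standard) SIZE command.
    """
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(directory)
        ftp.voidcmd("TYPE I")  # SIZE is only reliable in binary mode
        return {name: ftp.size(name) for name in select_files(ftp.nlst(), pattern)}
```

Calling `list_remote_sizes()` and summing the non-`None` values gives the compressed total before anything is downloaded. Running `select_files` on the full `nlst()` output with other patterns also shows what else sits in the directory.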

After decompression, the files total ~100 GB, whereas my nt bacterial database is only 12 GB. I expected RefSeq to be smaller than nt, so I think I have misunderstood what I should download.
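Decompressed byte counts also include header lines and line breaks, so comparing total bases may be fairer. This is a rough sketch with hypothetical helper names. It assumes plain gzipped FASTA files collected in one local directory and streams each file to report bytes, record count and residue count without decompressing to disk:

```python
import gzip
from pathlib import Path


def fasta_gz_stats(path):
    """Return (decompressed_bytes, records, residues) for one gzipped FASTA."""
    n_bytes = n_records = n_residues = 0
    with gzip.open(path, "rb") as handle:
        for line in handle:
            n_bytes += len(line)
            if line.startswith(b">"):
                n_records += 1
            else:
                # Count sequence characters only, ignoring line endings.
                n_residues += len(line.strip())
    return n_bytes, n_records, n_residues


def total_stats(directory, pattern="*.genomic.fna.gz"):
    """Sum fasta_gz_stats over every matching file in a directory."""
    totals = [0, 0, 0]
    for path in sorted(Path(directory).glob(pattern)):
        for i, value in enumerate(fasta_gz_stats(path)):
            totals[i] += value
    return tuple(totals)
```

Running the same count on the nt bacterial FASTA would put both databases on the same scale: sequences and bases rather than file size.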

Can you help me figure out which files I actually want, or how to tell which ones are right? I am doing the same for viruses and fungi.

Tags: refseq, ncbi

This seems strange. Are you sure the bacterial nt database was downloaded completely and correctly?
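One way to check a download is to compare its MD5 against a published checksum. NCBI's BLAST database tarballs are typically accompanied by `.md5` files. Whether your particular `nt.fa` source has one is an assumption here, as is the usual `<hexdigest>  <filename>` layout of that file. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 of a file, read in 1 MiB chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(data_path, md5_path):
    """True if data_path matches the first token of a '<hexdigest>  <name>' file."""
    expected = Path(md5_path).read_text().split()[0].lower()
    return md5_of(data_path) == expected
```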

nt is non-redundant; RefSeq genomic is not. Just compare the size of the refseq_genomic BLAST database with nt (26 tar.gz files for nt vs. 152 for refseq_genomic).
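To get a crude feel for redundancy in your own files, you can count sequences that are exact, case-insensitive duplicates of one another. This is only an illustration. It catches identical sequences only and is not how NCBI builds nt. The helper names are hypothetical.

```python
import hashlib


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of text lines."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def count_exact_duplicates(records):
    """Return (total, unique): identical sequences (ignoring case) count once."""
    seen = set()
    total = 0
    for _, seq in records:
        total += 1
        # Store a fixed-size digest instead of the full sequence.
        seen.add(hashlib.sha1(seq.upper().encode()).digest())
    return total, len(seen)
```

For a gzipped file, `count_exact_duplicates(parse_fasta(gzip.open(path, "rt")))` streams it without decompressing to disk.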

Yes, I downloaded the nt database correctly: a single compressed "nt.fa" file.

It would seem that the total size of the refseq_genomic BLAST db is ~26 GB, so clearly I have downloaded the WRONG FILES.

My question is: how do I know which files are the correct ones?

Are you sure it's just 26 GB? On the FTP site there are 152 refseq_genomic tar.gz files of ~0.9 GB each. That is already ~136 GB compressed, and decompressing them surely makes it even bigger.
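The ~136 GB figure is a simple multiplication. It takes ~0.9 GB as an approximate per-file compressed size, not an exact listing:

$$152 \times 0.9\,\text{GB} \approx 136.8\,\text{GB (compressed)}$$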