What is the difference between /refseq/release and /genomes/refseq?
4.7 years ago
tastafor ▴ 10

I am looking for the RefSeq protein sequences (protein.faa.gz files) of all animals. I don't understand the difference between these two FTP locations:

ftp://ftp.ncbi.nih.gov/refseq/release/

ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/

The refseq/release directory has files labelled vertebrate_mammalian_1.protein.faa.gz etc., while genomes/refseq has a separate .protein.faa.gz file for each species.

What is the difference? Which one is better if I need all the animal refseq protein sequences?

NCBI refseq genome ftp fasta • 2.4k views

The RefSeq release covers organisms that may or may not have a completed genome. RefSeq under the genomes section covers organisms with a genome assembly in NCBI's Assembly resource. From the NCBI descriptions:

RefSeq:

The NCBI RefSeq project is an ongoing effort to provide a curated, non-redundant collection of reference sequences, representative of the central dogma, for each major organism. The full release incorporates genomic, transcript, and protein data available at the time of each release.

Genomes:

Sequence data is provided for all single organism genome assemblies that are included in NCBI's Assembly resource (www.ncbi.nlm.nih.gov/assembly/).

refseq: content includes assembled genome sequence and RefSeq annotation data. All prokaryotic and eukaryotic RefSeq genomes have annotation. RefSeq annotation data may be calculated by NCBI annotation pipelines or propagated from the GenBank submission. The RefSeq directory area includes fewer organisms than the GenBank directory area because not all genome assemblies are selected for the RefSeq project.
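To get per-species protein files from the genomes path, one option is to go through an assembly summary table rather than browsing directories. The sketch below is only an illustration and rests on assumptions that are not stated in this thread: that each group directory under genomes/refseq (for example vertebrate_mammalian) provides a tab-separated assembly_summary.txt with an `ftp_path` column, and that the protein FASTA in each assembly directory is named `<directory name>_protein.faa.gz`. The function name `protein_faa_urls` is made up for this example. Check a real summary file before relying on it.

```python
def protein_faa_urls(summary_text):
    """Build protein FASTA URLs from the text of an assembly summary table.

    Assumes (not confirmed in this thread) a tab-separated table whose
    header line starts with '#' and contains an 'ftp_path' column.
    """
    header = None
    urls = []
    for line in summary_text.splitlines():
        if line.startswith("#"):
            # Comment lines; the one containing 'ftp_path' is the header.
            fields = line.lstrip("#").strip().split("\t")
            if "ftp_path" in fields:
                header = fields
            continue
        if header is None or not line.strip():
            continue
        row = dict(zip(header, line.split("\t")))
        path = row.get("ftp_path", "").rstrip("/")
        if not path or path == "na":
            # Skip rows that have no directory listed.
            continue
        # Assumed naming: <assembly directory name>_protein.faa.gz
        base = path.rsplit("/", 1)[-1]
        urls.append(f"{path}/{base}_protein.faa.gz")
    return urls
```

The resulting URLs can then be passed to whatever download tool you prefer. Columns are looked up by name from the header instead of by position, so the function does not depend on the exact column order.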


So which one is more comprehensive: the RefSeq release sequences or the RefSeq files under genomes? Thank you.


RefSeq under genomes is a snapshot of all of the proteins that were included in the annotation. If an organism is being actively curated and a batch of new RefSeqs is added, they will not appear in the annotation files under the FTP genomes path until a new annotation release is made. RefSeq releases occur independently of the annotation releases, so all new RefSeqs are included in the FTP refseq/release path.
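If you go with the release path, you still need to pick out the animal protein files from a directory listing. Here is a rough sketch of that filtering step. It assumes that "animals" corresponds to the invertebrate, vertebrate_mammalian and vertebrate_other divisions, and that file names follow a `<division><separator><number>.protein.faa.gz` pattern like the one quoted in the question. The separator is accepted as either `.` or `_` because the exact form should be checked against the live listing. The function name `animal_protein_files` is invented for this example.

```python
import re

# Assumption: these three release divisions together cover animals.
ANIMAL_DIVISIONS = ("invertebrate", "vertebrate_mammalian", "vertebrate_other")

# <division><. or _><part number>.protein.faa.gz
_PROTEIN_FILE = re.compile(
    r"^(?P<division>[a-z_]+?)[._](?P<part>\d+)\.protein\.faa\.gz$"
)


def animal_protein_files(names):
    """Keep only protein FASTA file names from the animal divisions."""
    selected = []
    for name in names:
        match = _PROTEIN_FILE.match(name)
        if match and match.group("division") in ANIMAL_DIVISIONS:
            selected.append(name)
    return selected
```

The input is just a list of file names, so it can come from an FTP listing, a saved index page, or a local directory. Nucleotide and GenPept-format files in the same listing are dropped because they do not end in `.protein.faa.gz`.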
