Question: Download all RefSeq proteins from all organisms in one faa-file
seth97 (United States) wrote, 3.6 years ago:

How can I download all RefSeq proteins from all organisms in one faa-file?

I'm looking at the NCBI RefSeq FTP site: ftp://ftp.ncbi.nlm.nih.gov/refseq/

For example, I can get the rat RefSeq proteins: ftp://ftp.ncbi.nlm.nih.gov/refseq/R_norvegicus/mRNA_Prot/rat.1.protein.faa.gz

But how can I get all organisms in one file?

Sorry if this is an obvious question.

Thanks!


Thanks for that!

To combine the files, I ran these commands in the Terminal (Mac):

"find ./ -name \*.gz -exec gunzip -k {} \;"

"cat *.faa > ~/output.txt"

 

written 3.6 years ago by seth97
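
The decompress-then-concatenate step in the comment above can also be done in one pass, which matters when the downloaded files sit in subdirectories (a plain `cat *.faa` only sees the current directory). What follows is a minimal sketch, not something taken from the thread: the function name `combine_faa` and the paths in the example call are hypothetical. It uses `gzip -dc` instead of `gunzip -k`, so the compressed files are left in place and no uncompressed copies are written.

```shell
#!/bin/sh
# combine_faa DIR OUT
# Stream every *.faa.gz found anywhere under DIR into one FASTA file OUT.
# (Function name and arguments are illustrative assumptions.)
combine_faa() {
    src_dir=$1
    out_file=$2
    : > "$out_file"    # create or truncate the output file
    # Sort the file list so the order of the output is reproducible.
    find "$src_dir" -type f -name '*.faa.gz' | sort | while IFS= read -r f; do
        gzip -dc "$f" >> "$out_file"    # decompress to stdout; the .gz stays untouched
    done
}

# Example call (placeholder paths):
#   combine_faa ./refseq_download ./all_refseq_protein.faa
#   grep -c '^>' ./all_refseq_protein.faa    # count the sequences in the result
```

Keep the output file outside the download directory, or at least give it a name that does not end in `.faa.gz`, so that a second run does not pick it up as input.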

If you are not restricted to RefSeq, you can download a single FASTA file of this kind directly from UniProt: http://www.uniprot.org/downloads

written 19 months ago by unksci150
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 3.6 years ago:

Use wget to download everything under ftp://ftp.ncbi.nlm.nih.gov/refseq/release/ (see http://serverfault.com/questions/25199 ), with the option `--accept=LIST` to keep only the *.faa.gz files, and then concatenate the FASTA files.
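
As a rough sketch of that download step: the wrapper name `fetch_faa` and the destination directory are assumptions made for illustration, while the wget options are the standard short forms (`-A` is the short form of `--accept`). Note that `-nd` flattens the remote directory tree into a single folder, which is convenient for concatenation but assumes file names do not collide across subdirectories.

```shell
#!/bin/sh
# fetch_faa URL DEST
# Recursively download only *.faa.gz files below URL into DEST.
# (Function name and arguments are illustrative assumptions.)
fetch_faa() {
    base_url=$1
    dest_dir=$2
    # -r   recurse into subdirectories
    # -np  never ascend to the parent directory
    # -nd  do not recreate the remote directory tree locally
    # -A   accept list: keep only files matching the pattern
    # -P   directory prefix under which files are saved
    wget -r -np -nd -A '*.faa.gz' -P "$dest_dir" "$base_url"
}

# Example call (destination is a placeholder):
#   fetch_faa ftp://ftp.ncbi.nlm.nih.gov/refseq/release/ ./refseq_download
```

Passing a narrower URL, such as a single subdirectory of the release, limits the download to that subdirectory.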


Wouldn't this suffice?

wget 'ftp://ftp.ncbi.nlm.nih.gov/refseq/release/complete/*.faa.gz'

 

From: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/release-notes/RefSeq-release69.txt

The data that comprises a RefSeq release are available in several
file formats, as indicated by the format component in the file name:
  bna binary ASN.1 format; includes nucleotide and protein
  gbff GenBank flat file format; nucleotide records
  gpff GenPept flat file format; protein records
  fna FASTA format; nucleotide records
  faa FASTA format; protein records

The comprehensive full release is deposited in the "complete"
directory and is available in all file types.

written 3.6 years ago by 5heikki

Most probably.

written 3.6 years ago by Pierre Lindenbaum