Question: RefSeq proteins for several taxids
seta

Hi all,

My question may sound simple. I'm trying to download the plant RefSeq proteins from NCBI to make a BLAST database and run blastx on the contigs from a de novo assembly of a non-model plant. As there are several taxonomy IDs for plants, like flowering plants (3398), green plants (33090), ..., please advise me how I can get all plant RefSeq protein sequences so the database is as comprehensive as possible. Please don't refer me to …, as it contains mixed RefSeq sequences, not just RefSeq proteins. Thanks in advance.


Siva

You can download only the protein sequences from the FTP URL you listed using curl.

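curl -o "plant.#1.protein.faa.gz" "<FTP URL you listed>/plant.[1-87].protein.faa.gz"

Here [1-87] is curl's URL globbing range, and #1 in the -o file name is replaced by the current value of that range, so each part is saved under its own name (plant.1.protein.faa.gz, plant.2.protein.faa.gz, and so on).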
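Once the parts are downloaded, the remaining steps from the question (one protein FASTA, a BLAST database, a blastx search of the contigs) could look roughly like this bash sketch. It assumes the plant.N.protein.faa.gz files sit in the current directory and that NCBI BLAST+ (makeblastdb, blastx) is installed. The function names, output names, E-value cutoff and thread count are hypothetical choices, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch only: merge RefSeq plant protein parts and build a BLAST protein DB.
# Assumes plant.<N>.protein.faa.gz files are in the current directory and
# BLAST+ is on PATH. All function and file names here are hypothetical.

# Decompress every plant.*.protein.faa.gz part and concatenate them into one FASTA.
merge_protein_parts() {
  local out="$1"
  local parts=(plant.*.protein.faa.gz)
  # Without nullglob an unmatched glob stays literal, so check the first entry exists.
  if [ ! -e "${parts[0]}" ]; then
    echo "no plant.*.protein.faa.gz files found" >&2
    return 1
  fi
  gzip -cd "${parts[@]}" > "$out"
}

# Count sequences in a FASTA file (one '>' header line per record).
count_fasta_records() {
  grep -c '^>' "$1"
}

# Build a protein BLAST database from the merged FASTA (requires makeblastdb).
build_protein_db() {
  local fasta="$1" db="$2"
  makeblastdb -in "$fasta" -dbtype prot -out "$db" -title "$db"
}

# Translated search of nucleotide contigs against the protein DB (requires blastx).
# The E-value cutoff and thread count are arbitrary example values.
run_blastx() {
  local contigs="$1" db="$2" out="$3"
  blastx -query "$contigs" -db "$db" -out "$out" \
    -evalue 1e-5 -outfmt 6 -num_threads 4
}
```

For example: merge_protein_parts plant_refseq_protein.faa, then build_protein_db plant_refseq_protein.faa plant_refseq_protein, then run_blastx contigs.fa plant_refseq_protein contigs_vs_plant.tsv (tabular output, one hit per line). Merging first keeps a single database name to pass to blastx.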



Thanks a lot, friend. Is there a similar command to get the plant protein sequences from UniProt?

The following query will retrieve all sequences with the keyword "Complete proteome" from the taxonomy group "Viridiplantae"; it combines the taxonomy term "Viridiplantae [33090]" with keyword:"Complete proteome [KW-0181]".
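A curl one-liner for UniProt needs the query URL-encoded, which is where the %22 (double quote), %3A (colon) and + (space) in such links come from. The bash sketch below assumes the current UniProt REST endpoint https://rest.uniprot.org/uniprotkb/stream with format=fasta, which may not be the URL the query above was originally written for; urlencode, uniprot_fasta_url and fetch_uniprot_fasta are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: fetch UniProtKB entries matching a query as FASTA.
# Assumes the UniProt REST "stream" endpoint; function names are hypothetical.

# Percent-encode a string for a URL query parameter (ASCII input; space -> '+').
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [[:alnum:].~_-]) out+="$c" ;;
      ' ') out+='+' ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build the stream URL that returns every matching entry in FASTA format.
uniprot_fasta_url() {
  printf 'https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%s\n' \
    "$(urlencode "$1")"
}

# Download the matching entries to a file (needs curl and network access).
fetch_uniprot_fasta() {
  local query="$1" out="$2"
  curl -fsSL -o "$out" "$(uniprot_fasta_url "$query")"
}
```

For example, fetch_uniprot_fasta 'taxonomy_id:33090' viridiplantae.fasta would request every UniProtKB entry under Viridiplantae, which is a very large download. The field names, and whether the "Complete proteome [KW-0181]" keyword from the query above is still indexed, should be checked against UniProt's current query syntax before relying on this.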
