Question: Refseq proteins for several taxids
4.3 years ago by
seta1.1k
Sweden
seta1.1k wrote:

Hi all,

My question may sound simple. I'm trying to download the plant RefSeq proteins from NCBI to make a BLAST database and run blastx on contigs from a de novo assembly of a non-model plant. As there are several taxonomy IDs for plants, such as flowering plants (3398) and green plants (33090), please let me know how I can get all plant RefSeq protein sequences so that the database is as rich as possible. Please don't refer me to ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plant/ as it contains mixed RefSeq sequences, not just protein RefSeq. Thanks in advance.

 

rna-seq blast next-gen • 1.8k views
4.3 years ago by
Siva1.6k
United States
Siva1.6k wrote:

You can download only the protein sequences from the FTP URL you listed using curl.

curl -o 'plant.#1.protein.faa.gz' 'ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plant/plant.[1-87].protein.faa.gz'
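In the curl command above, `[1-87]` is curl's URL range syntax and `#1` in the output name is replaced by the current number. As a rough sketch of the same download written as an explicit loop, followed by the BLAST database step the question mentions, something like the following could work. The file count (87 in the answer) depends on the RefSeq release and has to be supplied by you, and the database name `plant_refseq_protein` is a hypothetical choice, not something from the thread.

```shell
# Sketch only: nothing runs until one of the functions is called.
BASE_URL="ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plant"

# Print one URL per numbered protein file, 1..N.
# N is an assumption: the answer used 87, but it changes between releases.
plant_protein_urls() {
    last_part="$1"
    part=1
    while [ "$part" -le "$last_part" ]; do
        printf '%s/plant.%d.protein.faa.gz\n' "$BASE_URL" "$part"
        part=$((part + 1))
    done
}

# Download every file into the current directory under its remote name,
# reporting (but not stopping on) files that fail.
download_plant_proteins() {
    plant_protein_urls "$1" | while read -r url; do
        curl --fail --remote-name "$url" || echo "failed: $url" >&2
    done
}

# Decompress all downloaded parts into one FASTA file and index it as a
# protein database. "plant_refseq_protein" is a hypothetical name.
build_blast_db() {
    gunzip -c plant.*.protein.faa.gz > plant_refseq_protein.faa
    makeblastdb -in plant_refseq_protein.faa -dbtype prot -out plant_refseq_protein
}
```

With BLAST+ installed, `download_plant_proteins 87 && build_blast_db` would fetch and index the files, after which a search along the lines of `blastx -query contigs.fasta -db plant_refseq_protein -evalue 1e-5 -outfmt 6 -out contigs_vs_plant.tsv` could be run, where `contigs.fasta` and the e-value cutoff are placeholders to adjust.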

 

 


Thanks a lot, friend. Is there a similar command to get the plant protein sequences from UniProt?

written 4.3 years ago by seta (1.1k)

The following query will retrieve all sequences with the keyword "Complete proteome" from the taxonomy group "Viridiplantae".

http://www.uniprot.org/uniprot/?query=taxonomy%3A%22Viridiplantae+[33090]%22+keyword%3A%22Complete+proteome+[KW-0181]%22
written 4.3 years ago by Siva (1.6k)
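To fetch that query from the command line rather than a browser, a sketch along these lines might be a starting point. It assumes the site accepts a `format=fasta` parameter appended to the query URL, which the thread does not state, and the output file name is hypothetical. UniProt's web interface has changed since this was written, so the URL may need adapting.

```shell
# Query URL copied from the comment above.
UNIPROT_QUERY_URL='http://www.uniprot.org/uniprot/?query=taxonomy%3A%22Viridiplantae+[33090]%22+keyword%3A%22Complete+proteome+[KW-0181]%22'

# Append a format parameter to the query URL.
# Assumption: "format=fasta" is accepted by the site; this is not from the thread.
uniprot_fasta_url() {
    printf '%s&format=%s\n' "$UNIPROT_QUERY_URL" "${1:-fasta}"
}

# Download the result set to a (hypothetical) local file name.
# --globoff stops curl from reading the square brackets as a URL range;
# --location follows redirects.
download_uniprot_plants() {
    curl --fail --location --globoff -o uniprot_viridiplantae.fasta "$(uniprot_fasta_url)"
}
```

Calling `download_uniprot_plants` performs the request; `uniprot_fasta_url` on its own only prints the URL, which is useful for checking it in a browser first.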