8.8 years ago
seta
Hi everybody,
I downloaded the RefSeq protein sequences for plants only, using the command:
curl -o plant.#1.protein.faa.gz ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plant/plant.\[1-90\].protein.faa.gz
All files downloaded completely. Together they contain 1,973,246 sequences, while the number of plant RefSeq protein sequences reported by http://www.ncbi.nlm.nih.gov/protein/?term=viridiplantae[org] is 2,067,967, a difference of 94,721 sequences. Could you please let me know your opinion about this?
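For reference, here is a minimal sketch of how the local sequence count can be reproduced. It assumes the files follow the plant.N.protein.faa.gz naming produced by the curl command and that every FASTA record begins with exactly one '>' header line. The function name count_fasta_records is hypothetical. It runs gzip -t on every archive first, so a truncated download surfaces as an error rather than as a silently lower count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records across gzipped protein files.
count_fasta_records() {
  # Test archive integrity first; a corrupt or truncated file returns non-zero
  gzip -t "$@" || return 1
  # Decompress all files to stdout and count header lines starting with '>'
  gzip -dc "$@" | grep -c '^>'
}
```

Usage, run in the download directory: `count_fasta_records plant.*.protein.faa.gz`.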
Thanks