Why is there a difference in the number of RefSeq protein sequences in NCBI?
5.8 years ago
seta

Hi everybody,

I downloaded the plant RefSeq protein files with the following command:

curl -o "plant.#1.protein.faa.gz" "ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plant/plant.[1-90].protein.faa.gz"


The downloads completed successfully, and the files contain 1973246 sequences, whereas the number of plant RefSeq protein sequences reported at http://www.ncbi.nlm.nih.gov/protein/?term=viridiplantae[org] is 2067967. That is a difference of 94721 sequences. Could you please let me know what you think about this?
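For reference, the local count can be reproduced by counting FASTA header lines (lines that start with `>`) across the gzipped files. This is only a minimal sketch: `count_fasta_records` is a hypothetical helper name, not an NCBI tool, and it assumes the downloaded files are in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA records in one or more gzipped files.
# Each record has exactly one header line, and header lines start with '>'.
count_fasta_records() {
    gzip -dc -- "$@" | grep -c '^>'
}

# When file names are passed to the script, print the combined record count.
if [ "$#" -gt 0 ]; then
    count_fasta_records "$@"
fi
```

Saved as a script (the name is arbitrary), it could be run as `sh count_fasta.sh plant.*.protein.faa.gz`, and the printed total compared with the number shown on the NCBI Protein search page.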

Thanks

