I ran command-line BLAST on around 40,000 sequences. I downloaded the protein databases for Arabidopsis thaliana, Viridiplantae, and Swiss-Prot, then ran blastp against each of them. I found that 46 sequences have no BLAST hit against the Arabidopsis database, 96 sequences have none against Viridiplantae, and 159 sequences have none against Swiss-Prot.
I wonder why the number of sequences without a hit increased from 46 for Arabidopsis to 96 for Viridiplantae, when Viridiplantae should contain all plant proteins, including the Arabidopsis ones. Similarly, the number increased from 96 for Viridiplantae to 159 for Swiss-Prot, although I expected Swiss-Prot to contain all proteins, including those from Viridiplantae.
So my question is: how is it possible for a sequence to have a BLAST hit in the Arabidopsis database while the same sequence has no hit in the Viridiplantae and Swiss-Prot databases?
Is there something wrong with BLAST?
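To pin down exactly which sequences are affected, the hit sets from two runs can be compared directly. This is only a minimal sketch, assuming each blastp run was saved in tabular format (`-outfmt 6`, where the query ID is the first tab-separated column). The function names, sequence IDs, and hit lines below are made up for illustration and are not my real data.

```python
def queries_with_hits(tabular_lines):
    """Collect the query IDs (first column) that appear in BLAST tabular output."""
    hits = set()
    for line in tabular_lines:
        line = line.strip()
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not line or line.startswith("#"):
            continue
        hits.add(line.split("\t")[0])
    return hits


def lost_hits(all_queries, small_db_lines, large_db_lines):
    """Return queries that hit the smaller database but not the larger one."""
    small = queries_with_hits(small_db_lines)
    large = queries_with_hits(large_db_lines)
    return sorted((set(all_queries) & small) - large)


# Hypothetical toy data standing in for the real result files.
queries = ["seq1", "seq2", "seq3"]
arabidopsis_hits = [
    "seq1\tsubjectA\t98.5",
    "seq2\tsubjectB\t35.2",
]
viridiplantae_hits = [
    "seq1\tsubjectA\t98.5",
]

print(lost_hits(queries, arabidopsis_hits, viridiplantae_hits))
```

With real result files, the lists of lines could be replaced by open file handles, since the functions only iterate over lines.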