I ran BLASTX with a query file containing more than 140,000 nucleotide sequences against a database file containing more than 1,400 polypeptide sequences. With an e-value of 3, around 20,000 query sequences had hits; after changing the e-value to 5, more than 15,000 sequences still aligned. When I checked the percent identity of the aligned sequences, more than 2,000 of them had an identity below 30%.
When I analyze the BLASTX results, which factors should I consider?
Should I use 3 or 5 for the e-value?
What percent identity would be a reasonable threshold for BLASTX results?
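To make the comparison concrete, here is a minimal sketch of how hits could be counted under different cutoffs. It assumes the results were saved in BLAST tabular format (`-outfmt 6` with the default columns, where column 3 is percent identity and column 11 is the e-value). The function name `count_queries`, the sample rows, and the cutoffs 1e-3, 1e-5, and 30% are made up for illustration and are not recommended thresholds.

```python
import csv
import io


def count_queries(tabular_text, max_evalue, min_identity=0.0):
    """Count distinct query IDs with at least one hit passing both cutoffs.

    Assumes default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    passing = set()
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (e.g. from -outfmt 7)
        if not row or row[0].startswith("#"):
            continue
        qseqid = row[0]
        pident = float(row[2])   # column 3: percent identity
        evalue = float(row[10])  # column 11: e-value
        if evalue <= max_evalue and pident >= min_identity:
            passing.add(qseqid)
    return len(passing)


# Hypothetical rows, invented only to exercise the function
sample = "\n".join([
    "q1\tp1\t85.0\t120\t18\t0\t1\t360\t1\t120\t1e-40\t150",
    "q2\tp2\t28.5\t90\t60\t2\t1\t270\t5\t94\t2e-4\t45",
    "q3\tp3\t45.0\t60\t33\t0\t1\t180\t10\t69\t0.5\t30",
])

print(count_queries(sample, 1e-3))                     # 2
print(count_queries(sample, 1e-5))                     # 1
print(count_queries(sample, 1e-3, min_identity=30.0))  # 1
```

For a real run, the contents of the BLASTX output file would be read and passed in place of `sample`.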
Thank you in advance.