Hi, we have sequenced a new genome, and now I want to annotate it using known proteins and ESTs from other species. The proteins and ESTs were BLASTed against our genome, but some of them produced too many hits. My question is: should I filter these BLAST hits by e-value? If so, how do I set the thresholds reasonably?
You may start by filtering out plant protein entries that contain repetitive sequences. Check the Pfam database for "plant transposon", e.g.: http://pfam.janelia.org/search/keyword?query=plant+transposon
GMAP may be better than BLAST for EST mapping (it is splice-site aware).
For same-species ESTs used for gene mapping, try PASA.
As mentioned above, you are probably better off using a program that specifically accounts for the presence of introns. Exonerate is a good one and freely available.
When using ESTs I would recommend getting rid of anything with less than 90% identity.
Also, divide the length of your alignment by the length of the original sequence to get a % aligned value. Then remove anything with less than e.g. 90% of its length aligned.
Adjust the 90% if you seem to get too few hits.
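In case it helps, here is a minimal Python sketch of the identity and coverage filter described above. It is not a definitive implementation and rests on some assumptions: the hits are in BLAST tabular format with the query length appended as a 13th column (e.g. `-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen"`), coverage is computed per hit from the aligned query span rather than from the gapped alignment length, and the function name `filter_hits` and both thresholds are just illustrative.

```python
def filter_hits(lines, min_identity=90.0, min_coverage=90.0):
    """Keep hits that pass both the % identity and % aligned thresholds.

    Assumed column order (tab-separated):
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore qlen
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        qlen = int(fields[12])
        # % of the query (EST or protein) covered by this single hit
        coverage = 100.0 * (abs(qend - qstart) + 1) / qlen
        if pident >= min_identity and coverage >= min_coverage:
            kept.append((fields[0], fields[1], pident, round(coverage, 1)))
    return kept
```

You would call it with something like `filter_hits(open("hits.tsv"))` (the file name is hypothetical) and lower `min_identity` or `min_coverage` if too few hits survive. Note that plain BLAST splits a spliced EST into one hit per exon, so the per-hit coverage will look low for multi-exon genes; that is one more reason to use a splice-aware aligner as suggested above.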