I'm annotating a pond species that sits well within the higher plant lineage. I ran a BLASTX locally against the UniProt Viridiplantae database and I'm getting a lot of top hits from basal lineages such as the unicellular green algae. These results don't make much sense to me, given how far this plant has diverged from the lower plants. If I look further down the list at the next-best hits for the same query, there is always a more appropriate hit.
Because of this, I then ran a BLASTX against only the embryophyte subset of UniProt and TrEMBL (land plants and aquatic plants), and the results make much more sense; however, the % identity is low for some transcripts. What are people's takes on this? Is it better to use a more specific database in a case like this, given that the evolutionary position of my non-model organism is clear to me and it sits on a distant branch of the higher plants, far from the early-branching Viridiplantae? Or should I just use the entire Viridiplantae lineage?
I've also run a BLASTN to detect contamination, with a 95% cut-off and a low e-value threshold. The database for this was built from the unicellular microbial algae and eukaryotes that appear in the first set of BLASTX hits. There were very few hits, and I removed the sequences that produced them.
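To make the contamination step concrete, here is a minimal sketch of that kind of filter. It rests on several assumptions that are not stated above: that the BLASTN results were written in tabular format (`-outfmt 6`), that the 95% cut-off refers to percent identity, and that "low e-value" means something like 1e-10. The function name and the example rows are made up for illustration.

```python
import csv
import io


def contaminant_queries(blast_tsv, min_identity=95.0, max_evalue=1e-10):
    """Return query IDs with at least one hit passing both cut-offs.

    Assumes the default 12-column tabular layout of -outfmt 6:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    The thresholds are illustrative defaults, not the values used in the post.
    """
    flagged = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        qseqid = row[0]
        pident = float(row[2])
        evalue = float(row[10])
        if pident >= min_identity and evalue <= max_evalue:
            flagged.add(qseqid)
    return flagged


# Hypothetical rows: only tx1 passes both the identity and e-value cut-offs.
example = io.StringIO(
    "tx1\talgaA\t98.5\t300\t4\t0\t1\t300\t10\t309\t1e-150\t540\n"
    "tx2\talgaB\t80.0\t200\t40\t0\t1\t200\t5\t204\t1e-30\t150\n"
    "tx3\talgaC\t96.0\t50\t2\t0\t1\t50\t1\t50\t1e-05\t60\n"
)
flagged = contaminant_queries(example)
print(sorted(flagged))
```

The flagged IDs would then be the transcripts to drop before annotation. Adding an alignment-length or query-coverage requirement may also be worth considering, since a short 95% match on its own is weak evidence of contamination.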
What are people's opinions? Thanks.