I'm familiar with the idea of blob plots, but for filtering out contaminants in a plant I've sequenced I have so far used a manual screening protocol: I take the top hit of each query (ranked by score and e-value, not by identity), then use the UniProt/TrEMBL species identifiers to filter out plausible contaminants such as bacteria.
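The top-hit step can be scripted rather than done by hand. Below is a minimal Python sketch, assuming the search was run with BLAST+ tabular output (`-outfmt 6` with its default 12 columns). The function name `best_hits` is hypothetical. It keeps one hit per query, preferring the highest bit score and breaking ties by the lowest e-value.

```python
import csv
from typing import Dict, Iterable

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str]) -> Dict[str, dict]:
    """Return the single best hit per query (hypothetical helper).

    Ranking: highest bit score first, then lowest e-value; percent identity is ignored.
    """
    best: Dict[str, dict] = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        hit["bitscore"] = float(hit["bitscore"])
        hit["evalue"] = float(hit["evalue"])
        # Higher tuple wins: bigger bit score, then smaller e-value.
        key = (hit["bitscore"], -hit["evalue"])
        current = best.get(hit["qseqid"])
        if current is None or key > (current["bitscore"], -current["evalue"]):
            best[hit["qseqid"]] = hit
    return best
```

Usage would be something like `with open("hits.tsv") as fh: top = best_hits(fh)`, where `hits.tsv` is a placeholder file name.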
Doing this manually in Excel takes a hellishly long time. Is there any way I can filter out all hits that are not from green plants? I am searching against the whole of UniProt (Swiss-Prot and TrEMBL), because the plant I'm working on commonly carries lots of bacteria and fungi. I have the UniProt/TrEMBL identifier of each hit available.
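One way to automate the green-plant filter is sketched below, under stated assumptions: UniProt FASTA headers carry the NCBI taxonomy ID as `OX=<taxid>`, and the NCBI Taxonomy `nodes.dmp` file (fields separated by `\t|\t`) gives each taxon's parent, so a hit counts as a plant if its lineage reaches Viridiplantae (taxid 33090). All function names here are hypothetical, and the input `top_hits` is assumed to be a dict mapping each query to its top-hit subject ID (e.g. `sp|P12345|NAME_ORG`).

```python
import re
from typing import Dict, Iterable, Set

VIRIDIPLANTAE = 33090  # NCBI Taxonomy ID for green plants

# Matches UniProt FASTA headers such as ">sp|P12345|NAME_ORG ... OX=3702 ..."
HEADER_RE = re.compile(r"^>(?:sp|tr)\|([^|]+)\|.*?\bOX=(\d+)")


def accession_to_taxid(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map UniProt accession -> NCBI taxid from FASTA header lines."""
    mapping: Dict[str, int] = {}
    for line in fasta_lines:
        m = HEADER_RE.match(line)
        if m:
            mapping[m.group(1)] = int(m.group(2))
    return mapping


def load_parents(nodes_lines: Iterable[str]) -> Dict[int, int]:
    """Map taxid -> parent taxid from NCBI nodes.dmp lines."""
    parents: Dict[int, int] = {}
    for line in nodes_lines:
        fields = line.split("\t|\t")
        if len(fields) >= 2:
            parents[int(fields[0])] = int(fields[1])
    return parents


def is_green_plant(taxid: int, parents: Dict[int, int]) -> bool:
    """Walk up the lineage; True if Viridiplantae is an ancestor (or the taxid itself)."""
    seen = set()
    while taxid not in seen:  # the root (taxid 1) is its own parent, so this terminates
        if taxid == VIRIDIPLANTAE:
            return True
        seen.add(taxid)
        parent = parents.get(taxid)
        if parent is None:
            return False
        taxid = parent
    return False


def accession(sseqid: str) -> str:
    """Extract the accession from 'sp|ACC|NAME' or 'tr|ACC|NAME'; else return as-is."""
    parts = sseqid.split("|")
    return parts[1] if len(parts) >= 3 else sseqid


def plant_queries(top_hits: Dict[str, str], acc2tax: Dict[str, int],
                  parents: Dict[int, int]) -> Set[str]:
    """Queries whose top hit is a green plant.

    Hits with no known taxid are treated as non-plant and dropped.
    """
    keep: Set[str] = set()
    for query, sseqid in top_hits.items():
        taxid = acc2tax.get(accession(sseqid))
        if taxid is not None and is_green_plant(taxid, parents):
            keep.add(query)
    return keep
```

Reading all of TrEMBL's headers is memory-hungry. A lighter variant would collect only the accessions that actually occur among the top hits before building the mapping.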
Any help appreciated.