Hello all, I stuck with the following problem
Having performed a de novo RNA seq alignment with Trinity and then 6 frame translation I would like to filter my Amino Acid sequence fasta file vs a known reference proteome such as Uniprot, to exact all sequences that do not occur in it.
Unfortunately I cannot filter by ID due to the de novo tags. Does anybody have a solution to filter each sequence (700,000) to see if the same sequence occurs in the reference file.?