Is there a fast way to do this kind of filtering?
I have a huge FASTA file (the sequences are short reads from an Illumina instrument). I also have a list of nucleotide sequences (not FASTA, just the bare sequences), and I want to remove from the big FASTA file every entry whose sequence is identical to one in the list.
My idea was simply to go through the FASTA file and, for every read, compare it against every sequence in the list. If the read matches one of the sequences, do nothing; otherwise, print the read to a new file. I implemented this in Perl, but it takes ages!
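To make the logic concrete, here is a minimal sketch of the same filter in Python (not my original Perl), with one change: the list is loaded into a set, so each read costs a single lookup instead of a scan of the whole list. The function names `read_fasta` and `filter_fasta` are made up for illustration, and the sketch assumes exact, case-sensitive matching and a list small enough to fit in memory.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(fasta_lines: Iterable[str], unwanted: Set[str]) -> Iterator[str]:
    """Yield FASTA records whose sequence is NOT in the unwanted set.

    Assumption: matching is exact and case-sensitive.
    """
    for header, seq in read_fasta(fasta_lines):
        if seq not in unwanted:  # average O(1) set lookup
            yield f"{header}\n{seq}\n"
```

In practice the set would be built from the list file, one stripped line per sequence, and `filter_fasta` would be fed the open FASTA file handle, writing each yielded record to the output file. Records are written back with the sequence on a single line.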
The list is made up of nucleotide sequences, not IDs. It's something like this: