Regarding sequence dereplication with vsearch, I have seen the following statement:
"During dereplication, strictly identical sequences are grouped and receive the name of the first sequence of the group."
Now, I'm not exactly an expert on hash tables, so how do I know exactly which sequence is the first of the group? Is it the one that occurs first in the input FASTA file? If so, that would make things easy for me, because some of the sequences have important designations in their headers, which must not be lost, so that they show up in BLAST results. Or is it more complicated? I ask because I am creating a custom database composed of FASTA files originating from different sources.
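
To make the question concrete, here is a minimal Python sketch of the behavior I am hoping for. It is not vsearch's code, and it rests on my own assumption that "first" means first in input order. The function names (`parse_fasta`, `dereplicate_keep_first`) and the example headers are made up for illustration, and sequences are compared as exact strings.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate_keep_first(records):
    """Group strictly identical sequences, keeping the header seen first.

    ASSUMPTION: "first" means first in input order. This is the behavior
    being asked about, not a confirmed description of vsearch.
    """
    groups = {}  # sequence -> [first_header, count]; dicts keep insertion order
    for header, seq in records:
        if seq in groups:
            groups[seq][1] += 1
        else:
            groups[seq] = [header, 1]
    return [(hdr, seq, n) for seq, (hdr, n) in groups.items()]


# Hypothetical input: the record with the important header comes first.
example = """>refA important_designation
ACGT
>dup1
ACGT
>other
GGCC
""".splitlines()

result = dereplicate_keep_first(parse_fasta(example))
for hdr, seq, n in result:
    print(f">{hdr};size={n}\n{seq}")
```

If vsearch behaves like this sketch, I could simply concatenate my source files so that the ones with the important headers come first.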