I've been running blastn with a variety of options on my metagenomic dataset, with the goal of finding the best hits for my contigs (assembled with metaSPAdes). This has raised a question that I wanted to discuss with you all. When dealing with metagenomic data (in my case, I am working with all the unmapped reads of a dataset, in the hope of figuring out what those reads might be mapping to), does it REALLY make a difference whether I use the BLASTN -task option 'megablast' or 'blastn'?
Megablast is defined as the faster BLAST algorithm, intended for comparing highly similar sequences, such as reads from the same species. It uses a larger default word size of 28 and, as I understand it, should produce good-quality hits for such closely related sequences. blastn is better when you are comparing sequences from different species; it uses a smaller default word size of 11, which makes it slower but more sensitive. Megablast would be best when you want to, for example, confirm that your sequences are in fact from your species of interest, or when you are comparing different strains of a bacterium and need more precise BLAST results within a family of bacteria.
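To make the word-size point concrete, here is a toy sketch in Python. It assumes a deliberately simplified model in which a hit can only be seeded if query and subject share at least one exact, ungapped run of `word_size` identical bases; real BLAST seeding and extension are more involved than this. The sequences and the helper names (`longest_exact_run`, `has_seed`, `mutate_every`) are made up for illustration.

```python
def longest_exact_run(query: str, subject: str) -> int:
    """Longest stretch of identical bases between two pre-aligned, equal-length sequences."""
    best = run = 0
    for q, s in zip(query, subject):
        run = run + 1 if q == s else 0  # a mismatch resets the current run
        best = max(best, run)
    return best


def has_seed(query: str, subject: str, word_size: int) -> bool:
    """Simplified assumption: a hit needs one exact match of at least word_size bases."""
    return longest_exact_run(query, subject) >= word_size


def mutate_every(seq: str, step: int) -> str:
    """Substitute every step-th base to mimic an evenly diverged relative (toy model)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[b] if (i + 1) % step == 0 else b for i, b in enumerate(seq))


query = "ACGT" * 25                  # 100 bp toy query
subject = mutate_every(query, 20)    # one mismatch every 20 bases
identity = sum(q == s for q, s in zip(query, subject)) / len(query)

# Longest exact run is 19 bases: enough for word size 11, too short for 28.
print(identity, has_seed(query, subject, 28), has_seed(query, subject, 11))
# prints: 0.95 False True
```

In this toy case a subject that is 95% identical to the query never yields a 28-base exact match, so under the simplified model megablast's default word size would not seed a hit, while the blastn default of 11 would. Real divergence is not evenly spaced, so this is a worst-case illustration rather than a prediction for any particular dataset.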
So I guess my final question is this: using unmapped reads / metagenomic / environmental data already implies that I am not trying to validate my sequences against a KNOWN species, but simply trying to determine what this mixture of reads could be. Would it still make sense to use megablast, or is there really no difference between megablast and blastn in this case?
Has anyone found a major difference between the two when dealing with metagenomic data? Just curious. I will be running both regardless to see what I get, but it is taking a while (a few days to finish), so I figured I would start this discussion in the meantime.
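For when both runs finish, here is a minimal sketch of how the two result sets could be compared. It assumes both runs were written in tabular format (`-outfmt 6`) with the default twelve columns, where column 1 is the query ID, column 2 the subject ID and column 12 the bit score. The function names and the toy rows are hypothetical and only there to show the idea; they are not real results.

```python
import csv
import io


def best_hits(handle):
    """Return {query_id: (subject_id, bitscore)} keeping the highest-scoring hit per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, bitscore = row[0], row[1], float(row[11])
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return best


def compare(mega, blastn):
    """Group query IDs by how their best hits differ between the two tasks."""
    summary = {"same": [], "different": [], "blastn_only": [], "megablast_only": []}
    for q in sorted(set(mega) | set(blastn)):
        if q not in mega:
            summary["blastn_only"].append(q)
        elif q not in blastn:
            summary["megablast_only"].append(q)
        elif mega[q][0] == blastn[q][0]:
            summary["same"].append(q)
        else:
            summary["different"].append(q)
    return summary


# Toy rows standing in for the two output files (made-up IDs and scores).
mega_text = (
    "contig_1\tsubjA\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t900\n"
    "contig_2\tsubjB\t97.0\t300\t9\t0\t1\t300\t1\t300\t1e-150\t500\n"
)
blastn_text = (
    "contig_1\tsubjA\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t900\n"
    "contig_2\tsubjC\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-155\t520\n"
    "contig_3\tsubjD\t82.0\t200\t36\t0\t1\t200\t1\t200\t1e-40\t150\n"
)

summary = compare(best_hits(io.StringIO(mega_text)), best_hits(io.StringIO(blastn_text)))
print(summary)
```

With real data, the `io.StringIO` objects would be replaced by the opened output files. The contigs that land in `blastn_only` are the ones where the choice of task actually changed whether anything was found, which is probably the most direct answer to the question for a given dataset.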