I was recently given 4 paired-end .fastq files (reads of about 150 bases each) extracted from dolphin stomach, each suspected to contain a different strain of bacteria. I have been asked to analyze these datasets. The researchers seem to expect all 4 samples to be from the same genus (Helicobacter). One sample maps fairly consistently to H. cetorum MIT 00-7128 across several tools (BLAST, sourmash, Kraken2, etc.). However, the other three samples do not map as well.
I recently wanted to use DIAMOND to further investigate these data, as was also suggested to me in a previous post. I am unable to run DIAMOND locally due to space constraints. However, I recently attempted to do so using Galaxy. I did the following (both "Diamond makedb" and "Diamond" were under the "Metagenomics analysis" tab):
1) Ran "Diamond makedb" on the raw reads (Sample1_R1.fasta)
2) Ran "Diamond" with the following fields:
a) What do you want to align? (I selected "Align DNA query sequences (blastx)")
b) Input query file in FASTA or FASTQ format (I input Sample1_R1.fasta)
c) Will you select a reference genome from your history or use a built-in index? (I selected "Use one from the history" and input the makedb output from Step 1)
I did not change any of the defaults. Most notably, this meant I used the "Standard code". Unless I am reading the output incorrectly, I did not get any hits. There was no error in "stderr". And in "stdout", it read: "Reported 0 pairwise alignments, 0 HSPs. 0 queries aligned."
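For reference, as far as I understand, the two Galaxy steps above are roughly equivalent to the DIAMOND command-line calls built below. This is only a sketch: the database name (sample1_db) and output file name (sample1_hits.tsv) are placeholders I made up, and I am assuming the Galaxy wrappers pass my inputs through with these standard options and nothing else. The script only prints the commands; it does not run DIAMOND.

```python
import shlex


def diamond_commands(reads="Sample1_R1.fasta",
                     db="sample1_db",          # hypothetical database name
                     out="sample1_hits.tsv"):  # hypothetical output name
    """Build (but do not run) the DIAMOND calls mirroring the two Galaxy steps."""
    # Step 1: "Diamond makedb" was given the raw reads as its input.
    makedb = ["diamond", "makedb", "--in", reads, "--db", db]
    # Step 2: "Diamond" in blastx mode, with the same reads as the query
    # and the Step 1 output as the database.
    blastx = ["diamond", "blastx", "--db", db, "--query", reads, "--out", out]
    return makedb, blastx


if __name__ == "__main__":
    for cmd in diamond_commands():
        print(shlex.join(cmd))
```

Written out this way, it is easy to see that the same file (Sample1_R1.fasta) is used both to build the database and as the query.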
I am a bit surprised about this, especially for the first sample which had mapped reasonably well in other software. I am not too experienced working with bacterial data and/or metagenomic data and wanted to seek advice from people with more experience: Is there an aspect of my pipeline that could be causing the zero alignment rate that you may recommend changing to analyze the data more efficiently with DIAMOND? Thank you for sharing your ideas.