How to retrieve protein sequences in Diamond?
5.8 years ago
frcamacho ▴ 190

My DIAMOND (https://github.com/bbuchfink/diamond) output looks like:

I326_1_FC30VYFAAXX:4:1:73:1672/2        BGC0000803_GG-exopolysaccharide_Saccharide_Glf_Glf_ACN94849.1   77.3    22      5       0       2       67      82      103     4.2e-06 45.8


However, unlike BLAST, there are no sequences for each hit in the produced .m8 (tabular) file. Is there a DIAMOND option to add the subject or query sequence to the tabular file? I checked the DIAMOND GitHub page and did not see anything.

Thanks for any suggestions.

I would suggest looking at their README file.

Here is a paragraph from the file. Is it not enough for your needs?

"We assume to have a protein database file in FASTA format named nr.faa and a file of DNA reads that we want to align named reads.fna.

In order to set up a reference database for DIAMOND, the makedb command needs to be executed with the following command line:

diamond makedb --in nr.faa -d nr

This will create a binary DIAMOND database file with the specified name (nr.dmnd). The alignment task may then be initiated using the blastx command like this:

diamond blastx -d nr -q reads.fna -a matches -t <temporary directory>"

Good luck!

One also needs to specify SAM output when viewing the output file, e.g. in your example:

diamond view -a matches -f sam

5.8 years ago

If you choose to view the output in SAM format, the sequence will be included as one of the output fields. Where SAM output is specified depends a bit on the DIAMOND version: either as part of "diamond blastx" or as part of "diamond view" (after the blastx step).
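Once you have the SAM file, pulling the sequence out is mostly a matter of reading the right column. Below is a minimal Python sketch, assuming standard SAM layout (tab-separated, header lines starting with "@", sequence in the 10th field). The function name sam_sequences and the example record are made up for illustration and are not real DIAMOND output.

```python
import io


def sam_sequences(handle):
    """Yield (query_name, subject_name, sequence) for each SAM alignment line."""
    for line in handle:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment line has at least 11 mandatory fields.
        if len(fields) < 11:
            continue
        # Field 1 = QNAME, field 3 = RNAME, field 10 = SEQ (1-based).
        yield fields[0], fields[2], fields[9]


# Hypothetical record, only to show the call; replace with open("your_file.sam").
example = io.StringIO(
    "@HD\tVN:1.5\n"
    "read1\t0\tprotA\t82\t255\t22M\t*\t0\t0\tMKVLAAGIVGLLLAQWSHAETQ\t*\tAS:i:45\n"
)
for query, subject, seq in sam_sequences(example):
    print(query, subject, seq)
```

Which sequence ends up in that field (full query or only the aligned part) may depend on the DIAMOND version, so it is worth checking a few records by hand.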

Regarding the protein database, it's not clear to me which one you are referring to. I typically build my own (with "diamond makedb") in which case I usually have the original file to parse out sequences and names from, if the need should arise.
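If you still have the FASTA file the database was built from, the subject sequences can be added to the tabular output afterwards. This is a rough Python sketch, assuming the tabular file is tab-separated with the subject ID in the second column, and that this ID matches the first word of the FASTA header. The function names and the tiny example data are hypothetical.

```python
import io


def read_fasta(handle):
    """Return {sequence_id: sequence}, using the first word of each header as the id."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def add_subject_sequences(hits, fasta):
    """Yield each tabular hit line with the full subject sequence as an extra column."""
    seqs = read_fasta(fasta)
    for line in hits:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Column 2 is assumed to hold the subject id; "NA" marks ids not found.
        yield line + "\t" + seqs.get(fields[1], "NA")


# Made-up data just to show the call.
fasta = io.StringIO(">protA some description\nMKVLA\nAGIVG\n>protB\nMSTQ\n")
hits = io.StringIO("read1\tprotA\t77.3\t22\nread2\tprotC\t60.0\t30\n")
for row in add_subject_sequences(hits, fasta):
    print(row)
```

With real data you would pass open file handles instead, e.g. the tabular output and nr.faa. Note that this loads the whole FASTA file into memory, which may be too much for a database the size of nr; in that case it would be better to collect the subject IDs from the hits first and keep only those sequences.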

Sequences were not included when I ran diamond view (BLAST tabular format). I will try the SAM format. Thank you!

Yes, you need to specify the SAM output format. Good luck!