Question: How to retrieve protein sequences in Diamond?
4.7 years ago
United States
frcamacho190 wrote:

My DIAMOND output looks like:

I326_1_FC30VYFAAXX:4:1:73:1672/2        BGC0000803_GG-exopolysaccharide_Saccharide_Glf_Glf_ACN94849.1   77.3    22      5       0       2       67      82      103     4.2e-06 45.8

However, unlike BLAST output, the .m8 (tabular) file that DIAMOND produces contains no sequences for the hits. Is there a DIAMOND option to add the subject or query sequence to the tabular file? I checked the DIAMOND GitHub page and did not see anything.

Thanks for any suggestions.


I would suggest looking at their README file.

Here is a paragraph from that file. Is it not enough for your needs?

"We assume to have a protein database file in FASTA format named nr.faa and a file of DNA reads that we want to align named reads.fna.

In order to set up a reference database for DIAMOND, the makedb command needs to be executed with the following command line:

$ diamond makedb --in nr.faa -d nr

This will create a binary DIAMOND database file with the specified name (nr.dmnd). The alignment task may then be initiated using the blastx command like this:

$ diamond blastx -d nr -q reads.fna -a matches -t <temporary directory>"

Good luck!

modified 4.7 years ago • written 4.7 years ago by natasha.sernova3.8k

You also need to specify SAM output when viewing the output file, e.g. for your example:

diamond view -a matches -f sam

written 4.7 years ago by Mikael Huss4.7k
4.7 years ago
Mikael Huss4.7k wrote:

If you view the output in SAM format, the sequence is included as one of the output fields. Where you specify SAM output depends on the DIAMOND version: either as part of "diamond blastx" or in a separate "diamond view" step after blastx.
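As a minimal sketch of pulling the sequences back out of SAM output, the Python below reads the tenth tab-separated column (SEQ in the SAM specification) from each alignment line. The function name read_sam_sequences and the example record are hypothetical and not taken from DIAMOND. Which part of the query ends up in SEQ may differ between DIAMOND versions, so it is worth inspecting a few lines of your own file first.

```python
import io


def read_sam_sequences(handle):
    """Yield (query_name, subject_name, sequence) for each SAM alignment line."""
    for line in handle:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment line has at least 11 mandatory columns.
        if len(fields) < 11:
            continue
        # Column 1 = QNAME, column 3 = RNAME, column 10 = SEQ.
        yield fields[0], fields[2], fields[9]


# Hypothetical SAM content, made up for illustration only.
example = io.StringIO(
    "@HD\tVN:1.5\n"
    "read1\t0\tprotA\t82\t255\t22M\t*\t0\t0\tMKTAYIAKQRQISFVKSHFSRQ\t*\tAS:i:45\n"
)
records = list(read_sam_sequences(example))
for query, subject, sequence in records:
    print(query, subject, sequence)
```

For a real file, replace the StringIO object with an open file handle, for example open("matches.sam"), where matches.sam is an assumed file name.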

Regarding the protein database, it's not clear to me which one you are referring to. I typically build my own (with "diamond makedb") in which case I usually have the original file to parse out sequences and names from, if the need should arise.
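If you still have the FASTA file that the database was built from, parsing sequences out of it could look roughly like the sketch below. It assumes the subject ID in column 2 of the tabular output equals the first whitespace-delimited word of the FASTA header. The helper names read_fasta and attach_subject_sequences and the example data are hypothetical.

```python
import io


def read_fasta(handle):
    """Return a dict mapping each sequence ID to its full sequence."""
    parts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the ID (an assumption).
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in parts.items()}


def attach_subject_sequences(m8_handle, sequences):
    """Yield each tabular row with the full subject sequence appended."""
    for line in m8_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        # Column 2 of BLAST tabular output is the subject ID.
        # Unknown IDs get an empty string rather than raising an error.
        yield fields + [sequences.get(fields[1], "")]


# Hypothetical inputs, made up for illustration only.
fasta = io.StringIO(">protA example protein\nMKTAYIAKQR\nQISFVKSHFSRQ\n")
m8 = io.StringIO("read1\tprotA\t77.3\t22\t5\t0\t2\t67\t82\t103\t4.2e-06\t45.8\n")

sequences = read_fasta(fasta)
rows = list(attach_subject_sequences(m8, sequences))
for row in rows:
    print("\t".join(row))
```

This appends the whole subject sequence. To keep only the aligned region, you could slice it with the subject start and end coordinates from columns 9 and 10, which are 1-based in BLAST tabular format.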


Sequences were not included when I ran diamond view (BLAST tabular format). I will try the SAM format. Thank you!

written 4.7 years ago by frcamacho190

Yes, you need to specify the SAM output format. Good luck!

written 4.7 years ago by Mikael Huss4.7k
