Question

How to retrieve protein sequences in Diamond?

8.1 years ago

frcamacho ▴ 210

My DIAMOND (https://github.com/bbuchfink/diamond) output looks like this:

I326_1_FC30VYFAAXX:4:1:73:1672/2        BGC0000803_GG-exopolysaccharide_Saccharide_Glf_Glf_ACN94849.1   77.3    22      5       0       2       67      82      103     4.2e-06 45.8

However, unlike BLAST, the .m8 (tabular) file that DIAMOND produces contains no sequences for the hits. Is there a DIAMOND option to add the subject or query sequence to the tabular file? I checked the DIAMOND GitHub page and did not see anything.

Thanks for any suggestions.

DIAMOND • 7.0k views

I would suggest looking at their README file.

https://github.com/bbuchfink/diamond/blob/master/README.rst

Here is a paragraph from the file. Is it not enough for your needs?

"We assume to have a protein database file in FASTA format named nr.faa and a file of DNA reads that we want to align named reads.fna.

In order to set up a reference database for DIAMOND, the makedb command needs to be executed with the following command line:

$ diamond makedb --in nr.faa -d nr

This will create a binary DIAMOND database file with the specified name (nr.dmnd). The alignment task may then be initiated using the blastx command like this:

$ diamond blastx -d nr -q reads.fna -a matches -t <temporary directory>"

Good luck!

ADD REPLY • link 8.1 years ago by natasha.sernova ★ 4.0k

One also needs to specify SAM output when viewing the output file, e.g. in your example:

diamond view -a matches -f sam

ADD REPLY • link 8.1 years ago by Mikael Huss 4.8k

score 0 · Answer 1 · 2016-03-30

8.1 years ago

Mikael Huss 4.8k

If you choose to view the output in SAM format, the sequence will be included as one of the output fields. Where SAM output is specified depends on the DIAMOND version: either as part of "diamond blastx" or in "diamond view" (after the blastx step).
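As a minimal sketch of pulling the sequence out of SAM output: the SAM specification puts the sequence in the tenth tab-separated column (SEQ), with the query name in column 1 and the reference name in column 3. The function name `sam_sequences` and the sample record below are made up for illustration, and exactly what DIAMOND writes into SEQ may differ between versions, so check a real file before relying on this.

```python
import io

def sam_sequences(handle):
    """Yield (query, subject, sequence) from SAM alignment lines."""
    for line in handle:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # a SAM record has 11 mandatory fields
        # Columns (1-based): 1 = QNAME, 3 = RNAME, 10 = SEQ
        yield fields[0], fields[2], fields[9]

# Hypothetical record, for demonstration only
example = io.StringIO(
    "@HD\tVN:1.5\n"
    "read1\t0\tprotA\t82\t255\t22M\t*\t0\t0\tMKTAYIAKQRQISFVKSHFSRQ\t*\n"
)
records = list(sam_sequences(example))
print(records)
```

With a real file, `open("matches.sam")` would take the place of the `io.StringIO` object (the file name is an assumption).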

Regarding the protein database, it's not clear to me which one you are referring to. I typically build my own (with "diamond makedb") in which case I usually have the original file to parse out sequences and names from, if the need should arise.
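If the original FASTA file is at hand, one way to parse out the sequences is to take the subject IDs from column 2 of the tabular output and look them up in the FASTA. This is only a sketch: `read_fasta` and `hit_sequences` are illustrative names, the ID is assumed to be the first whitespace-delimited token of each FASTA header, and the whole database is loaded into memory, which will not suit a database the size of nr.

```python
import io

def read_fasta(handle):
    """Return {sequence ID: sequence}; ID = first token of the header."""
    seqs, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line)  # sequences may span several lines
    return {k: "".join(v) for k, v in seqs.items()}

def hit_sequences(m8_handle, fasta):
    """Yield (query, subject, subject sequence) for each tabular hit."""
    for line in m8_handle:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # skip blank or malformed lines
        query, subject = cols[0], cols[1]
        yield query, subject, fasta.get(subject)  # None if ID is missing

# Made-up data, for demonstration only
fasta = read_fasta(io.StringIO(">protA some description\nMKTA\nYIAK\n>protB\nGGSS\n"))
m8 = io.StringIO("read1\tprotA\t77.3\t22\nread2\tprotC\t80.0\t30\n")
hits = list(hit_sequences(m8, fasta))
print(hits)
```

Returning `None` for an unknown ID, rather than raising, makes it easy to spot hits whose headers do not match the assumed ID convention.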

Sequences were not included when I ran "diamond view" (BLAST tabular format). I will try the SAM format. Thank you!