How to retrieve protein sequences in Diamond?
8.1 years ago
frcamacho

My DIAMOND (https://github.com/bbuchfink/diamond) output looks like this:

I326_1_FC30VYFAAXX:4:1:73:1672/2        BGC0000803_GG-exopolysaccharide_Saccharide_Glf_Glf_ACN94849.1   77.3    22      5       0       2       67      82      103     4.2e-06 45.8

However, unlike BLAST, the .m8 (tabular) file that DIAMOND produces contains no sequences for the hits. Is there an option in DIAMOND to add the subject or query sequence to the tabular output? I checked the DIAMOND GitHub page and did not find anything.

Thanks for any suggestions.
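The tabular line in the question can be read as the standard 12 BLAST tabular columns. A minimal Python sketch, assuming DIAMOND's default tabular output uses the usual BLAST column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), splits such a line into named fields. It shows that the hit carries IDs, scores and coordinates but no sequence. `parse_m8_line` is a hypothetical helper name.

```python
# Assumed column order of DIAMOND's default BLAST tabular (.m8) output.
M8_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_m8_line(line: str) -> dict:
    """Split one tab-separated hit into a dict keyed by BLAST column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(M8_FIELDS):
        raise ValueError(f"expected {len(M8_FIELDS)} columns, got {len(values)}")
    hit = {}
    for name, value in zip(M8_FIELDS, values):
        if name in INT_FIELDS:
            hit[name] = int(value)
        elif name in FLOAT_FIELDS:
            hit[name] = float(value)
        else:
            hit[name] = value
    return hit


example = ("I326_1_FC30VYFAAXX:4:1:73:1672/2\t"
           "BGC0000803_GG-exopolysaccharide_Saccharide_Glf_Glf_ACN94849.1\t"
           "77.3\t22\t5\t0\t2\t67\t82\t103\t4.2e-06\t45.8")
hit = parse_m8_line(example)
print(hit["sseqid"], hit["sstart"], hit["send"])

# For this ungapped blastx hit, query coordinates are in nucleotides, so the
# query span is three times the alignment length in amino acids (66 == 3 * 22).
print(abs(hit["qend"] - hit["qstart"]) + 1 == 3 * hit["length"])
```

Only coordinates are available here, which is why the sequence itself has to come from elsewhere (SAM output or the original FASTA file).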


I would suggest looking at their README file:

https://github.com/bbuchfink/diamond/blob/master/README.rst

Here is a paragraph from that file. Is it not enough for your needs?

"We assume to have a protein database file in FASTA format named nr.faa and a file of DNA reads that we want to align named reads.fna.

In order to set up a reference database for DIAMOND, the makedb command needs to be executed with the following command line:

$ diamond makedb --in nr.faa -d nr

This will create a binary DIAMOND database file with the specified name (nr.dmnd). The alignment task may then be initiated using the blastx command like this:

$ diamond blastx -d nr -q reads.fna -a matches -t <temporary directory>"
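The two README steps can also be scripted. A minimal Python sketch that assembles the same command lines quoted above and, only if a `diamond` binary is on PATH, runs them in order. The helper names and the `/tmp` temporary directory are hypothetical, and the `-a`/`-t` options reflect the older syntax in the quoted README, so newer DIAMOND releases may expect different flags.

```python
import shutil
import subprocess


def diamond_commands(db_fasta: str, db_name: str, reads: str,
                     daa_out: str, tmp_dir: str) -> list[list[str]]:
    """Build the makedb and blastx argument lists from the README example."""
    makedb = ["diamond", "makedb", "--in", db_fasta, "-d", db_name]
    # -a names the DAA output and -t the temporary directory (older DIAMOND syntax).
    blastx = ["diamond", "blastx", "-d", db_name, "-q", reads,
              "-a", daa_out, "-t", tmp_dir]
    return [makedb, blastx]


def run_pipeline(commands: list[list[str]]) -> None:
    """Run each command in order, raising on the first non-zero exit status."""
    if shutil.which("diamond") is None:
        raise FileNotFoundError("diamond executable not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


cmds = diamond_commands("nr.faa", "nr", "reads.fna", "matches", "/tmp")
for cmd in cmds:
    print(" ".join(cmd))
# run_pipeline(cmds)  # uncomment on a machine with DIAMOND installed
```

Passing argument lists (rather than a single shell string) avoids quoting problems with paths that contain spaces.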

Good luck!


One also needs to specify SAM output when viewing the output file, e.g. in your example:

diamond view -a matches -f sam
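With SAM output, each alignment line carries a sequence in the SEQ column. A minimal Python sketch, assuming the output of `diamond view -a matches -f sam` was saved to a file (the name `matches.sam` is hypothetical), pulls out the query name (QNAME, column 1), subject name (RNAME, column 3) and sequence (SEQ, column 10) as defined by the SAM specification. Which sequence DIAMOND writes into SEQ for blastx (the read itself or its translation) is worth checking against your own output; the record below is invented for illustration.

```python
def sam_sequences(lines):
    """Yield (QNAME, RNAME, SEQ) for every alignment line of a SAM stream."""
    for line in lines:
        # Skip blank lines and header lines such as @HD, @SQ and @PG.
        if not line.strip() or line.startswith("@"):
            continue
        cols = line.rstrip("\n").split("\t")
        # SAM mandatory columns: 1 = QNAME, 3 = RNAME, 10 = SEQ (0-based 0, 2, 9).
        yield cols[0], cols[2], cols[9]


# Made-up SAM text for illustration only; not real DIAMOND output.
example_sam = [
    "@HD\tVN:1.5\n",
    "read1\t0\tprotA\t82\t255\t22M\t*\t0\t0\tMKVLAAGIVGLLLAAPQAQAEE\t*\n",
]

for qname, rname, seq in sam_sequences(example_sam):
    print(qname, rname, seq)

# With a real file (hypothetical name):
# with open("matches.sam") as fh:
#     for qname, rname, seq in sam_sequences(fh):
#         print(qname, rname, seq)
```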

8.1 years ago

If you view the output in SAM format, the sequence is included as one of the output fields. Where the SAM output is specified depends on the DIAMOND version: either as part of "diamond blastx" or in "diamond view" (after the blastx step).

Regarding the protein database, it's not clear to me which one you are referring to. I typically build my own (with "diamond makedb"), in which case I still have the original FASTA file from which to parse out sequences and names if the need arises.
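If the database was built from your own FASTA file, the subject sequences can be joined back onto the tabular hits. A minimal sketch, assuming the sseqid column equals the first whitespace-delimited word of the FASTA header (worth checking for your DIAMOND version and database), the default 12-column tabular layout, and 1-based inclusive subject coordinates. `read_fasta` and `add_subject_sequence` are hypothetical helper names, and the records below are made up.

```python
def read_fasta(lines):
    """Map the first word of each FASTA header to its (possibly multi-line) sequence."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Assumption: DIAMOND's sseqid is the header text up to the first whitespace.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def add_subject_sequence(m8_lines, seqs):
    """Append the aligned subject region (sstart..send, 1-based, inclusive) to each hit."""
    for line in m8_lines:
        cols = line.rstrip("\n").split("\t")
        sseqid, sstart, send = cols[1], int(cols[8]), int(cols[9])
        full = seqs.get(sseqid)
        # "NA" marks hits whose subject ID was not found in the FASTA file.
        region = "NA" if full is None else full[min(sstart, send) - 1:max(sstart, send)]
        yield "\t".join(cols + [region])


# Made-up database and hit, for illustration only.
fasta = [">protA example protein\n", "MKT\n", "VLAAG\n"]
m8 = ["read1\tprotA\t100.0\t3\t0\t0\t1\t9\t2\t4\t1e-03\t20.0\n"]

seqs = read_fasta(fasta)
for out in add_subject_sequence(m8, seqs):
    print(out)
```

Because `read_fasta` keeps every sequence in memory, a database the size of nr would call for an indexed lookup instead; for a custom database this is usually fine.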


Sequences were not included when I ran diamond view with BLAST tabular output. I will try the SAM format. Thank you!


Yes, you need to specify the SAM output format. Good luck!
