Faking hh-suite workflow / alignment output
7 weeks ago
Nick ▴ 40

Given the following:

• query.fasta -> single entry
• reference.fasta -> multiple entries

I now want to 'fake' (or just be able to get) an output that looks like a proper *.hhr alignment file, i.e. as if I aligned the query.fasta against the profiles of the sequences in the reference.fasta. I think this can be achieved with

hhsearch -i query.fasta -d reference.ff{data,index}

However, I really don't want the extra steps of aligning the reference sequences against a database and building all those HMMs. I am only interested in aligning the query against the reference sequences and then getting the result in the same format as the .hhr file (ordering, alignments, statistics).

I cannot figure out how to do that. Everything in hhsuite feels very long-winded.

For a single sequence in the reference.fasta, hhalign does the job. To clarify, the sequences in reference.fasta might have some similarities, but they might also be very different, so doing an MSA on them first doesn't make much sense. I am probably missing something super obvious, but I cannot figure out how to get this .hhr file for the independent sequences in the reference.fasta.
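One way to extend the single-sequence hhalign case to every entry in reference.fasta is to split the file and prepare one hhalign call per entry. The sketch below only plans the runs. The helper names `split_fasta` and `plan_hhalign_runs`, the numbered file names, and the working directory layout are all assumptions made for illustration; the only hhalign options used are `-i` (query), `-t` (template) and `-o` (output).

```python
import re


def split_fasta(fasta_text):
    """Split FASTA text into (safe_name, record_text) pairs, one per entry."""
    records = []
    header, seq_lines = None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, seq_lines))
            header, seq_lines = line, []
        elif header is not None and line.strip():
            seq_lines.append(line.strip())
    if header is not None:
        records.append((header, seq_lines))

    named = []
    for index, (header, seq_lines) in enumerate(records, start=1):
        fields = header[1:].split()
        raw_id = fields[0] if fields else "seq"
        # Hypothetical naming scheme: running number plus a file-safe ID.
        safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", raw_id)
        name = f"{index:04d}_{safe_id}"
        named.append((name, "\n".join([header] + seq_lines) + "\n"))
    return named


def plan_hhalign_runs(query_path, records, work_dir):
    """Return (template_path, record_text, command) for each reference entry."""
    runs = []
    for name, record_text in records:
        template = f"{work_dir}/{name}.fasta"
        output = f"{work_dir}/{name}.hhr"
        command = ["hhalign", "-i", query_path, "-t", template, "-o", output]
        runs.append((template, record_text, command))
    return runs
```

To actually run it, write each `record_text` to its `template_path` and pass each `command` to `subprocess.run(command, check=True)`. This produces one .hhr file per reference sequence rather than a single combined file.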

Thanks!

alignment hhsuite • 394 views
7 weeks ago
Mensur Dlakic ★ 23k

> Everything in hhsuite feels very long-winded.

Complex programs with many options are either properly explained for the non-TLDR crowd, which is long-winded, or users are left to figure things out on their own. When one wants to do something that is outside of the normal program scope, reading the manual and testing many options may be the only way to get it done.

> I cannot figure out how to do that.

You should consider that you are not missing anything, and that what you want to do can't be done with HHsuite. It is not meant for that purpose. If hhalign output is not what you want, chances are that it can't be done without doing the actual search. The purpose of HHsuite is not to align sequences that "might have some similarities but they also might be very different", but rather to identify and align sequences that are demonstrably related to the query.

0
Entering edit mode

I totally get your point; maybe long-winded was not the right description for my issue. One example: custom databases are very clunky to create (the *wo_ss naming, the fact that the ss prediction is not recommended anyway according to the guide, plus the resorting and renaming). That's almost impossible to work out from the CLI help. Let's also not start with the output format of most applications.

Anyway, I think you might have misunderstood my information about the sequences in the reference.fasta. They were chosen because they do have some similarity with the query, but not necessarily to a high degree amongst each other. hhalign works for a single entry in the reference but does not give me, out of the box, a format that I can use to determine which of the sequences has the highest similarity.

So my initial idea was essentially to build the HMM profiles from single sequences and then run hhblits against that fake database (profile depth for each sequence in the reference.fasta = 1). The second option I see is to use hhalign and clumsily assemble the individual result files from the 1-vs-1 alignments into an hhr file, but that would lead to wrong E-values. Thirdly, maybe phmmer / jackhmmer are better suited for that task, but I'll need to check whether they provide hhr output / information.

What is important to me is to get the top x matches in the reference.fasta and, in addition, the information from hhsearch about the alignments exactly as it is provided in the hhr format.
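For the second option, the top matches could be picked by reading the hit table of each 1-vs-1 .hhr file and sorting the runs. This is a minimal sketch, not a full hhr parser. It assumes the hit table starts at a header line beginning with "No Hit", ends at the next blank line, and closes each row with the Prob, E-value, P-value, Score, SS, Cols, query range and template range columns. The function names are hypothetical. Ranking by raw Score is my assumption, chosen because E-values from separate 1-vs-1 runs are not comparable to those from one database search.

```python
def parse_hit_table(hhr_text):
    """Return one dict per row of the hit table in an .hhr file's text."""
    hits = []
    in_table = False
    for line in hhr_text.splitlines():
        if line.lstrip().startswith("No Hit"):
            in_table = True  # header row of the summary hit table
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # assumed: a blank line ends the hit table
        tokens = line.split()
        if tokens[-1].startswith("("):
            tokens.pop()  # template length printed as a separate "(123)" token
        if len(tokens) < 10:
            continue  # not a complete row under the assumed layout
        # Read from the right, because the description may contain spaces.
        prob, evalue, pvalue, score, ss, cols = tokens[-8:-2]
        hits.append({
            "hit": tokens[1],
            "prob": float(prob),
            "evalue": float(evalue),
            "pvalue": float(pvalue),
            "score": float(score),
            "ss": float(ss),
            "cols": int(cols),
        })
    return hits


def rank_runs(hhr_by_name):
    """Rank 1-vs-1 runs by the raw score of their first hit, best first."""
    ranked = []
    for name, text in hhr_by_name.items():
        hits = parse_hit_table(text)
        if hits:
            ranked.append((name, hits[0]))
    ranked.sort(key=lambda item: item[1]["score"], reverse=True)
    return ranked
```

`rank_runs` takes a dict that maps a label (for example the reference sequence ID) to the text of its .hhr file, so `rank_runs(results)[:x]` gives the top x references. The result is a ranking, not a merged .hhr file with recomputed statistics.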

0
Entering edit mode

Pretty sure that nothing other than HHsuite outputs results in .hhr format.