Question

Protein search against a database and then cluster

13 months ago

bobo • 0

Hi all,

I've got a set of eight amino acid sequences in FASTA format. I want to run a similarity search against a database such as NCBI nr and extract all amino acid sequences that match my queries, to obtain a list of all the matching proteins present in the database. I would then like to cluster them with a clustering tool to generate something like a minimum spanning tree showing the relationships among the extracted sequences.

Any help with which tools to use would be appreciated.

Also, would it work to skip the similarity search step and simply download all proteins with the same name from NCBI nr, then cluster them?

Many thanks

clustering amino-acid search similarity • 482 views

updated 13 months ago by Mensur Dlakic ★ 27k • written 13 months ago by bobo • 0

score 0 · Answer 1 · 2023-04-07

It depends on whether your tools and databases are installed locally, or you rely on web servers.

Running an HHpred search will automatically collect the homologs, and the resulting alignment can be downloaded by clicking on Query MSA -> Download full A3M. If you run hhblits for several iterations locally, the result will be similar. MMseqs2 from the same authors will create clusters. You will have to work on your own a bit after that to get a tree.
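For the "work on your own a bit" part, here is a rough sketch of one way to get from an alignment to a minimum spanning tree. It is only an illustration, not what HHpred or MMseqs2 do internally. It assumes the A3M has already been converted to an ordinary aligned FASTA (for example with the reformat.pl script that ships with HH-suite), uses a plain p-distance (fraction of mismatches over ungapped columns) as the edge weight, and builds the tree with Prim's algorithm. The function names and the toy sequences are made up for the example.

```python
from itertools import combinations


def read_aligned_fasta(text):
    """Parse aligned FASTA text into an ordered {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID before the first space
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def p_distance(a, b):
    """Fraction of mismatches over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable columns: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def minimum_spanning_tree(records):
    """Prim's algorithm on the complete graph of sequences.

    Returns a list of (name_in_tree, new_name, distance) edges.
    """
    names = list(records)
    if not names:
        return []
    dist = {
        frozenset((u, v)): p_distance(records[u], records[v])
        for u, v in combinations(names, 2)
    }
    in_tree = [names[0]]
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge that connects the tree to a new sequence.
        u, v = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda e: dist[frozenset(e)],
        )
        edges.append((u, v, dist[frozenset((u, v))]))
        in_tree.append(v)
    return edges


# Toy alignment (hypothetical sequences, for illustration only).
example_alignment = """>seqA
MKTAYIAK
>seqB
MKTAYIAR
>seqC
MKT-YLAR
>seqD
MRSAWLGR
"""

for u, v, d in minimum_spanning_tree(read_aligned_fasta(example_alignment)):
    print(f"{u} -- {v}\t{d:.3f}")
```

With thousands of homologs the all-against-all distance step gets slow, so it probably makes sense to cluster first and build the tree from cluster representatives only. A substitution-matrix-based distance would also be more informative than raw p-distance.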

Searching by similar names would not work if your goal is to comprehensively identify homologs. Similar proteins are not always named the same way, and some may not have any annotations.