Best way to BLAST orthogroups?
3 months ago
MJ Mantas • 0

Hi!

I have thousands of orthogroup txt files. Each file contains a number of protein sequences from multiple bacterial species. I want to find the best protein match to each orthogroup.

What is the best way to do this?

I thought about generating a profile hidden Markov model (pHMM) for each orthogroup and BLASTing the pHMM. Would this be a reasonable approach? Is there a better way to do this?

Another problem I can think of is that, if I download the prokaryotic genome database from NCBI and BLAST my sequence(s) against it, the best match would likely correspond to the bacterial species/protein sequence being blasted.

How can I find the closest match that isn't that exact sequence?

Thanks in advance for any help!

command-line blast orthogroups pHMM • 285 views
3 months ago
Mensur Dlakic ★ 18k

I am assuming you don't literally mean BLAST the pHMM because BLAST doesn't work with pHMMs. But yes, searching a larger database with pHMMs is likely to give you the best match. That may or may not be one of the proteins you used to build the pHMM - it is impossible to know ahead of time.
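In case it helps, here is a rough sketch of the pHMM route for a single orthogroup. It assumes each orthogroup file is unaligned protein FASTA, that MAFFT is used to align it (my choice for illustration, any aligner would do), and that `mafft`, `hmmbuild` and `hmmsearch` are on your PATH. The function names and the output file layout are hypothetical.

```python
import subprocess
from pathlib import Path


def phmm_search_commands(orthogroup_fasta, database_fasta, workdir="."):
    """Build (argv, stdout_file) pairs: align, build the pHMM, search the database."""
    work = Path(workdir)
    stem = Path(orthogroup_fasta).stem
    msa = str(work / f"{stem}.afa")  # aligned FASTA captured from MAFFT (assumed layout)
    hmm = str(work / f"{stem}.hmm")  # profile HMM written by hmmbuild
    tbl = str(work / f"{stem}.tbl")  # per-target hit table written by hmmsearch
    return [
        # MAFFT prints the alignment to stdout, so it is redirected to a file
        (["mafft", "--auto", str(orthogroup_fasta)], msa),
        (["hmmbuild", hmm, msa], None),
        (["hmmsearch", "--tblout", tbl, hmm, str(database_fasta)], None),
    ]


def run_commands(commands):
    """Run each command in order, redirecting stdout to a file where requested."""
    for argv, stdout_file in commands:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "w") as handle:
                subprocess.run(argv, stdout=handle, check=True)
```

Looping `run_commands(phmm_search_commands(path, database))` over all orthogroup files would give one hit table per orthogroup. Building the commands separately from running them makes it easy to print and check them first.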

Alternatively, pHMMs can emit consensus sequences, which will be the best match that a model can score. The consensus most likely won't be identical to any real protein, although it might be for groups that contain closely related sequences without many indels. By the way, the consensus sequence can be BLASTed against the database, and that may be another way to find a match that interests you.
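To get the closest match that is not one of the input sequences, one option is to filter the BLAST results afterwards. The sketch below assumes standard tabular output (`-outfmt 6`, where column 2 is the subject ID, column 3 the percent identity and column 12 the bit score) and assumes the database uses the same sequence IDs as your orthogroup files. The function name and the `max_identity` cutoff are hypothetical.

```python
def best_non_member_hit(blast_lines, member_ids, max_identity=None):
    """Return (subject_id, percent_identity, bitscore) of the top-scoring hit
    whose subject is not in member_ids, or None if no such hit exists."""
    best = None
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        percent_identity = float(fields[2])
        bitscore = float(fields[11])
        if subject_id in member_ids:
            continue  # this hit is one of the orthogroup's own sequences
        if max_identity is not None and percent_identity > max_identity:
            continue  # optionally drop (near-)identical copies under other IDs
        if best is None or bitscore > best[2]:
            best = (subject_id, percent_identity, bitscore)
    return best
```

Here `member_ids` would be the set of sequence IDs read from the orthogroup file. Setting `max_identity` to something like 99.9 is a crude way to also skip identical proteins deposited under a different accession.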


Thank you for the feedback! Can you please guide me on how to obtain a pHMM consensus sequence?


hmmemit from the HMMER package will output a consensus sequence from a given pHMM when run with the -c option. Without it, hmmemit samples sequences from the model instead.
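If you want to script this over thousands of models, something along these lines should work. It assumes HMMER 3 is installed with `hmmemit` on your PATH, and uses `-c` for the consensus and `-o` to write to a file. The function names and file names are hypothetical.

```python
import subprocess


def hmmemit_consensus_command(hmm_file, out_fasta):
    """Build the hmmemit call that writes a model's consensus sequence to out_fasta."""
    # -c: emit the consensus sequence rather than sampling from the model
    # -o: send the sequence output to a file instead of stdout
    return ["hmmemit", "-c", "-o", str(out_fasta), str(hmm_file)]


def emit_consensus(hmm_file, out_fasta):
    """Run hmmemit; raises CalledProcessError if the command fails."""
    subprocess.run(hmmemit_consensus_command(hmm_file, out_fasta), check=True)
```

For example, `emit_consensus("OG0001.hmm", "OG0001.consensus.fa")` would produce a FASTA file that can then be used as a BLAST query.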