Question

Finding the Homologous sequences

0

6 months ago

siu ▴ 160

Hi,

I want to identify the homologs of my seed sequences in a large number of proteomes. I know that I can use BLASTP, HMMER and some deep-learning homology detection tools. I could also look at domains and GO annotations to filter out false positives. Can you please suggest some advanced bioinformatics methods/tools to validate the homologs identified by these methods [high confidence homologs]?

Please Help

Thanks in advance

HMMER Homology • 903 views

score 1 · Answer 1 · 2023-10-10

Not sure if you are just writing imprecisely, or you don't have a clear grasp of what you call "high confidence homologs." Detecting them is not a matter of advanced tools, but rather of the confidence (statistical significance) you get from any given tool. If a BLAST search shows that two proteins share 98% identity over their whole length with an E-value of 1e-289, you don't need any other tool to conclude that they are most likely the same thing. If they share only 50% identity and one of them is 250 residues long while the other is 200, they would still be homologous, but not necessarily high-confidence homologs with 100% overlapping functionality. In such cases about the only way to be somewhat sure they are the same thing is to build 3D models, say using AlphaFold2, and compare them. Similarity at the structural level would be a stronger indication of homology, but there is no guarantee of shared function without doing actual experiments in the lab.
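To make the identity/length point concrete, here is a minimal sketch of filtering BLASTP hits on E-value, identity and two-sided coverage. It assumes the search was run with `-outfmt "6 qseqid sseqid pident length evalue bitscore qlen slen"`; the function name, the sample rows and all three thresholds are illustrative assumptions, not recommended values.

```python
import csv
import io

# Column order assumed to match:
#   blastp ... -outfmt "6 qseqid sseqid pident length evalue bitscore qlen slen"
COLUMNS = ["qseqid", "sseqid", "pident", "length",
           "evalue", "bitscore", "qlen", "slen"]


def high_confidence_hits(handle, max_evalue=1e-10, min_identity=40.0,
                         min_coverage=0.8):
    """Yield hits passing E-value, identity and two-sided coverage cutoffs.

    The cutoffs are placeholders (assumptions); tune them for your data.
    """
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        aln_len = int(row["length"])
        # Alignment length includes gap columns, so this only approximates
        # coverage; cap it at 1.0.
        qcov = min(1.0, aln_len / int(row["qlen"]))
        scov = min(1.0, aln_len / int(row["slen"]))
        coverage = min(qcov, scov)
        if (float(row["evalue"]) <= max_evalue
                and float(row["pident"]) >= min_identity
                and coverage >= min_coverage):
            yield (row["qseqid"], row["sseqid"],
                   float(row["evalue"]), round(coverage, 2))


# Made-up rows: a near-identical full-length hit, and a 50%-identity hit
# between a 250-residue query and a 200-residue subject.
SAMPLE = (
    "seed1\tprotA\t98.0\t250\t1e-150\t500\t250\t250\n"
    "seed1\tprotB\t50.0\t190\t1e-30\t180\t250\t200\n"
)

hits = list(high_confidence_hits(io.StringIO(SAMPLE)))
for hit in hits:
    print(hit)
```

With these made-up rows only protA passes: protB has a significant E-value, but its alignment covers just 76% of the query, which is the kind of case where sequence statistics alone leave the functional question open.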

score 1 · Answer 2 · 2023-10-10

1

6 months ago

yl759 ▴ 120

It depends on your definition of homology: do you only consider sequence-level homology, or are you also interested in structural-level homology? Proteins with similar structures can have low sequence similarity, landing in the so-called "twilight zone" of 25-30% sequence identity. My response below assumes that you are looking for both sequence and structural homologs.

Having either high sequence identity or high structural similarity may meet your requirement for "high confidence homologs." On the sequence side, I would suggest experimenting with various thresholds for HMMER/BLASTP and combining those criteria. From the structural perspective, I would recommend trying Foldseek or TM-Vec/DeepBLAST. If you have substantial computing resources and a list of candidates to test, you can use AlphaFold2 to predict structures and then compare them with a structural aligner such as TM-align.
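As a rough sketch of the "either sequence or structure" idea, the snippet below labels each candidate by the kind of evidence that supports it. It assumes you have already collected a best sequence E-value per candidate (from BLASTP or HMMER) and a TM-score per candidate (for example from TM-align on predicted structures). The function name, the sample values and the E-value cutoff are hypothetical; the TM-score cutoff of 0.5 is the commonly cited rule of thumb for two proteins sharing a fold.

```python
def classify_candidates(seq_evalues, tm_scores, max_evalue=1e-5, min_tm=0.5):
    """Label each candidate by the evidence supporting it.

    seq_evalues: {candidate: best sequence-search E-value}
    tm_scores:   {candidate: TM-score against the seed structure}
    Both cutoffs are assumptions to be tuned, not fixed rules.
    """
    labels = {}
    for cand in sorted(set(seq_evalues) | set(tm_scores)):
        # A candidate missing from one source simply fails that criterion.
        seq_ok = seq_evalues.get(cand, float("inf")) <= max_evalue
        struct_ok = tm_scores.get(cand, 0.0) >= min_tm
        if seq_ok and struct_ok:
            labels[cand] = "sequence+structure"
        elif seq_ok:
            labels[cand] = "sequence only"
        elif struct_ok:
            labels[cand] = "structure only"
        else:
            labels[cand] = "unsupported"
    return labels


# Made-up numbers for illustration only.
seq_evalues = {"protA": 1e-80, "protB": 2e-3, "protC": 1e-12}
tm_scores = {"protA": 0.91, "protB": 0.67, "protD": 0.32}

labels = classify_candidates(seq_evalues, tm_scores)
for cand, label in labels.items():
    print(cand, label)
```

Candidates like protB here (weak sequence signal, good structural match) are the twilight-zone cases where the structural step adds the most.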

0

Thank you so much for the suggestion. Highly appreciated!

ADD REPLY • link 6 months ago by siu ▴ 160

0

Hi, one more question: if I use multiple tools to identify homologs of my sequences of interest, could I build a ranking/scoring system from their combined output to gain confidence in the homology calls? Is this a feasible approach, and if so, how would I do it (for example, based on the E-values from the various tools)?

Thanks

ADD REPLY • link 6 months ago by siu ▴ 160

0

Hey there, I'm not quite sure about the specific tools you're using, but typically, E-values from different tools aren’t directly comparable because of the different algorithms and database sizes they use. However, you can still pick out the high-confidence results from each tool based on their E-values, and merge those lists together. The tricky part might come when you try to rank or score the combined list, as you mentioned. I hope this helps clear things up a bit!
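One way to sidestep the comparability problem, sketched below, is to never compare raw E-values across tools: apply each tool's own cutoff, then order the merged list by how many tools support a candidate, breaking ties by the candidate's rank within each tool. This is just one possible scheme, not an established method; the function name, tool names, cutoffs and E-values are all made up for illustration.

```python
from collections import defaultdict


def merge_hits(per_tool_hits, cutoffs):
    """Merge per-tool high-confidence hits without comparing raw E-values.

    per_tool_hits: {tool: {candidate: evalue}}
    cutoffs:       {tool: maximum E-value accepted for that tool}
    Returns [(candidate, [supporting tools])], best-supported first.
    """
    support = defaultdict(dict)  # candidate -> {tool: rank within that tool}
    for tool, hits in per_tool_hits.items():
        passing = sorted((e, c) for c, e in hits.items() if e <= cutoffs[tool])
        for rank, (_, cand) in enumerate(passing, start=1):
            support[cand][tool] = rank

    def sort_key(item):
        cand, ranks = item
        mean_rank = sum(ranks.values()) / len(ranks)
        # More supporting tools first, then better mean rank, then name.
        return (-len(ranks), mean_rank, cand)

    return [(cand, sorted(ranks))
            for cand, ranks in sorted(support.items(), key=sort_key)]


# Made-up E-values and cutoffs.
per_tool_hits = {
    "blastp": {"protA": 1e-120, "protB": 1e-8, "protC": 0.5},
    "hmmer": {"protA": 1e-90, "protD": 1e-12, "protB": 1e-3},
}
cutoffs = {"blastp": 1e-5, "hmmer": 1e-5}

merged = merge_hits(per_tool_hits, cutoffs)
for cand, tools in merged:
    print(cand, tools)
```

Using within-tool ranks keeps each tool's statistics on its own scale, at the cost of ignoring how large the gaps between hits are.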