Finding Protein Homology?
6
10
11.1 years ago
ilovepython ▴ 150

I'm looking for something that's more rigorous than BLAST and less rigorous than a global alignment-esque algorithm. Is there a paper out there that compares methods? What's the most popular method?

I'm doing a BLAST search against one protein, and then trying to find closer hits with a better search method. Is there a better way of going about this?

Edit: This question is probably too broad so here are some additional details. I'm interested in transmembrane proteins, so some type of analysis on transmembrane segments of the proteins will be important.

homology protein blast homology • 4.5k views
0

By "more rigorous than BLAST and less rigorous than a global alignment-esque algorithm" do you mean less rigorous than a dynamic programming algorithm (Smith-Waterman or Needleman-Wunsch)? Neither of these is rigorous enough to detect remote homology because they are pairwise alignments. When you do a search, these algorithms have absolutely no way of knowing which sequence positions are more important than others. Please see my MSA-based response below for an alternative to pairwise alignments.

0

I see that by "rigorous" you mean fewer false positives. Even a "global alignment-esque algorithm" would not be rigorous enough (or sensitive enough!) to detect remote homology well, because those are all pairwise alignments. When you do a search, these algorithms have absolutely no way of knowing which sequence positions are more important than others. Please see my MSA-based response below for an alternative to pairwise alignments.
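To illustrate the point about position importance: an MSA lets you measure how conserved each column is, which a single pairwise alignment cannot. Below is a minimal sketch that scores columns by Shannon entropy (low entropy means a conserved position). The toy alignment and the function names are made up for illustration, and this is not how HMMER weights positions internally.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(msa):
    """Per-column entropy for a list of equal-length aligned sequences."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([seq[i] for seq in msa]) for i in range(length)]


# Hypothetical toy alignment: columns 0 and 3 are invariant.
toy_msa = ["GLAVK", "GIAVR", "GLSVK", "GVAVE"]
profile = conservation_profile(toy_msa)
for position, entropy in enumerate(profile):
    print(position, round(entropy, 3))
```

Columns with entropy near zero are the ones a profile method will effectively weight most heavily when scoring a candidate sequence.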

0

Btw, you cannot 'find homology'; you can only hypothesize that sequences are homologous based on the sequence similarity you detected with a sequence similarity search.

9
11.1 years ago
Spitshine ▴ 640

For practical purposes, PSI-BLAST or HMMer searches are the tools of choice for finding (remote) homologs. If you know the domain, HMMer will do the trick. Most likely, the transmembrane elements are included in the HMM from SMART or PFAM.

There are many comparisons but you will need to define your task more precisely. Do you want to find homologs for an orphan protein or detect all members of a protein in a genome? Does more rigorous mean fewer false-positives? And you're not running BLAST against a database of one protein, are you?

1

The recent implementation of HMMer (3.0) is fast and reliable but if you have transmembrane regions, it pays to do a reverse BLAST/PSI-BLAST with candidate hits to confirm and weed out hits to regions with composition biases.
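A rough sketch of that reverse-search check, assuming you have already run BLAST with each candidate hit as the query and saved the result in the standard 12-column tabular format (`-outfmt 6`). A candidate is kept only if its top-scoring reverse hit is one of your original seed proteins. All sequence IDs and function names below are hypothetical.

```python
def best_hits(blast_tab_text):
    """Map each query ID to its top-scoring subject ID (by bit score).

    Expects BLAST tabular output with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    best = {}
    for line in blast_tab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def confirmed_candidates(reverse_tab_text, seed_ids):
    """Candidates whose best reverse hit is one of the seed proteins."""
    hits = best_hits(reverse_tab_text)
    return sorted(q for q, s in hits.items() if s in seed_ids)


# Hypothetical reverse-search output: cand1 points back to a seed,
# cand2's best hit is an unrelated (e.g. compositionally biased) protein.
rows = [
    ["cand1", "seedA", "35.0", "200", "130", "4", "1", "200", "5", "204", "1e-20", "95.0"],
    ["cand1", "other1", "30.0", "180", "126", "3", "1", "180", "9", "188", "1e-10", "60.0"],
    ["cand2", "other2", "28.0", "150", "108", "2", "1", "150", "3", "152", "1e-15", "80.0"],
    ["cand2", "seedB", "25.0", "120", "90", "2", "1", "120", "7", "126", "1e-05", "40.0"],
]
reverse_output = "\n".join("\t".join(row) for row in rows)
print(confirmed_candidates(reverse_output, {"seedA", "seedB"}))
```

Requiring the single best hit to be a seed is a strict criterion; depending on the family you may prefer to accept any seed within the top few hits.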

0

I'm looking for homologs for a group of proteins. I just re-read the HMMer doc, and it seems like it's exactly what I'm looking for! I'm not sure how I missed this profiling feature before. Rigorous does mean fewer false positives. I think my language was unclear above: I'm running BLAST against NCBI's non-redundant set. Thanks so much!

6
11.1 years ago

Handpick several proteins (functionally, structurally, or evolutionarily related) and build a multiple sequence alignment (MSA).

For aligning multiple transmembrane proteins you may want to consider this paper/tool: PRALINE^TM

Once you have a good MSA, use HMMER hmmbuild to convert the MSA to a profile HMM.

Then search a large database (e.g. UniProt, which comprises Swiss-Prot and TrEMBL) using the profile HMM with HMMER hmmsearch.

This method should be good at finding remote homologs.
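The steps above can be sketched in Python roughly as follows. This only assembles the `hmmbuild` and `hmmsearch` command lines and parses the per-target table that `hmmsearch --tblout` writes; it assumes HMMER 3 is installed if you actually run the commands. The file names, the E-value cutoff, and the sample table line are placeholders I made up.

```python
import shlex


def hmmer_commands(msa_path, hmm_path, seqdb_path, tblout_path, evalue=1e-5):
    """Build the hmmbuild and hmmsearch command lines as argument lists."""
    build = ["hmmbuild", hmm_path, msa_path]
    search = [
        "hmmsearch",
        "--tblout", tblout_path,  # per-target hits table
        "-E", str(evalue),        # report hits with E-value <= this cutoff
        hmm_path,
        seqdb_path,
    ]
    return build, search


def parse_tblout(text):
    """Extract target name, full-sequence E-value and score from --tblout text."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        hits.append({
            "target": fields[0],
            "evalue": float(fields[4]),
            "score": float(fields[5]),
        })
    return hits


# Placeholder paths; pass each list to subprocess.run(cmd, check=True) to execute.
build_cmd, search_cmd = hmmer_commands(
    "family.sto", "family.hmm", "uniprot.fasta", "hits.tbl"
)
print(shlex.join(build_cmd))
print(shlex.join(search_cmd))

# Made-up example of one --tblout line, for parsing only.
sample_tblout = (
    "# target name  accession  query name  accession  E-value  score  bias ...\n"
    "hypoProt1 - family - 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 1.0 1 0 0 1 1 1 1 example\n"
)
print(parse_tblout(sample_tblout))
```

Keeping the commands as lists avoids shell-quoting problems if your paths contain spaces.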

3
11.1 years ago
Pals ★ 1.3k

I am fairly new here. Since this question relates to what I do, I will try to share what I know. I agree with Bilouweb. If you have a template structure but it has very low sequence identity to your protein, you could run a secondary structure prediction on the template using, for example, Jpred. It returns a set of aligned sequences that have similar secondary structure. Separately, you can run a simple protein BLAST against the nr database for your model sequence and then align those sequences. Finally, you can combine the two MSAs to generate a pairwise alignment.

The other strategy, if you do not have a template structure, would of course be structure prediction tools. I once used I-TASSER and it worked quite nicely. You will get much more information about your protein than just the structure.

The last option would be prediction tools specific to membrane proteins, for example the SPLIT server.

2
11.1 years ago
Bilouweb ★ 1.1k

Some tools try to detect homology from the structure (secondary or tertiary) of proteins, but there are not many transmembrane protein structures available.

I think the web server Phyre is a good tool to begin with. From an amino acid sequence, it gives:
- secondary structure predictions from 3 predictors
- disorder region predictions
- homologous proteins found

Protein fold recognition programs can help you because some are based on homology searches. You will find a good list of them on the CASP experiment web page.

0

99% of the sequences I'm working with don't have structure data. I didn't consider using a structure based prediction before, I will definitely try this out. Thanks!

2
11.1 years ago

If you are interested in detecting transmembrane proteins, the tool of choice might be TMHMM. Here's a web server for TMHMM. The SignalP program might also help.

Here is a review about the application of these tools to cellular location you might find useful.
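If you just want a quick first look before submitting sequences to TMHMM, a classic and much cruder alternative is a Kyte-Doolittle hydropathy scan: average the hydropathy index over a sliding window of 19 residues and flag windows scoring above about 1.6. To be clear, this is not what TMHMM does (TMHMM is a hidden Markov model); the sketch below is only a simple stand-in, and the function names and test sequence are made up.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    seq = seq.upper()
    scores = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        # Non-standard residues (e.g. X) are scored as 0.0: an assumption.
        scores.append(sum(KD.get(aa, 0.0) for aa in segment) / window)
    return scores


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start positions (0-based) of windows at or above the threshold."""
    scores = hydropathy_windows(seq, window)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Made-up sequence: a 19-residue leucine stretch flanked by aspartates.
toy_seq = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_tm_windows(toy_seq))
```

Overlapping windows that pass the threshold should be merged into one candidate segment, and anything flagged this way still deserves a proper run through TMHMM.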

1
11.1 years ago

I like (and have voted up) the responses above, particularly the motif and MSA answers. If you want real quality in your MSA, make certain to keep your domains, and even your transmembrane segments, intact. In other words, an MSA that retains or is aligned to secondary and tertiary structural elements will be of higher quality and allow you to move forward with greater confidence.

0

Which one was the motif answer?

0

I meant HMMer and similar tools that detect functional domains/motifs. Sorry, I mostly deal with DNA sequence (motifs) and think less about protein profiles...
