How To Compare Two Proteome Sets At 30% Similarity?
5
2
12.1 years ago
Sydney ▴ 20

I wish to compare the proteomes of two bacterial species and identify homologs at 30% similarity. I tried to use CD-HIT-2D, but the lowest similarity is 0.4, whereas what I want is 0.3. Does anyone know how to do this? Thanks.

proteomics • 4.7k views
0

Unclear question. Are you saying the lowest similarity found by CD-HIT-2D was 0.4, or that the lowest available similarity threshold for CD-HIT-2D is 0.4?

0

Thanks everyone for the comments. What I actually want is to identify the proteins (probably virulence factors) that exist only in the pathogens and not in the non-pathogens. I wonder whether I can compare the proteomes of the pathogens and purge the orthologs at 25% or 30%, so that at the end I get the proteins that exist in all the pathogens I analyzed. Can anyone give me a better suggestion on how to do this? Thanks in advance.

0

Have you considered MG-RAST or JGI's IMG/M? You can upload your datasets to these servers and get annotations from a variety of databases such as COG and KEGG. You can then filter for the functions of interest. You could also do an all-vs-all BLAST and then annotate only the shared proteins.
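Once similarity searches have been run, the goal described in this thread (proteins found in every pathogen but in no non-pathogen) reduces to set operations. The sketch below is only an illustration: the function name `pathogen_specific`, the genome names, and the protein IDs are all hypothetical, and it assumes you have already reduced each search to the set of reference-proteome IDs that have an acceptable hit in the other genome.

```python
def pathogen_specific(reference_ids, hits_in_pathogens, hits_in_nonpathogens):
    """Return reference proteins with a hit in every other pathogen
    and no hit in any non-pathogen.

    reference_ids        -- protein IDs of one pathogen chosen as reference
    hits_in_pathogens    -- dict: genome name -> set of reference IDs that
                            have an acceptable hit in that pathogen
    hits_in_nonpathogens -- same structure, for the non-pathogens
    """
    candidates = set(reference_ids)
    # Keep only proteins conserved in all other pathogens.
    for hit_ids in hits_in_pathogens.values():
        candidates &= hit_ids
    # Discard anything that also has a homolog in a non-pathogen.
    for hit_ids in hits_in_nonpathogens.values():
        candidates -= hit_ids
    return sorted(candidates)


# Toy data (hypothetical IDs), just to show the call.
reference = ["p1", "p2", "p3", "p4"]
pathogens = {"pathogen_B": {"p1", "p2", "p3"}, "pathogen_C": {"p2", "p3", "p4"}}
nonpathogens = {"commensal_X": {"p3"}}
print(pathogen_specific(reference, pathogens, nonpathogens))
```

What counts as an "acceptable hit" (identity, coverage, E-value) is the real decision here, and it has to be made before the sets are built.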

4
12.1 years ago

To start with, I would completely forget about using CD-HIT for this purpose. The point of CD-HIT is that it is a very fast algorithm for finding highly similar sequences. 30% similarity is not high similarity. Moreover, bacterial genomes are not so big that speed is your primary concern here.

The exact way to do it depends a bit on whether you are looking for global or local similarity scores. Assuming that you are interested in local alignments, I would use BLAST and filter the results by whatever combination of identity, similarity, alignment length, bit score, or E-value cutoff you desire.
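As a rough sketch of the filtering step, the snippet below reads BLAST tabular output (`-outfmt 6` with its default twelve columns) and keeps hits that pass identity, alignment length, and E-value cutoffs. The function name `filter_hits`, the cutoff values, and the example rows are made up for illustration. Note that the default `pident` column is percent identity, not similarity; if you want similarity, the tabular format would need to be customized to include a positives column.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_pident=30.0, min_length=50, max_evalue=1e-3):
    """Yield (query, subject, pident, length, evalue) for hits passing all cutoffs.

    The default cutoffs are illustrative assumptions, not recommendations.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        pident = float(rec["pident"])
        length = int(rec["length"])
        evalue = float(rec["evalue"])
        if pident >= min_pident and length >= min_length and evalue <= max_evalue:
            yield rec["qseqid"], rec["sseqid"], pident, length, evalue


# Made-up rows standing in for a real BLAST result file.
example = (
    "protA\tprotX\t35.2\t210\t130\t3\t1\t208\t5\t212\t2e-30\t118\n"
    "protB\tprotY\t28.1\t190\t132\t4\t10\t195\t8\t197\t1e-10\t60.5\n"
    "protC\tprotZ\t42.0\t30\t17\t0\t1\t30\t40\t69\t0.5\t25.0\n"
)
for hit in filter_hits(io.StringIO(example)):
    print(hit)
```

With a real result file, pass `open("results.tsv")` (a hypothetical file name) instead of the `io.StringIO` object.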

4
12.1 years ago
Ahdf-Lell-Kocks ★ 1.6k

I would try jackhmmer from the HMMER 3.0 package. Something like:

~/hmmer3.0/jackhmmer proteome1.fasta proteome2.fasta
1
12.1 years ago

The CD-HIT package contains a Perl script called PSI-CD-HIT for low identity cutoffs. This script uses BLAST as part of the clustering process to calculate similarities. Mind that it is not as fast as regular CD-HIT. http://weizhong-lab.ucsd.edu/cd-hit/wiki/doku.php?id=cd-hit_user_guide#psi-cd-hit_clustering

However, I'm not sure this is the right tool for you. An all-vs-all BLAST of the two species can easily be done on most desktops and would probably be better.

0
12.1 years ago
Chris ▴ 190

I would BLAST the first proteome against the second at some low E-value cutoff (< 1e-3). Then you could either take the similarity values from BLAST (local alignment), or run a global alignment on the significant hits to get global alignment similarities.
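To illustrate what a global alignment score on a hit pair looks like, here is a minimal Needleman-Wunsch sketch that returns percent identity over the full alignment. It is a toy under stated assumptions: the function name `global_identity` is hypothetical, and the scoring (match +1, mismatch -1, linear gap -1) is a simplification. For real protein work you would more likely use an established aligner with a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity of a global (Needleman-Wunsch) alignment of a and b.

    Identity = identical columns / total alignment columns (gaps included).
    Scoring parameters are simplistic placeholders, not a protein matrix.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical and total columns.
    i, j = n, m
    identical = columns = 0
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


print(global_identity("MKTAYIAK", "MKTAYIAK"))
```

The dynamic programming table is quadratic in sequence length, which is fine for individual hit pairs but is one reason to pre-filter with BLAST first, as suggested above.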
