Question: comparing protein sequences to identify conserved regions
3.5 years ago by
ppawar0 wrote:

Hello, I am new to the field of bioinformatics. I need to run a similarity search on protein sequences, but the number of bacterial strains is large. Various tools are available for genome comparison, but I don't know of any tool that compares the entire proteomes of several bacterial strains. BLAST is used for similarity searches, but I don't know the maximum number of bacterial strains that can be used in one BLAST search. Does anyone know of a tool that will help me identify conserved regions in 5500 bacterial strains?


modified 19 months ago by Biostar ♦♦ 20 • written 3.5 years ago by ppawar0

Try this site:

As far as I understand, you have a lot of bacterial proteins and you know only their sequences. Do you know the bacterial host for each of them? You need their domains, i.e. domain prediction, right?


"DOMpro predicts domain locations using a 1D-RNN. DOMpro takes as input the sequence profile, predicted secondary structure, and predicted relative solvent accessibility. The output of the 1D-RNN is a classification for each residue as being in a domain boundary region or not. The domains are then inferred from this output. For a more detailed explanation, see the manuscript in references."

Try DOMpro. The article mentioned on that site is:

DOMpro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Networks
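
To make the quoted description concrete (a per-residue "boundary or not" label, with domains inferred afterwards), here is a toy Python sketch. It is not DOMpro's actual post-processing, which is described in the article. The function name `domains_from_boundary_flags` and the `min_len` parameter are hypothetical, and the rule used (every maximal run of non-boundary residues is a domain) is an assumption for illustration only.

```python
from typing import List, Sequence, Tuple


def domains_from_boundary_flags(
    flags: Sequence[bool], min_len: int = 1
) -> List[Tuple[int, int]]:
    """Turn per-residue boundary flags into domain segments.

    flags[i] is True when residue i is predicted to lie in a domain
    boundary region. Each maximal run of non-boundary residues that is
    at least min_len long is reported as a (start, end) pair, 1-based
    and inclusive. This is a simplification, not DOMpro's own rule.
    """
    domains: List[Tuple[int, int]] = []
    start = None  # 0-based index where the current non-boundary run began
    for i, is_boundary in enumerate(flags):
        if not is_boundary and start is None:
            start = i  # a new candidate domain begins here
        elif is_boundary and start is not None:
            if i - start >= min_len:
                domains.append((start + 1, i))  # run covers residues start..i-1
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(flags) - start >= min_len:
        domains.append((start + 1, len(flags)))
    return domains
```

For example, five non-boundary residues, two boundary residues, and four non-boundary residues would give two segments, residues 1 to 5 and 8 to 11. Raising `min_len` drops short fragments.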

modified 19 months ago • written 19 months ago by natasha.sernova3.7k
3.5 years ago by
moranr270 wrote:

BLAST will do what you want, depending on your resources. You want to run an all-vs-all protein BLAST. Depending on the number of sequences you have, you may need to run it on a computing cluster of some kind so that it can be spread across multiple cores (with so many strains, I think you definitely will). Alternatively, break the BLAST up into chunks that can be run locally and script the automation; that could take a long time. I recently ran an all-vs-all BLAST across 900 cores for 1.2 M coding gene sequences and it took a couple of days.
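
The chunking idea could be scripted roughly as follows. This is a minimal sketch, assuming all proteins sit in one FASTA file and that a protein database has already been built from it with BLAST+ (for instance `makeblastdb -in all_proteins.faa -dbtype prot -out all_proteins`). The file names, the chunk size, the e-value cutoff and the helper names (`read_fasta`, `chunk_records`, `blastp_command`, `plan_jobs`) are assumptions for illustration. The code only plans the jobs; it does not write files or run BLAST itself.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_lines)
            header, seq_lines = line[1:], []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size entries."""
    chunk: List[Record] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def blastp_command(query: str, db: str, out: str, threads: int = 4) -> List[str]:
    """Build a blastp call with tabular output (-outfmt 6)."""
    return [
        "blastp", "-query", query, "-db", db,
        "-outfmt", "6", "-evalue", "1e-5",  # cutoff chosen for illustration
        "-num_threads", str(threads), "-out", out,
    ]


def plan_jobs(fasta_text: str, db: str, chunk_size: int) -> List[Tuple[str, str, List[str]]]:
    """Return (chunk file name, chunk FASTA text, blastp command) per chunk."""
    jobs = []
    for i, chunk in enumerate(chunk_records(read_fasta(fasta_text), chunk_size)):
        name = f"chunk_{i:04d}"
        body = "".join(f">{h}\n{s}\n" for h, s in chunk)
        jobs.append((f"{name}.faa", body, blastp_command(f"{name}.faa", db, f"{name}.tsv")))
    return jobs
```

Each planned job can then be written to disk and run locally with `subprocess.run(cmd, check=True)`, or submitted to a cluster scheduler as a separate task. Because every chunk is searched against the full database, concatenating the per-chunk tables gives the all-vs-all result.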

written 3.5 years ago by moranr270