Question

comparing protein sequences to identify conserved regions


7.1 years ago

ppawar • 0

Hello, I am new to the field of bioinformatics. I have to do a similarity search on protein sequences, but the number of bacterial strains is large. There are various tools available for genome comparison, but I don't know of any tool that compares the entire proteomes of several bacterial strains. BLAST is used for similarity searches, but I don't know the maximum number of bacterial strains that can be used in one BLAST search. Does anyone know of a tool that will help me identify conserved regions in 5500 bacterial strains?

Thanks

protein sequence conserved regions • 1.9k views

updated 5.2 years ago by Biostar 20 • written 7.1 years ago by ppawar • 0


Try this site:

http://scratch.proteomics.ics.uci.edu/

As far as I understand, you have a lot of bacterial proteins and you know only their sequences. Do you know the bacterial host for each of them? You need their domains / domain prediction, right?

"DOMpro

DOMpro predicts domain locations using a 1D-RNN. DOMpro takes as input the sequence profile, predicted secondary structure, and predicted relative solvent accessibility. The output of the 1D-RNN is a classification for each residue as being in a domain boundary region or not. The domains are then inferred from this output. For a more detailed explanation, see the manuscript in references."

http://scratch.proteomics.ics.uci.edu/explanation.html#DOMpro

Try DOMpro. The site also points to this article:

http://download.igb.uci.edu/domain.pdf

DOMpro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Networks

5.2 years ago by natasha.sernova ★ 4.0k

score 0 · Answer 1 · 2017-03-29

BLAST will do what you want, depending on your resources. You want to run an all-vs-all protein BLAST. Depending on the number of sequences you have, you may need to run it on a computing cluster of some kind to speed it up and spread it across multiple cores (which I think you definitely will with so many strains). Alternatively, break the BLAST up into chunks that can be run locally and script the automation of running them, though that may take a long time. I recently ran an all-vs-all BLAST across 900 cores for 1.2 M coding gene sequences and it took a couple of days.
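To make the "break it into chunks and script it" idea concrete, here is a rough Python sketch. It is only an illustration under some assumptions: all proteomes have already been concatenated into one protein FASTA file, NCBI BLAST+ (`makeblastdb`, `blastp`) is installed and on the PATH, and the file names, chunk size, thread count, and E-value cutoff are placeholders I made up, not recommendations. The helper names (`read_fasta`, `split_fasta`, `blastp_command`, `run_all_vs_all`) are hypothetical.

```python
import subprocess
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path, out_dir, seqs_per_chunk):
    """Write the input FASTA as chunk files of at most seqs_per_chunk records."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks, handle = [], None
    for i, (header, seq) in enumerate(read_fasta(path)):
        if i % seqs_per_chunk == 0:
            if handle is not None:
                handle.close()
            chunk_path = out_dir / f"chunk_{len(chunks):05d}.faa"
            chunks.append(chunk_path)
            handle = open(chunk_path, "w")
        handle.write(f">{header}\n{seq}\n")
    if handle is not None:
        handle.close()
    return chunks


def makeblastdb_command(fasta, db):
    """Command that builds a protein BLAST database from the full FASTA."""
    return ["makeblastdb", "-in", str(fasta), "-dbtype", "prot", "-out", str(db)]


def blastp_command(chunk, db, out_path, threads=4, evalue="1e-5"):
    """Command that searches one chunk against the full database (tabular output)."""
    return [
        "blastp", "-query", str(chunk), "-db", str(db),
        "-outfmt", "6", "-evalue", evalue,
        "-num_threads", str(threads), "-out", str(out_path),
    ]


def run_all_vs_all(fasta, work_dir, seqs_per_chunk=1000):
    """Build the database, then run blastp chunk by chunk (sequentially here)."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    db = work_dir / "all_proteins"  # placeholder database name
    subprocess.run(makeblastdb_command(fasta, db), check=True)
    for chunk in split_fasta(fasta, work_dir / "chunks", seqs_per_chunk):
        out_path = chunk.with_suffix(".tsv")
        subprocess.run(blastp_command(chunk, db, out_path), check=True)
```

Calling something like `run_all_vs_all("all_proteins.faa", "blast_work")` would run the chunks one after another on a single machine. On a cluster, the same `blastp_command` lines could instead be submitted as separate jobs, one per chunk, since each chunk is searched independently against the same database.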