Question: comparing protein sequences to identify conserved regions
Hello, I am new to the field of bioinformatics. I need to run a similarity search on protein sequences, but the number of bacterial strains is large. Various tools are available for genome comparison, but I don't know of any tool that compares the entire proteomes of several bacterial strains. BLAST is used for similarity searches, but I don't know the maximum number of bacterial strains that can be used in one BLAST search. Does anyone know of a tool that will help me identify conserved regions across 5500 bacterial strains?


written 14 months ago by poojapawar03090
BLAST will do what you want, depending on your resources. You want to run an all-vs-all protein BLAST. Depending on the number of sequences you have, you may need to run it on a computing cluster of some kind to speed it up and spread it across multiple cores (which I think you definitely will with so many strains). Alternatively, break the BLAST up into chunks that can be run locally and script the automation of running them, though that might take a long time. I recently ran an all-vs-all BLAST across 900 cores for 1.2 million coding gene sequences, and it took a couple of days.
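To illustrate the chunking idea, here is a rough Python sketch, not a tested pipeline. It assumes all proteomes have been concatenated into one protein FASTA file and that the BLAST+ command-line tools (`makeblastdb`, `blastp`) are installed. The file names, chunk size, thread count and e-value cutoff are placeholders I made up; adjust them to your data and hardware.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records, chunk_size):
    """Group records into lists of at most chunk_size entries."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def makeblastdb_command(fasta_path, db_name):
    """Build the command that indexes the full protein set as a BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


def blastp_command(query_path, db_name, out_path, threads=4, evalue="1e-5"):
    """Build a blastp command for one query chunk with tabular output (-outfmt 6)."""
    return [
        "blastp",
        "-query", query_path,
        "-db", db_name,
        "-outfmt", "6",
        "-evalue", evalue,          # placeholder cutoff, not a recommendation
        "-num_threads", str(threads),
        "-out", out_path,
    ]
```

A driver script would write each chunk to its own file (for example `chunk_0.faa`, a hypothetical name), build the database once, and then pass each `blastp_command(...)` list to `subprocess.run` locally or submit it as one job per chunk on a cluster scheduler. Since every chunk is searched against the same full database, the per-chunk tabular outputs can simply be concatenated afterwards.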

written 14 months ago by moranr240
