Question: how to BLAST search for known genes in local set of genomes?
23 months ago by jerrybug109 (United States):

Hi all,

Our lab has sequenced a set of different Bacillus strains and assembled contigs for each individual genome. I wish to set up a search for the presence/absence of multiple known genes (we have FASTA files for those) in our set of genomes. I was hoping to do this via BLAST but looking at the website, it seems that you can only search for genes in genomes exclusively available on the NCBI database.

Is there any way to set up a search for genes in the genomes of the strains that I've sequenced and assembled? I was hoping to find an option to upload our own "search sets" but it doesn't seem to be available on


Modified 23 months ago by natasha.sernova • written 23 months ago by jerrybug109

23 months ago by mastal511:

You can install standalone BLAST and make your genomes into a local database, but that requires using the command line.

Written 23 months ago by mastal511

Thanks, I appreciate your response. I might give this a try then - I have rudimentary UNIX experience. By any chance, do you know of anything like Galaxy that offers a service like BLAST but lets you upload your own database?

Reply written 23 months ago by jerrybug109

If you have the UNIX skills to get started, then by all means find a local computing resource (or even a desktop with respectable specs). This would be a good chance to get your feet wet and polish your UNIX skills. You can use Jim Kent's blat (in addition to, or instead of, blast), which can be very fast for identifying closely related sequences. Since you are working with bacteria you may not need very beefy hardware (8 GB of RAM may be the minimum requirement).

You could also use NCBI's "BLAST 2 sequences" service, which aligns two sequences against each other, to search in a pairwise fashion.

Reply modified 23 months ago • written 23 months ago by genomax

23 months ago by natasha.sernova:

You have a set of bacteria, so you don’t need to worry about introns.

Make a database and search inside the database with blastn, for example.

1) First you need to make a database of your nucleotide sequences.

To do this:

makeblastdb -in input_file -dbtype nucl -out dbname

Here input_file is the FASTA file (*.fa) with your contigs, -dbtype nucl marks the sequences as nucleotide, and dbname is the name you choose for the database.
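Since there are several genomes, step 1 can be wrapped in a loop. This is only a sketch: the function name build_dbs and the layout (one <strain>.fa file per assembly in a single directory) are assumptions for illustration, not something from the original post.

```shell
# Build one nucleotide BLAST database per assembly.
# Assumed (hypothetical) layout: <genome_dir>/<strain>.fa -> <db_dir>/<strain>
build_dbs() {
    genome_dir=$1
    db_dir=$2
    mkdir -p "$db_dir"
    for fasta in "$genome_dir"/*.fa; do
        [ -e "$fasta" ] || continue            # glob matched nothing
        strain=$(basename "$fasta" .fa)        # strainA.fa -> strainA
        makeblastdb -in "$fasta" -dbtype nucl -out "$db_dir/$strain"
    done
}
```

It would be called as, for example, build_dbs genomes blastdb. Keeping one database per strain makes it easy to tell afterwards which genome a hit came from.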

2) Run the blast-program:

tblastx -query input -db dbname -out outname

Here input is the FASTA file with your genes, dbname is the database created in step 1, and outname is the file that will hold the results.
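To run the same search against every genome, something like the following could work. It is a sketch under assumptions: the function name search_all, the directory layout (one <strain>.fa per assembly, one database per strain with the same base name) and the e-value cutoff of 1e-10 are all illustrative choices. -outfmt 6 asks for tab-separated output, which is easier to post-process than the default report.

```shell
# Search one gene FASTA against every per-strain database.
# Assumes each <genome_dir>/<strain>.fa has a database at <db_dir>/<strain>.
search_all() {
    query=$1               # FASTA file with the known genes
    genome_dir=$2          # directory holding the <strain>.fa assemblies
    db_dir=$3              # directory holding the BLAST databases
    out_dir=$4             # where the per-strain result tables go
    program=${5:-blastn}   # blastn by default; pass tblastx to use that instead
    mkdir -p "$out_dir"
    for fasta in "$genome_dir"/*.fa; do
        [ -e "$fasta" ] || continue
        strain=$(basename "$fasta" .fa)
        "$program" -query "$query" -db "$db_dir/$strain" \
            -outfmt 6 -evalue 1e-10 -out "$out_dir/$strain.tsv"
    done
}
```

For example: search_all genes.fa genomes blastdb results tblastx. The e-value cutoff is arbitrary here and should be adjusted to how divergent the genes are expected to be.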

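tblastx translates both the query and the database in all six reading frames and compares them at the protein level. If you would rather compare the nucleotide sequences directly, use blastn against the same nucleotide database. You said you know the genes in the genomes?

If you know where the genes are, you can translate them into proteins.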

I would use tblastx for your task.

If you would like to search the database using a protein query, use tblastn, but in practice tblastx usually finds more sequences.
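For the presence/absence question itself, tabular results (-outfmt 6) can be summarised with awk. A minimal sketch, assuming one <strain>.tsv result file per genome in a directory; the function name present_genes and the default cutoffs (90% identity, alignment length of 100) are hypothetical and would need tuning. In the default tabular format, column 1 is the query ID, column 3 the percent identity and column 4 the alignment length.

```shell
# Print "strain<TAB>gene" once for each query gene that has at least
# one hit passing the identity and alignment-length cutoffs.
present_genes() {
    out_dir=$1
    min_ident=${2:-90}     # minimum percent identity (assumed default)
    min_len=${3:-100}      # minimum alignment length (assumed default)
    for tsv in "$out_dir"/*.tsv; do
        [ -e "$tsv" ] || continue
        strain=$(basename "$tsv" .tsv)
        awk -F '\t' -v s="$strain" -v id="$min_ident" -v len="$min_len" \
            '$3 >= id && $4 >= len && !seen[$1]++ { print s "\t" $1 }' "$tsv"
    done
}
```

Any gene from the query file that is not listed for a strain had no hit passing the cutoffs in that genome. Note that for tblastx or tblastn output the identity and length refer to amino acids, so the cutoffs should be chosen accordingly.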

Modified 23 months ago • written 23 months ago by natasha.sernova