Question: Blastn and Multiple Databases: how best to manage
jeremy.cox.2 (United States) wrote, 4.1 years ago:

How do I best manage multiple BLAST databases?

I am pretty new to bioinformatics, but I am a computer guy. I have a few questions about how best to use blastn to achieve my goal.

I have multiple databases I prepared, creatively named virus.fa, bacteria.fa, fungi.fa, human.fa, mouse.fa, and rat.fa.
I want to be able to BLAST against any combination of the databases, hopefully without doing anything crazy like computing all database permutations.  

  1. I don't think I can BLAST against multiple databases at once, is that correct?
     
  2. As I understand it, I can make 6 separate blast databases and blast against them one at a time.  Then I concatenate the results.
    1. Is this computationally wasteful?
       
  3. I could make 1 big database and ignore hits for organisms I don't want to search.
    1. This is obviously wasteful.
    2. Can makeblastdb take multiple input files?  I don't think it can, so I would have to cat them up before making the db.
       
  4. Is there a solution I am missing, hopefully an elegant solution?

Thank you,

Jeremy Cox
CSE PhD student

5heikki (Finland) wrote, 4.1 years ago:

You should just combine the databases (are those BLAST databases or just FASTA files?) and then make aliases for the subset databases.

 blastdb_aliastool -h

USAGE
  blastdb_aliastool [-h] [-help] [-gi_file_in input_file]
    [-gi_file_out output_file] [-db dbname] [-dbtype molecule_type]
    [-title database_title] [-gilist input_file] [-out database_name]
    [-dblist database_names] [-dblist_file file_name]
    [-num_volumes positive_integer] [-logfile File_Name] [-version]

DESCRIPTION
   Application to create BLAST database aliases, version 2.2.29+

   This application has three modes of operation:

   1) GI file conversion:
      Converts a text file containing GIs (one per line) to a more efficient
      binary format. This can be provided as an argument to the -gilist option
      of the BLAST search command line binaries or to the -gilist option of
      this program to create an alias file for a BLAST database (see below).

   2) Alias file creation (restricting with GI List):
      Creates an alias for a BLAST database and a GI list which restricts this
      database. This is useful if one often searches a subset of a database
      (e.g., based on organism or a curated list). The alias file makes the
      search appear as if one were searching a regular BLAST database rather
      than the subset of one.

   3) Alias file creation (aggregating BLAST databases):
      Creates an alias for multiple BLAST databases. All databases must be of
      the same molecule type (no validation is done). The relevant options are
      -dblist and -num_volumes.
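
As a rough illustration of mode 3, the sketch below prints the commands that would build one nucleotide database per FASTA file from the question and then aggregate three of them under a single alias. The alias name `microbiome` and the titles are illustrative assumptions, and the commands are only printed, not run, so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints commands instead of running them.
# FASTA names come from the question; "microbiome" is an illustrative alias name.

# Print one makeblastdb command per FASTA file (nucleotide databases).
print_makeblastdb_cmds() {
    for name in virus bacteria fungi human mouse rat; do
        echo "makeblastdb -in $name.fa -dbtype nucl -out $name -title $name"
    done
}

# Print the command that writes an alias file (microbiome.nal)
# pointing at three of the databases built above.
print_alias_cmd() {
    echo "blastdb_aliastool -dblist \"virus fungi bacteria\" -dbtype nucl -out microbiome -title microbiome"
}

print_makeblastdb_cmds
print_alias_cmd
```

If the printed commands look right, the script's output can be piped into `sh` to run them. The alias is a small text file, so further combinations cost almost nothing in disk space.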

Based on the information you provided, it looks like I can keep the 6 databases separate and create an alias file to refer to multiple databases.

blastdb_aliastool -dblist "virus fungi bacteria" -dbtype nucl -out microbiome

I think this is the opposite of what you described, which was making aliases for subsets of one combined database?

(Reply by jeremy.cox.2, 4.1 years ago)

Yes, you can do that too.
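
Either way, the search step might look like the sketch below. An alias name is passed to `-db` like any other database, and `-db` also accepts several database names in one quoted, space-separated string, so an ad hoc combination needs no alias at all. The file names `query.fa` and `results.tsv` are placeholders, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: prints a blastn command for any combination of database names.
# query.fa and results.tsv are placeholder file names.
print_blastn_cmd() {
    # "$*" joins all arguments with single spaces, e.g. virus human -> "virus human"
    echo "blastn -query query.fa -db \"$*\" -outfmt 6 -out results.tsv"
}

print_blastn_cmd microbiome          # search through the alias
print_blastn_cmd virus human mouse   # ad hoc combination, no alias needed
```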

(Reply by 5heikki, 4.1 years ago)
jeremy.cox.2 (United States) wrote, 4.1 years ago:

I found this topic, which provides helpful answers:

How To Blast A Sequence Against Multiple Databases

Sorry for the duplicate question.
