How To Extract Max Score Blast Hit Among A Huge Data Set?
Entering edit mode
8.5 years ago
HG ★ 1.2k

Hi everyone, I want to extract max score blast sequence among 50 genomic data set : my approach like this:

  1. I did all the denovo assembly of all 50 genome.

  2. Next i predicted all the ORF using Prodigal: (A Microbial Gene Prediction Software)

  3. Now i want to set up a local blast with a query gene sequence vs all 50 genomic ORF.
  4. Finally extract the best blast score sequence among all 50 genome. Can anyone suggest me how to proceed step 3 and step 4

Thank you advance.

blast • 2.2k views
Entering edit mode
8.5 years ago
Daniel ★ 3.9k

You can make a blast database of your sequences with the makeblastdb command (in linux):

makeblastdb -in genomesORFs.fas -dbtype nucl -out my_db

Then blast your sequences against it with a command like this, but choose your own parameters (there's loads on info at sites like this and this:

blastall -p blastp -i my_input_file -d my_db -o my_output_file -m 8

You can run it through a perl script to print the best match for whatever parameter you want. Something like this pulled from a different script (NOT THIS):

use strict;
use warnings;

my $BLASTn_file = $ARGV[0] ;

open BLAST, "< $BLASTn_file or die("$BLASTn_file\n$!");
while (my $line = <BLAST>){
    if ( $line !~ /^#/ ) {
            chomp $line;

            my @fields = split(/\t/, $line);
            my $query = $fields[0];
            my $match = $fields[1];
            my $evalue = $fields[10];

            if ((not exists  $Match{$query}) or ($evalue <  $Evalue{$query})){
            # Select lowest eval
                    $Match{$query} = $match;
                    $Evalue{$query} = $evalue;

   while (my ($key, $value) = each (%Match)) {
        print "$key\t$value";

Login before adding your answer.

Traffic: 971 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6