Question: Parsing Blast Results For Different Genus
afonsomduarte wrote, 6.1 years ago:

Dear All,

I have obtained several .gb BLAST results that I want to split into separate .gb or .fasta files according to genus.

I searched the web for ways to do this, but didn't find one that works when the BLAST results contain something like 50 genera.

(For a small number of genera I could use splitgb.pl from http://www.bioperl.org/wiki/HOWTO:SeqIO, but that is not feasible for a large number of genera.)

Is there a script or a way to do this efficiently?

Best


It can be done easily using Biopython's SeqIO.

Reply written 6.1 years ago by Pappu

I've been looking at Biopython as well, but what is the record field in SeqIO for species or genus? I could find examples for sequence length and other characteristics of the sequence, but not for taxonomy.

Reply written 6.1 years ago by afonsomduarte

You have to learn Python properly. Then you can parse any type of file. Look here: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc83

Reply written 6.1 years ago by Pappu
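
On the question of which field holds the taxonomy: when Biopython parses a GenBank file, the ORGANISM line ends up in record.annotations["organism"] and the lineage in record.annotations["taxonomy"] (a list of strings). A minimal sketch, assuming the first word of the organism name is the genus; the helper names genus_of and genera_in_file are made up for illustration and are not part of Biopython:

```python
def genus_of(record):
    """Return the genus of a SeqRecord parsed from GenBank, or "unknown"."""
    # Assumption: the organism annotation looks like "Homo sapiens",
    # so the first whitespace-separated word is the genus.
    words = record.annotations.get("organism", "").split()
    return words[0] if words else "unknown"


def genera_in_file(path):
    """List the distinct genera found in a GenBank file."""
    from Bio import SeqIO  # requires Biopython to be installed

    return sorted({genus_of(rec) for rec in SeqIO.parse(path, "genbank")})
```

Calling genera_in_file on one of the .gb BLAST result files would give the list of genera to split on, without typing all 50 by hand.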
Joseph Hughes (Scotland, UK) wrote, 5.7 years ago:

It seems to me that you could just loop through every one of your 50 genera and run the splitgb.pl logic for each. Something like this (this code has not been fully tested):

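use strict;
use warnings;
use Bio::SeqIO;

my $usage  = "splitgb.pl infile\n";
my $infile = shift or die $usage;
# Genus names only; extend this list to cover all of your genera.
my @genera = ("Homo", "Sus", "Mus");
foreach my $genus (@genera) {
  # Reopen the input for each genus; a SeqIO stream can only be read once.
  my $inseq   = Bio::SeqIO->new(-file => "<$infile",   -format => 'Genbank');
  # Double quotes so that $genus is interpolated into the file name.
  my $outfile = Bio::SeqIO->new(-file => ">$genus.gb", -format => 'Genbank');
  while (my $seqin = $inseq->next_seq) {
    my $species = $seqin->species or next;  # skip records without taxonomy
    # Anchor the match so the genus must be the first word of the binomial.
    if ($species->binomial =~ m/^\Q$genus\E\b/) {
      $outfile->write_seq($seqin);
    }
  }
}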

You could optimize this so that you only need to loop through $inseq once.
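
One way to sketch that single-pass idea, here in plain Python with no BioPerl or Biopython: read the GenBank text once, take the genus from each record's ORGANISM line, and collect whole records per genus. This assumes standard flat-file layout (each record ends with a "//" line and the first word after ORGANISM is the genus); the function names split_genbank_by_genus and write_groups are hypothetical, and everything is held in memory, which should be fine for typical BLAST result files.

```python
import re
from collections import defaultdict


def split_genbank_by_genus(text):
    """Group raw GenBank records by genus in one pass over the text."""
    groups = defaultdict(list)
    record, genus = [], "unknown"
    for line in text.splitlines(keepends=True):
        record.append(line)
        # Assumption: the genus is the first word on the ORGANISM line.
        match = re.match(r"\s+ORGANISM\s+(\S+)", line)
        if match:
            genus = match.group(1)
        if line.startswith("//"):  # end-of-record marker
            groups[genus].append("".join(record))
            record, genus = [], "unknown"
    # Any trailing text without a closing "//" is ignored.
    return dict(groups)


def write_groups(groups, suffix=".gb"):
    """Write one file per genus, e.g. Homo.gb, in the current directory."""
    for genus, records in groups.items():
        # Assumption: genus names are safe to use as file names.
        with open(genus + suffix, "w") as handle:
            handle.writelines(records)
```

Usage would be along the lines of write_groups(split_genbank_by_genus(open("results.gb").read())), where results.gb stands in for your own file. No list of genera is needed up front, since the groups are discovered from the records themselves.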
