Parsing BLAST Results For Different Genera
11.3 years ago

Dear All,

I have obtained several BLAST results in .gb format that I want to split into separate .gb or .fasta files according to genus.

I searched the web for ways to do this, but didn't find one that works when there are something like 50 genera in the BLAST results.

(For a small number of genera I could use splitgb.pl from http://www.bioperl.org/wiki/HOWTO:SeqIO, but that is not feasible for a large number of genera.)

Is there a script or a way to do this efficiently?

Best

blast parsing sequence bioperl biopython • 2.6k views

It can be done easily using Biopython's SeqIO.


I've been looking at Biopython as well, but which SeqIO record field holds the species or genus? I could find examples for sequence length and other characteristics, but not for taxonomy.


You need to learn Python properly; then you can parse any type of file. Look here: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc83
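
On the taxonomy question specifically: when Biopython parses a GenBank file, the ORGANISM line ends up in record.annotations["organism"] and the lineage in record.annotations["taxonomy"] (a list of strings). Here is a minimal sketch of pulling out a genus from that. It assumes the organism name is a plain binomial, so the first word is the genus; that will be wrong for entries such as "Influenza A virus" or "uncultured bacterium". The names genus_of and list_genera are made up for illustration.

```python
def genus_of(record):
    """Return the first word of the record's organism name, or 'unknown'."""
    # Assumption: the organism annotation is a binomial such as "Homo sapiens",
    # so its first word is the genus. Records without the annotation fall back
    # to the label 'unknown'.
    organism = record.annotations.get("organism", "")
    words = organism.split()
    return words[0] if words else "unknown"


def list_genera(path):
    """Hypothetical helper: sorted genus names found in a GenBank file."""
    # Biopython is imported here so that genus_of itself has no dependencies.
    from Bio import SeqIO
    return sorted({genus_of(rec) for rec in SeqIO.parse(path, "genbank")})
```

Running list_genera on one of the .gb files first is a quick way to see which genus labels the data actually contains before splitting anything.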

10.9 years ago
Joseph Hughes ★ 3.0k

It seems to me that you could just loop through each of your 50 genera and run the splitgb.pl logic for each one. Something like this (this code has not been fully tested):

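use strict;
use warnings;
use Bio::SeqIO;

my $usage  = "splitgb.pl infile\n";
my $infile = shift or die $usage;
# Genus names to extract; each one gets its own output file.
my @genera = ("Homo", "Sus", "Mus");
foreach my $genus (@genera) {
  # Reopen the input for every genus: a SeqIO stream is exhausted after one pass.
  my $inseq   = Bio::SeqIO->new(-file => "<$infile", -format => 'Genbank');
  # Double quotes so that $genus is interpolated into the file name.
  my $outfile = Bio::SeqIO->new(-file => ">$genus.gb", -format => 'Genbank');
  while (my $seqin = $inseq->next_seq) {
    # Anchor the match so that only the genus part of the binomial is tested.
    if ($seqin->species->binomial =~ m/^\Q$genus\E\b/) {
      $outfile->write_seq($seqin);
    }
  }
}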

You could optimize this so that you only need to loop through $inseq once.
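
The single-pass idea can be sketched as follows. This is written in Python with Biopython (also mentioned in this thread) rather than BioPerl, and it is an untested-on-real-data sketch, not a drop-in replacement. It reads the input once, groups records by the first word of the organism annotation, and writes one file per genus, so no list of genera is needed up front. The function names and the <genus>.gb naming scheme are assumptions for illustration, and all records are held in memory, which should be fine for typical BLAST hit sets.

```python
import os
from collections import defaultdict


def group_by_genus(records):
    """Group records by the first word of annotations['organism'] in one pass."""
    groups = defaultdict(list)
    for record in records:
        # Assumption: the organism name is a binomial, so word one is the genus.
        words = record.annotations.get("organism", "").split()
        groups[words[0] if words else "unknown"].append(record)
    return dict(groups)


def split_genbank_by_genus(in_path, out_dir="."):
    """Write one <genus>.gb file per genus found in in_path; return the genera."""
    from Bio import SeqIO  # Biopython is only needed for the file I/O
    groups = group_by_genus(SeqIO.parse(in_path, "genbank"))
    for genus, records in groups.items():
        SeqIO.write(records, os.path.join(out_dir, f"{genus}.gb"), "genbank")
    return sorted(groups)
```

A call would look like split_genbank_by_genus("blast_hits.gb"), where the file name is a placeholder. Passing "fasta" instead of "genbank" to SeqIO.write (and changing the extension) would give FASTA output instead.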
