Sequences Collection Based On Length
13.4 years ago
Mat ▴ 30

Hello, I built my own database of protein sequences with formatdb. Is it possible to retrieve all the sequences of a certain length from it (e.g., all sequences shorter than 200 aa), and if so, how?

Thanks

Matteo

fasta sequence retrieval database • 1.9k views
13.4 years ago
Neilfws 49k

What Pierre said. Assuming you have dumped the database to a FASTA file named myseqs.fa, here's a BioPerl approach:

#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;

# read the dumped database; write the filtered sequences to myseqs200.fa
my $inseq  = Bio::SeqIO->new(-file => "myseqs.fa", -format => "fasta");
my $outseq = Bio::SeqIO->new(-file => ">myseqs200.fa", -format => "fasta");

# keep sequences of at most 200 residues; change <= to < to exclude length 200
while (my $seq = $inseq->next_seq) {
  if ($seq->length <= 200) {
    $outseq->write_seq($seq);
  }
}
13.4 years ago

Use fastacmd to dump your database as FASTA (option -D 1), then filter the result with your favorite tool (Perl, awk, etc.).
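For the filtering step, here is a minimal sketch in plain Python with no external libraries. It assumes the database has already been dumped to a FASTA file. The function names (read_fasta, filter_by_length, filter_fasta_file) and the 60-column line width are illustrative choices, not part of any BLAST tool. It keeps sequences strictly shorter than the cutoff, which is how I read "smaller than 200 aa".

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # a new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, max_len=200):
    """Yield only the records whose sequence is strictly shorter than max_len."""
    for header, seq in records:
        if len(seq) < max_len:
            yield header, seq


def filter_fasta_file(in_path, out_path, max_len=200, width=60):
    """Copy the short sequences from in_path to out_path; return how many were kept."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in filter_by_length(read_fasta(src), max_len):
            dst.write(">" + header + "\n")
            # wrap the sequence at `width` residues per line (assumed layout)
            for i in range(0, len(seq), width):
                dst.write(seq[i:i + width] + "\n")
            kept += 1
    return kept
```

With the file names from the BioPerl answer, the call would be filter_fasta_file("myseqs.fa", "myseqs200.fa"). Records are streamed one at a time, so the whole dump never has to fit in memory.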
