Question: BioPerl Retrieve FASTA Sequence With Colon In Header
jason.r.gallant wrote (7.7 years ago):

Apologies for cross-posting from StackExchange; perhaps this is the more appropriate venue. I'm trying to extract the sequence I need from a database using the following BioPerl code:

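use strict;
use warnings;
use Bio::DB::Fasta;

# FASTA file, record ID, and 1-based inclusive coordinates of the subsequence
my ($file, $id, $start, $end) = ("secondround_merged_expanded.fasta", "C7136661:0-107", 1, 10);
my $db  = Bio::DB::Fasta->new($file);   # builds (or reuses) an on-disk index of the FASTA file
my $seq = $db->seq($id, $start, $end);  # returns the subsequence as a plain string
print $seq, "\n";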

The header of the sequence I'm trying to extract is C7136661:0-107, in the same format as this record from the file:

>C7047455:0-100
TATAATGCGAATATCGACATTCATTTGAACTGTTAAATCGGTAACATAAGCAGCACACCTGGGCAGATAGTAAAGGCATATGATAATAAGCTGGGGGCTA

The code extracts the correct sequence when both the header and $id above are changed to:

>test
TATAATGCGAATATCGACATTCATTTGAACTGTTAAATCGGTAACATAAGCAGCACACCTGGGCAGATAGTAAAGGCATATGATAATAAGCTGGGGGCTA

I suspect BioPerl doesn't like the colon in the header. Is there any way to fix this without having to rewrite the headers in the FASTA files?
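One likely explanation, worth checking against the Bio::DB::Fasta documentation for seq(): if the module accepts region strings of the form id:start-end in place of a plain ID, then C7136661:0-107 would be read as "positions 0 to 107 of a record called C7136661", a record that doesn't exist, while an ID like test passes through unchanged. The Python sketch below is purely illustrative; the REGION pattern and split_region are hypothetical and not taken from BioPerl. It shows how that kind of parsing treats the two IDs differently:

```python
import re

# Hypothetical pattern for a region string "id:start-end"
# (also accepting "," or ".." as the separator). Illustrative only;
# this is NOT copied from BioPerl.
REGION = re.compile(r"^(.+):(\d+)(?:,|-|\.\.)(\d+)$")


def split_region(query):
    """Return (id, start, end) if query looks like a region string, else (query, None, None)."""
    match = REGION.match(query)
    if match is None:
        # No "id:start-end" shape: treat the whole string as a plain ID.
        return query, None, None
    return match.group(1), int(match.group(2)), int(match.group(3))


for query in ("test", "C7136661:0-107"):
    print(query, "->", split_region(query))
```

Under this reading, the colon-free header works only because it can't be mistaken for a region.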

Tags: perl fasta bioperl

For reference, here is the SO post, which so far has one good answer: http://stackoverflow.com/questions/13707302/extracting-dna-sequences-from-fasta-file-with-bioperl-with-non-standard-header

(reply by Neilfws, 7.7 years ago)

Have you tried escaping the : with a \ prefix, or enclosing the ID in single quotes instead? I'm not sure either will make a difference, since I assume it's the BioPerl code that is failing, but it's worth a try ;)

(reply by Steve Moss, 7.7 years ago)

Ignore this; I just saw the SO post, which seems to have resolved your issue :)

(reply by Steve Moss, 7.7 years ago)
Answer by swbarnes2 (United States), 7.7 years ago:

You can use sed to change all the colons in the FASTA file to underscores. That might be the simplest way to fix the problem.
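Assuming the colons only need to change on header lines (those starting with >), a sed one-liner along the lines of `sed '/^>/ s/:/_/g' in.fasta > out.fasta` does this; the file names are placeholders. The same rewrite is sketched in Python below. underscore_headers and rewrite_fasta are hypothetical helper names, and sequence lines are passed through untouched:

```python
def underscore_headers(lines):
    """Yield FASTA lines, replacing ':' with '_' on header lines only."""
    for line in lines:
        if line.startswith(">"):
            yield line.replace(":", "_")
        else:
            # Sequence lines are copied unchanged.
            yield line


def rewrite_fasta(in_path, out_path):
    """Copy in_path to out_path with colons in headers turned into underscores."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(underscore_headers(src))


example = [">C7136661:0-107\n", "TATAATGCGA\n"]
print("".join(underscore_headers(example)), end="")
```

After the rewrite, the ID in the query has to change the same way, e.g. C7136661_0-107 instead of C7136661:0-107.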

Answer by Steve Moss (United Kingdom), 7.7 years ago:

Does it have to be Perl? This Python code would do the business ;)

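# import Biopython's SeqIO module
from Bio import SeqIO

(fasta_file, seq_id, start, end) = ("secondround_merged_expanded.fasta", "C7136661:0-107", 1, 10)

# index the file by record ID (the first word of each header, colon included)
record_dict = SeqIO.index(fasta_file, "fasta")
# Python slices are 0-based and half-open, so shift start to match the 1-based inclusive range
print(record_dict[seq_id].seq[start - 1:end])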
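If Biopython isn't installed, the same lookup can be sketched with the standard library alone. This assumes, as SeqIO's FASTA parser does, that the record ID is the first whitespace-delimited word of the header, and it uses 1-based inclusive coordinates like the BioPerl call in the question. read_fasta and subseq are hypothetical helper names:

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping record ID -> sequence.

    The ID is the first whitespace-delimited word after '>', so a header
    like '>C7136661:0-107' is stored under 'C7136661:0-107', colon and all.
    """
    records = {}
    record_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Finish the previous record before starting a new one.
            if record_id is not None:
                records[record_id] = "".join(chunks)
            words = line[1:].split()
            record_id = words[0] if words else ""
            chunks = []
        elif record_id is not None:
            # Sequence lines before the first header are ignored.
            chunks.append(line)
    if record_id is not None:
        records[record_id] = "".join(chunks)
    return records


def subseq(records, record_id, start, end):
    """Return bases start..end (1-based, inclusive) of one record."""
    return records[record_id][start - 1:end]


example = """>C7047455:0-100
TATAATGCGAATATCGACATTCATTTGAACTGTTAAATCGGTAACATAAGCAGCACACCTGGGCAGATAGTAAAGGCATATGATAATAAGCTGGGGGCTA
"""
records = read_fasta(example.splitlines())
print(subseq(records, "C7047455:0-100", 1, 10))  # prints TATAATGCGA
```

For a real file, pass an open handle, e.g. `with open("secondround_merged_expanded.fasta") as handle: records = read_fasta(handle)`. Unlike SeqIO.index, which keeps file offsets and parses records on access, this loads every sequence into memory, so it suits modest file sizes.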