Question

Bioperl Retrieve Fasta Sequence With Colon In Header

1

Entering edit mode

11.4 years ago

jason.r.gallant ▴ 10

Apologies for cross posting from StackExchange- perhaps this is the more appropriate venue. I'm trying to extract the sequence I need from a database using the following bioperl code:

use strict;
use Bio::SearchIO; 
use Bio::DB::Fasta;


my ($file, $id, $start, $end) = ("secondround_merged_expanded.fasta","C7136661:0-107",1,10);
my $db = Bio::DB::Fasta->new($file);
my $seq = $db->seq($id, $start, $end);
print $seq,"\n";

Where the header of the sequence I'm trying to extract is: C7136661:0-107, as in the file:

>C7047455:0-100
TATAATGCGAATATCGACATTCATTTGAACTGTTAAATCGGTAACATAAGCAGCACACCTGGGCAGATAGTAAAGGCATATGATAATAAGCTGGGGGCTA

The code extracts the appropriate sequence when the header and $id in above is changed to

>test
TATAATGCGAATATCGACATTCATTTGAACTGTTAAATCGGTAACATAAGCAGCACACCTGGGCAGATAGTAAAGGCATATGATAATAAGCTGGGGGCTA

I'm thinking that BioPerl doesn't like the heading with the colon. Any way to fix this so I don't have to recode the FASTA files?

bioperl perl fasta • 3.8k views

ADD COMMENT • link updated 4.3 years ago by Biostar 20 • written 11.4 years ago by jason.r.gallant ▴ 10

0

Entering edit mode

For reference here is the SO post - http://stackoverflow.com/questions/13707302/extracting-dna-sequences-from-fasta-file-with-bioperl-with-non-standard-header - with so far, one good answer.

ADD REPLY • link 11.4 years ago by Neilfws 49k

0

Entering edit mode

Have you tried escaping the : with a \ prefix? Or enclosing the id in single quotations instead? I'm not really sure that will make any difference, as I assume it is the BioPerl code that is failing, but it's worth a try ;)

ADD REPLY • link 11.4 years ago by Steve Moss 2.3k

0

Entering edit mode

Ignore this, just saw the SO post, which seems to have resolved your issue :)

ADD REPLY • link 11.4 years ago by Steve Moss 2.3k

score 2 · Answer 1 · 2012-12-05

2

Entering edit mode

11.4 years ago

swbarnes2 14k

You can use sed to change all the colons to underscores in the fasta. That might be the simplest way to fix the problem.

ADD COMMENT • link 11.4 years ago by swbarnes2 14k

score 0 · Answer 2 · 2012-12-05

0

Entering edit mode

11.4 years ago

Steve Moss 2.3k

Does it have to be Perl? This Python code would do the business ;)

# import BioPython SeqIO module
from Bio import SeqIO

(file, id, start, end) = ("secondround_merged_expanded.fasta", "C7136661:0-107", 1, 10) 

record_dict = SeqIO.index(file, "fasta")
print record_dict[id].seq[start:end]

ADD COMMENT • link 11.4 years ago by Steve Moss 2.3k