Question: How to import fasta file by using perl DBI.?
1
gravatar for Krithi.S
4.7 years ago by
Krithi.S10
India
Krithi.S10 wrote:

use DBI;

use strict;

use warning;

my $textfile = "fasta.txt";
open (TEXT,"$textfile") or die "Could not open:$!";

my $dbh = DBI->connect("dbi:mysql:database=file",user,password,{RaiseError=>1}) or die "Could not connect $DBI:errstr";
   
while(<TEXT>){
    $first = <TEXT>;
    @second = <TEXT>;
    $dbh->do("INSERT INTO open values (?,?)", undef, $first,@second);
    }

$dbh->disconnect;
close TEXT;

 

 

This is My Perl code. It is not insert data in database. I created database and table. database name is file and table name is open, the table contain two column one is header and another one is sequence.

My expectation. i will create database have two column, one column contain the first line of fasta file and second column contain their sequence.

Header sequence
>sp|P00547|KHSE_ECOLI Homoserine kinase OS=Escherichia coli (strain K12) GN=thrB PE=1 SV=2

MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEP
RENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVVAALMAMNEHCGKPLND

 

Please help me..

dbi database perl • 2.0k views
ADD COMMENTlink modified 4.7 years ago by Jean-Karim Heriche21k • written 4.7 years ago by Krithi.S10
2
gravatar for Jean-Karim Heriche
4.7 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche21k wrote:

First your code should not run: you 'use strict' and declare variables without 'my'. Second, you're not using placeholders correctly. Each ? has to map to a scalar so (?,?) requires two scalars and it looks like you have a scalar ($first) and a list (@second). However, using $dbh->do() in this way is inefficient because you create and destroy a statement handle at each iteration. A more efficient way to use placeholders is with prepare:

my $query = qq(INSERT INTO open values (?,?));
my $sth = $dbh->prepare($query);
while(<TEXT>){
    my $first = <TEXT>;
    my $second = <TEXT>; # get the next line of text only
    $sth->execute($first,$second);
}

Finally, you also need to parse your fasta file properly.

ADD COMMENTlink modified 4.7 years ago • written 4.7 years ago by Jean-Karim Heriche21k

@Jean-Karim Heriche, thanking you for your nice answer. But i am not able to getting Header in my output in one column. ? only my output is have a sequence, but its to split into three rows. I want to expect in that header column only the first line and that sequence column contain all sequence in fasta file. Please help me.

 

ADD REPLYlink written 4.7 years ago by Krithi.S10

This is what I meant by parsing your fasta file correctly. I suggest you use BioPerl's Bio::SeqIO for this. However, if you need an alternative, here is a function that does the job:

=head2 get_fasta_record

 Arg: opened filehandle
 Description: Reads next record from a fasta file
 Returntype: arrayref to (header, sequence)

=cut

sub get_fasta_record {
  my $fh = shift;
  local $/ = ">";
  my ($header,$sequence);
  while(my $line =<$fh>) {
    chomp($line);
    $line=~s/\R/\n/g; # replace any line break characters with \n
    next if ($line=~/^\s*$/);
    my $newline_pos = index($line,"\n");
    $header = substr($line,0,$newline_pos);
    $sequence = substr($line,$newline_pos+1);
    $sequence =~s/\n//g;
    $sequence =~s/\s+//g;
    last;
  }
  if (defined($header)) {
    return [$header,$sequence];
  }
  else { return undef; }
}

Use like this:

open my $fh,"<",$ARGV[0] or die "\nERROR: Can't read file $ARGV[0]: $!";
while (my ($header,$sequence) = @{$seq_reader->get_fasta_record($fh)}) {
  # Do something here
}
close $fh;
ADD REPLYlink written 4.7 years ago by Jean-Karim Heriche21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1247 users visited in the last hour