Question: Read a single fasta sequence into a scalar variable in Perl
trying17020 wrote:

Hi,

I'm trying to learn Perl and have been given the task of taking a fasta file containing a single DNA sequence and finding all of its EcoRI sites. I've been instructed to start by reading the fasta file using a while loop, and put the entire sequence into a scalar variable $sequence. Now, I don't want to ask how to do this entire task, because I am trying to do as much myself as possible, but I haven't the foggiest idea how to do this first part. I have a fasta file and I need to get it into my scalar $sequence. How would I do that? What would such a while loop look like? Reading in files is the part I have the least grasp on right now. Could someone explain and/or provide an example of what a while loop that reads a fasta file into a scalar would look like? I feel like I can solve the rest of this task on my own if I could just get past this very first step. Thanks!

sequence fasta perl • 1.7k views
ADD COMMENTlink modified 3.2 years ago by Biostar ♦♦ 20 • written 5.0 years ago by trying17020

Thank you, guys. You've all been very helpful with your tips and examples. I'm going to sit down later and try to put this all together. I think I have a much better idea of what I need to do now. Hopefully I can do it!

ADD REPLYlink written 5.0 years ago by trying17020

That's the spirit!

ADD REPLYlink written 5.0 years ago by RamRS25k
RamRS25k wrote:

I'll start by saying I appreciate your drive to do it yourself; that is the best approach to learning a new language.

Look up these concepts:

  • Using the <SOME_NAME_HERE> syntax to read a file (this will help you with the loop as well, assuming you know why a while loop is needed in the first place)
  • String operators and functions to check if a string starts with and/or contains a specific character/regular expression
  • String manipulation, such as concatenation, trimming, chomping (critical if you wish to avoid crazy bugs)
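
For reference, here is a minimal sketch of the last two ideas on plain strings, leaving the file-reading loop for you to work out. The variable names and sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines, as they might come out of a FASTA file.
my $header = ">seq1 example record\n";
my $chunk  = "GAATTCAA\n";

# chomp removes the trailing newline in place.
chomp $header;
chomp $chunk;

# A regular expression anchored with ^ checks how a string starts.
my $is_header = $header =~ /^>/ ? 1 : 0;
my $chunk_is_header = $chunk =~ /^>/ ? 1 : 0;

# The .= operator appends to an existing string (concatenation).
my $sequence = "";
$sequence .= $chunk;
$sequence .= "GGCC";

print "header line? $is_header\n";
print "sequence so far: $sequence\n";
```

Forgetting the chomp is the usual source of the "crazy bugs": the newlines end up inside the concatenated string.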

HTH

ADD COMMENTlink modified 5.0 years ago • written 5.0 years ago by RamRS25k
ballisticws20 wrote:

This is how I would do it. I would also suggest that you check out perlmonks.org. I found them unbelievably helpful when learning Perl. Good luck!

# Fasta file to read
my $file = "file.fasta";

# Sequence var to hold the fasta data
my $sequence = "";

# Open file for reading
open my $FASTAFILE, "<", $file or die "Could not open $file: $!";

# Read file line-by-line
while(my $line = <$FASTAFILE>)  {
    # Concatenate everything from the file into a single var
    $sequence .= $line;
}

# Close opened files
close $FASTAFILE;
ADD COMMENTlink modified 5.0 years ago • written 5.0 years ago by ballisticws20

I know you are answering the question exactly, but this is not the best approach, for a couple of reasons. First, this is not actually parsing the FASTA record. Second, there is a more idiomatic way to read a file into a scalar:

my $var = do { local $/; <$fh> };

Here, $var holds the entire file. However, I would not do this with sequence data. It's best to set the record separator and read line by line and not store the data (unless necessary), but that was not part of the question.
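
To illustrate what setting the record separator can look like, here is a rough sketch that makes each read return one whole FASTA record instead of one line. The sample data and variable names are invented, the in-memory filehandle stands in for a real file, and it assumes ">" never appears inside a header line.

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA text; a real script would open a file instead.
my $fasta_text = ">seq1 first record\nACGT\nGAATTC\n>seq2 second record\nTTTT\n";
open my $fh, "<", \$fasta_text or die "Could not open in-memory data: $!";

my @records;
{
    # With $/ set to ">", each read ends at the next ">" rather than at a newline.
    local $/ = ">";
    while (my $record = <$fh>) {
        chomp $record;                 # chomp strips the trailing ">" (the current $/)
        next unless length $record;    # the first read is empty because the file starts with ">"

        # The first line is the header; the remaining lines are sequence.
        my ($header, @seq_lines) = split /\n/, $record;
        my $sequence = join "", @seq_lines;
        push @records, [$header, $sequence];
    }
}
close $fh;

print "$_->[0]: $_->[1]\n" for @records;
```

The records are collected in an array here only so they can be printed at the end; in keeping with the advice above, each record could instead be processed inside the loop and then discarded.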

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by SES8.3k

Of course not, but he asked to just put everything into a single var using a while loop, and he will handle it from there. That was the question, exactly.

ADD REPLYlink written 5.0 years ago by ballisticws20

The question was how to "put the entire sequence into a scalar variable $sequence" not how to put the entire file into a scalar. To do that, you need to parse the file like Alex Reynolds did. Otherwise, you would have to parse the scalar before you search for cut sites, and that would involve a second pass over the entire record.
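
To make the point concrete: once $sequence holds only sequence characters (no header, no newlines), the search for cut sites is a single pass with a global match. This is only a sketch with a made-up test string; EcoRI recognizes GAATTC, and reporting 1-based positions is my own choice here.

```perl
use strict;
use warnings;

# Hypothetical sequence, already stripped of the header and newlines.
my $sequence = "ccGAATTCaaagaattcTT";

my @site_starts;
# /g in scalar context finds the next match on every loop iteration;
# /i makes the match case-insensitive.
while ($sequence =~ /GAATTC/gi) {
    # $-[0] is the 0-based offset where the last match began.
    push @site_starts, $-[0] + 1;
}

print "EcoRI site starts at position $_\n" for @site_starts;
```

If the header or newlines were still in the scalar, the positions would be off and a site split across two lines would be missed, which is why parsing first matters.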

ADD REPLYlink written 5.0 years ago by SES8.3k

There is a follow-up to the quote you posted.

"Now, I don't want to ask how to do this entire task, because I am trying to do as much myself as possible, but I haven't the foggiest idea how to do this first part. I have a fasta file and I need to get it into my scalar $sequence. How would I do that? What would such a while loop look like? Reading in files is the part I have the least grasp on right now. Could someone explain and/or provide an example of what a while loop that reads a fasta file into a scalar would look like?"

Based on what he wrote, I figured that he only wanted the while loop and would expand on it from there on his own. That's what I'm reading from the quote.

ADD REPLYlink written 5.0 years ago by ballisticws20

I agree with you; as I said in my first comment, reading the whole file was mentioned. I think we are discussing what is being asked versus what is wanted, and that is sometimes hard to figure out. What I am referring to is how you would want to solve the larger question, which involves finding restriction sites. That is all I was trying to help with. Hopefully, the discussion helps OP get a little closer! Thanks.

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by SES8.3k
Alex Reynolds29k wrote:
#!/usr/bin/env perl

#
# fasta_parser.pl
#
# Usage: $ ./fasta_parser.pl some_data.fa
#

use strict;
use warnings;

my $filename = shift @ARGV;
die "Usage: $0 some_data.fa\n" unless defined $filename;
open my $file_handle, "<", $filename or die "could not open $filename: $!\n";
while (<$file_handle>) {
    chomp;
    if ($_ =~ /^>/) {
        print "this line is a header: $_\n";
    }
    else {
        print "this line contains sequence data: $_\n";
    }
}
close $file_handle;

Since this parses an input file one line at a time, you might look at the if-else block to think about how you might store a scalar for your sequence, given that a FASTA record's sequence could be on one line, or the whole sequence might span multiple lines.
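
One possible way to fill in that if-else block, as a sketch only: it assumes a single-record file, and it reads from an in-memory string (made-up data) so it runs without a file on disk.

```perl
use strict;
use warnings;

# Hypothetical single-record FASTA text standing in for the input file.
my $fasta_text = ">example_seq\nACGT\nGAATTC\nTTAA\n";
open my $file_handle, "<", \$fasta_text or die "could not open in-memory data: $!\n";

my $header   = "";
my $sequence = "";
while (my $line = <$file_handle>) {
    chomp $line;
    if ($line =~ /^>(.*)/) {
        # Header line: keep the text after ">" and do not add it to the sequence.
        $header = $1;
    }
    else {
        # Sequence line: append it, so a multi-line sequence ends up as one string.
        $sequence .= $line;
    }
}
close $file_handle;

print "$header: $sequence\n";
```

Because every sequence line is appended after chomp, it does not matter whether the sequence sits on one line or spans many.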

ADD COMMENTlink modified 5.0 years ago • written 5.0 years ago by Alex Reynolds29k