Read a single fasta sequence into a scalar variable in Perl
9.2 years ago
trying170 ▴ 20

Hi,

I'm trying to learn Perl and have been given the task of taking a fasta file containing a single DNA sequence and finding all of its EcoRI sites. I've been instructed to start by reading the fasta file using a while loop, and put the entire sequence into a scalar variable $sequence. Now, I don't want to ask how to do this entire task, because I am trying to do as much myself as possible, but I haven't the foggiest idea how to do this first part. I have a fasta file and I need to get it into my scalar $sequence. How would I do that? What would such a while loop look like? Reading in files is the part I have the least grasp on right now. Could someone explain and/or provide an example of what a while loop that reads a fasta file into a scalar would look like? I feel like I can solve the rest of this task on my own if I could just get past this very first step. Thanks!

perl sequence fasta

Thank you, guys. You've all been very helpful with your tips and examples. I'm going to sit down later and try to put this all together. I think I have a much better idea of what I need to do now. Hopefully I can do it!

That's the spirit!

9.2 years ago
Ram 43k

Let me start by appreciating your drive to do it yourself; that is the best approach to learning a new language.

Look up these concepts:

  • Using the <FILEHANDLE> (readline operator) syntax to read a file line by line (this will help you with the loop as well, assuming you know why a while loop is needed in the first place)
  • String operators and functions to check if a string starts with and/or contains a specific character/regular expression
  • String manipulation, such as concatenation, trimming and chomping (critical: without chomp, every line keeps its trailing newline, which then ends up inside your sequence and causes confusing bugs)
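The three concepts in the list can be tried out in isolation before writing the real parser. The sketch below is a minimal example that assumes a small hypothetical FASTA record held in memory (the `$fasta` string, opened as a filehandle through a scalar reference so no file is needed). It reads the record with `<$fh>` in a `while` loop, checks whether each line starts with `>` in three different ways (regex anchor, `substr`, `index`), and builds the sequence both with and without `chomp` to show the newline bug.

```perl
use strict;
use warnings;

# Hypothetical example record; open() on a scalar reference gives a
# filehandle that behaves like a real file opened for reading.
my $fasta = ">seq1 example\nACGT\nGGCC\n";

my $with_chomp    = "";
my $without_chomp = "";
my @header_checks;    # one "regex/substr/index" result string per line

open my $fh, '<', \$fasta or die "Could not open in-memory file: $!";
while (my $line = <$fh>) {
    # Three ways to ask "does this line start with '>'?"
    my $by_regex  = ($line =~ /^>/)              ? 1 : 0;
    my $by_substr = (substr($line, 0, 1) eq '>') ? 1 : 0;
    my $by_index  = (index($line, '>') == 0)     ? 1 : 0;
    push @header_checks, "$by_regex$by_substr$by_index";

    next if $by_regex;    # skip the header line

    $without_chomp .= $line;    # keeps the trailing "\n" of every line
    chomp $line;                # removes the trailing "\n"
    $with_chomp .= $line;
}
close $fh;

# The brackets make the embedded newlines of the unchomped version visible.
print "without chomp: [$without_chomp]\n";
print "with chomp:    [$with_chomp]\n";
```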

HTH

9.2 years ago
ballisticws ▴ 20

This is how I would do it. I would also suggest checking out perlmonks.org; I found it unbelievably helpful when learning Perl. Good luck!

use strict;
use warnings;

# FASTA file to read
my $file = "file.fasta";

# Sequence var to hold the FASTA data
my $sequence = "";

# Open file for reading (three-argument open; $! holds the reason for any failure)
open my $FASTAFILE, "<", $file or die "Could not open $file: $!";

# Read file line by line
while (my $line = <$FASTAFILE>) {
    # Concatenate everything from the file (header and newlines included) into a single var
    $sequence .= $line;
}

# Close opened files
close $FASTAFILE;

I know you are answering the question exactly, but this is not the best approach, for a couple of reasons. First, this is not actually parsing the FASTA record. Second, there is a more idiomatic way to read a file into a scalar:

my $var = do { local $/; <$fh> };

where $var holds your whole file. However, I would not do this with sequence data. It's best to set the record separator ($/) and read the file one record at a time, without storing all of the data (unless necessary), but that was not part of the question.
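Setting the record separator, as suggested here, could look like the sketch below. It assumes hypothetical two-record data held in memory and localizes `$/` to `">"`, so each read returns one whole record instead of one line. The records go into a hash (`%sequence_for`, a name chosen for illustration) only so the result can be inspected; following the advice above, a real script would process each record inside the loop instead. It also assumes `>` never appears inside a header line.

```perl
use strict;
use warnings;

# Hypothetical multi-record FASTA data, held in memory for the example.
my $fasta = ">seq1 first\nACGT\nGGCC\n>seq2 second\nTTAA\n";

my %sequence_for;    # header => sequence (stored only for inspection)

open my $fh, '<', \$fasta or die "Could not open in-memory file: $!";
{
    # Everything up to and including the next ">" now counts as one "line".
    local $/ = ">";
    while (my $record = <$fh>) {
        chomp $record;                 # with $/ = ">", chomp strips the trailing ">"
        next unless length $record;    # the very first read returns just ">"
        my ($header, @seq_lines) = split /\n/, $record;
        $sequence_for{$header} = join "", @seq_lines;
    }
}
close $fh;

print "$_\t$sequence_for{$_}\n" for sort keys %sequence_for;
```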

Of course not, but he asked to just put everything into a single var using a while loop, and he will handle it from there. That was the question, exactly.

The question was how to "put the entire sequence into a scalar variable $sequence" not how to put the entire file into a scalar. To do that, you need to parse the file like Alex Reynolds did. Otherwise, you would have to parse the scalar before you search for cut sites, and that would involve a second pass over the entire record.
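To make the "second pass" concrete: after slurping with the `local $/` idiom, the scalar still contains the header line and the newlines, so it must be cleaned up before any cut-site search. The sketch below assumes a hypothetical single-record input whose first EcoRI site (GAATTC) is split across a line break, which is why the newlines have to go first. It reports the 1-based start of each match using Perl's `@-` array.

```perl
use strict;
use warnings;

# Hypothetical single-record FASTA contents; the first GAATTC spans a line break.
my $fasta = ">example record\nTTGAA\nTTCAACC\nGAATTCGG\n";

open my $fh, '<', \$fasta or die "Could not open in-memory file: $!";
my $whole_file = do { local $/; <$fh> };    # slurp: header and newlines included
close $fh;

# Second pass over the record: drop the header line, then all whitespace.
(my $sequence = $whole_file) =~ s/\A>[^\n]*\n//;
$sequence =~ s/\s+//g;

# Only now can the cut-site search run; GAATTC is the EcoRI recognition site.
my @sites;
while ($sequence =~ /GAATTC/g) {
    push @sites, $-[0] + 1;    # $-[0] is the 0-based offset of the match start
}

print "sequence: $sequence\n";
print "EcoRI sites start at: @sites\n";
```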

There is a follow-up to the quote you posted:

Now, I don't want to ask how to do this entire task, because I am trying to do as much myself as possible, but I haven't the foggiest idea how to do this first part. I have a fasta file and I need to get it into my scalar $sequence. How would I do that? What would such a while loop look like? Reading in files is the part I have the least grasp on right now. Could someone explain and/or provide an example of what a while loop that reads a fasta file into a scalar would look like?

Based on what he wrote, I figured that he only wanted the while loop and would expand on it on his own. That's how I read the quote.

I agree with you; as I said in my first comment, your answer does exactly what was asked, since reading the whole file was mentioned. I think we are discussing what is being asked versus what is wanted, and that is sometimes hard to figure out. What I am referring to is how you would want to solve the larger problem, which involves finding restriction sites. That is all I was trying to help with. Hopefully the discussion helps the OP get a little closer! Thanks.

9.2 years ago
#!/usr/bin/env perl

#
# fasta_parser.pl
#
# Usage: $ ./fasta_parser.pl some_data.fa
#

use strict;
use warnings;

my $filename = $ARGV[0];
open my $file_handle, "<", $filename or die "could not open $filename: $!\n";
while (<$file_handle>) {
    chomp;
    if ($_ =~ /^>/) {
        print "this line is a header: $_\n";
    }
    else {
        print "this line contains sequence data: $_\n";
    }
}
close $file_handle;

Since this parses an input file one line at a time, look at the if-else block and think about how you might build up a scalar holding your sequence, given that a FASTA record's sequence could be on a single line or span multiple lines.
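One way to fill in that if-else block is sketched below. It is a minimal variant of the script above, assuming a hypothetical record whose sequence is wrapped over two lines and read from memory instead of `$ARGV[0]`. It keeps the header and appends every chomped sequence line to `$sequence`, which works the same whether the sequence sits on one line or many. It assumes a single record: with several records, every sequence would be appended to the same scalar.

```perl
use strict;
use warnings;

# Hypothetical single-record FASTA with the sequence wrapped over two lines.
my $fasta = ">example record\nACGTACGT\nGGCCAA\n";

my $header   = "";
my $sequence = "";

open my $file_handle, "<", \$fasta or die "could not open in-memory file: $!\n";
while (<$file_handle>) {
    chomp;
    if (/^>/) {
        $header = $_;        # remember the header instead of printing it
    }
    else {
        $sequence .= $_;     # append each sequence line, newline already removed
    }
}
close $file_handle;

print "header:   $header\n";
print "sequence: $sequence\n";
```

To run it on a real file, replace `\$fasta` in the `open` call with a filename such as `$ARGV[0]`.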
