Help Me Finish This Perl Code To Extract A Column In A Table
9.6 years ago
shane.neeley ▴ 50

Hi, I have a question similar to this one:

http://www.biostars.org/post/show/50142/any-modules-available-to-parse-this-file/#50156

I adapted my code from JC's answer in that post. Thanks JC.

Here is an example of the data file I am opening and trying to read the columns of. The values are delimited by 4 spaces.

A bunch of junk up here. Paragraph before getting to table.

NO.  RES   DSC_SEC PROB_H    PROB_E    PROB_C
1     k      C     0.047     0.240     0.713
2     l      C     0.067     0.365     0.568
3     n      C     0.067     0.365     0.568
4     f      E     0.045     0.613     0.342
...


Here is the code I have tried, which doesn't print anything. I want to be able to gather the data from PROB_H, PROB_E, PROB_C and have them in separate lists so that I can do stuff like take the averages of them.

use strict;
use warnings;

open(FILE, "file_data.txt") or die "Cannot open file: $!";
my @data = <FILE>;
while (<FILE>) {
    next if m/^No./;
    chomp;
    my ($NO, $RES, $DSC_SEC, $PROB_H, $PROB_E, $PROB_C) = split(/\s+/, @data);
    print "$PROB_H";
}

close(FILE);

perl data extraction • 8.3k views

Why would I be downvoted?


Some people are harsh :) Someone probably thought this was a rather basic Perl programming question, as opposed to a bioinformatics research question.


Two obvious errors straight off: (1) you have not escaped the period in your regular expression (so it will match any character, not just a literal "."); (2) your data contains a line starting with NO (all upper-case) but your regular expression is looking for lines starting with No (lower-case "o").
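
A small sketch of those two points, using the header and first data row from the question. The sub names and the "Note ..." line are made up for illustration only.

```perl
use strict;
use warnings;

# Sample lines: header and first data row are copied from the question;
# the "Note ..." line is a made-up example of paragraph text.
my $header = "NO.  RES   DSC_SEC PROB_H    PROB_E    PROB_C";
my $row    = "1     k      C     0.047     0.240     0.713";
my $junk   = "Note that this is paragraph text.";

# Original test: lower-case "o" and an unescaped "." (any character).
sub matches_original {
    my ($line) = @_;
    return $line =~ /^No./ ? 1 : 0;
}

# Adjusted test: upper-case "NO" followed by a literal period.
sub matches_fixed {
    my ($line) = @_;
    return $line =~ /^NO\./ ? 1 : 0;
}

printf "original vs header: %d\n", matches_original($header);  # 0: case differs
printf "original vs junk:   %d\n", matches_original($junk);    # 1: "." matched "t"
printf "fixed vs header:    %d\n", matches_fixed($header);     # 1
printf "fixed vs data row:  %d\n", matches_fixed($row);        # 0
```

So the original pattern never skips the real header, and it would skip unrelated lines that happen to start with "No" plus any character.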


Basically, you want to implement the 'cut' unix command in Perl? Specifically, something like tail -n +2 | cut -c18-26?

9.6 years ago

It would probably be better to ask this question at stackoverflow.

Without a file it is kinda difficult to debug, but this may do the trick. You could also just use grep | awk ....

#!/usr/bin/perl
use strict;
use warnings;

open(my $FH, '<', "file_data.txt") or die "Cannot open file: $!";
LINE: while (my $line = <$FH>) {
    chomp $line;
    # Skip any line that does not start with a digit
    next LINE unless $line =~ /^[0-9]/;
    my ($NO, $RES, $DSC_SEC, $PROB_H, $PROB_E, $PROB_C) = split /\s+/, $line;
    print "$PROB_H\n";
}
close($FH);

Let me know if it works.

That gives me my column, thanks. I'm new to perl, what is the function of LINE: and $_ in this?

You can name your loops in perl. It can be useful to keep track of things. $_ is a special variable. In your script above it contained the line.

Also, what does this "unless it matches [0-9]" do?

^ means "starts with", and [0-9] is a numeric character, so only lines starting with a digit are kept.

It also ends up printing one of the words up in the paragraph. Can we make it start on the header of the table, and grab the numbers below the header, like I tried before?

And if I have a lot of columns?

9.6 years ago
Irsan ★ 7.5k

Or, as suggested, keep it simple:

[your prompt]$ grep '^[0-9]' yourfile.txt | awk '{print $4}'

to print out the fourth column of lines in yourfile.txt that start with a number.

9.6 years ago
Eric ▴ 40

There are a couple of problems with your script. It isn't necessary to read the file into an array, as you are going to iterate over the file line by line (and once the whole file has been read into @data, there is nothing left for the while loop to read, which is why nothing gets printed). Also, your test for throwing out non-matching lines is only going to match a line starting with "No." and not the other non-matching lines in the file.

use strict;
use warnings;

open(FILE, "file_data.txt") or die "Cannot open file: $!";

while (my $line = <FILE>) {
    # Unless the line begins with a number followed by
    # one or more whitespace characters, skip it.
    unless ($line =~ m/^\d+\s+/) { next; }
    chomp $line;
    my ($NO, $RES, $DSC_SEC, $PROB_H, $PROB_E, $PROB_C) = split(/\s+/, $line);
    print "$PROB_H\n";
}
close(FILE);

There is a line in the paragraph that begins with a number. Can I exclude it for containing certain words?

Such as making that unless clause:

unless ($line =~ m/^\d+\s+|residues/) {next;}

because that line that starts with a number has the word residues in it. This does not work for some reason.
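
A likely reason that last clause fails: the | makes the pattern read "starts with a number OR contains residues", so the unwanted line matches on both counts and is kept rather than skipped. Two separate tests (next unless it starts with a number, then next if it contains residues) would express what was intended. Another option, sketched below, is to ignore everything until the header line and then collect every column by its header name, which also covers the "lots of columns" and averaging cases. This is only a sketch under assumptions: the sub names parse_table and mean are made up, the header is assumed to start with "NO.", fields are assumed to be whitespace-separated, and the "120 residues" line is an invented stand-in for the real paragraph text.

```perl
use strict;
use warnings;

# Collect table columns by header name.
# Returns a hash of column name => array ref of values.
sub parse_table {
    my @lines = @_;
    my (@names, %cols);
    for my $line (@lines) {
        chomp $line;
        if (!@names) {
            # Ignore everything before the header (assumed to start with "NO.")
            next unless $line =~ /^NO\./;
            @names = split ' ', $line;
            next;
        }
        # Below the header, keep only rows that begin with a row number
        next unless $line =~ /^\d+\s+/;
        my @fields = split ' ', $line;
        next unless @fields == @names;   # skip rows with a different field count
        for my $i (0 .. $#names) {
            push @{ $cols{ $names[$i] } }, $fields[$i];
        }
    }
    return %cols;
}

# Arithmetic mean of a list; returns undef for an empty list.
sub mean {
    my @values = @_;
    return unless @values;
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;
}

# Rows copied from the question; the second line is invented junk text.
my @sample = (
    "A bunch of junk up here. Paragraph before getting to table.\n",
    "120 residues in this made-up paragraph line.\n",
    "\n",
    "NO.  RES   DSC_SEC PROB_H    PROB_E    PROB_C\n",
    "1     k      C     0.047     0.240     0.713\n",
    "2     l      C     0.067     0.365     0.568\n",
    "3     n      C     0.067     0.365     0.568\n",
    "4     f      E     0.045     0.613     0.342\n",
);

my %cols = parse_table(@sample);
for my $name (grep { /^PROB_/ } sort keys %cols) {
    printf "%s mean: %.4f\n", $name, mean(@{ $cols{$name} });
}
```

To run it on the real file, the lines could be read from a filehandle and passed to parse_table instead of @sample. Because the junk line comes before the header, it is never parsed, so no special case for "residues" is needed in this version.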