Perl, extract specific columns
8.0 years ago
elhamidihay ▴ 30

Please help.

I have two files (file1 and file2). I would like to extract the columns from file2 that have their IDs listed in file1. These are big files, with thousands of columns and lines.

file1

Id123B
Id124A
Id125A

file2

Code sex id123B id127 id125A

desired output file:

id123B id125A
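In the example, file1 lists Id123B and Id125A, while the header of file2 has id123B and id125A. A plain hash lookup treats these as different keys, so nothing would match. If the capitalization difference is real and not just a typo in the post (an assumption), one option is to index the header by `lc` and keep file2's spelling for the output. The `match_ids_ignoring_case` helper below is hypothetical, a minimal sketch rather than part of the original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the header names (spelled as in file2) whose
# lower-cased form matches a lower-cased ID, in the order the IDs are listed.
sub match_ids_ignoring_case {
    my ( $ids, $header ) = @_;
    my %by_lc;
    for my $name (@$header) {
        # keep the first spelling if two header names differ only in case
        $by_lc{ lc $name } //= $name;
    }
    return map { $by_lc{ lc $_ } } grep { exists $by_lc{ lc $_ } } @$ids;
}

my @ids    = qw(Id123B Id124A Id125A);           # file1 from the post
my @header = qw(Code sex id123B id127 id125A);    # file2 header from the post
print join( "\t", match_ids_ignoring_case( \@ids, \@header ) ), "\n";
```

The returned names can then be used as keys into the `%column_index` hash exactly as in the code below.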

Code I have tried:

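    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    # file1: one column ID per line
    open my $COLUMNS, '<', shift or die $!;
    chomp( my @columns = <$COLUMNS> );

    # file2: tab-separated table whose first line is the header
    open my $DATA, '<', shift or die $!;
    chomp( my $header_line = <$DATA> );    # without chomp, the last column name keeps its "\n"
    my @header = split /\t/, $header_line;
    my %column_index;
    @column_index{ @header } = 0 .. $#header;

    # keep only the IDs that actually appear in the header
    @columns = grep exists $column_index{$_}, @columns;

    while (<$DATA>) {
        chomp;
        my @cells = split /\t/;
        say join "\t", @cells[ @column_index{ @columns } ];
    }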

I think this kind of task is easier to do in R than in Perl. If that is OK with you, try the following R code:

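    # the ID file has no header line, so read it with header=FALSE and take column V1
    ids=read.table("id_file.txt", header=FALSE, stringsAsFactors=FALSE)$V1
    column.file=read.table("columns.txt", header=TRUE, sep="\t", check.names=FALSE)
    # drop IDs that are not column names; otherwise [ ] stops with "undefined columns selected"
    keep=intersect(ids, names(column.file))
    write.table(column.file[keep], file="result.txt", sep="\t", quote=FALSE, row.names=FALSE)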

Here, result.txt contains the columns of columns.txt whose IDs are listed in id_file.txt.


Perl's PDL and Python's pandas should also handle this kind of task well.


Thank you for the suggestion and for your help. It's just that R is much slower when dealing with large datasets.

8.0 years ago
novice ★ 1.1k

Your code should work, but it does not write the header row to the output. Just add `say join "\t", @columns;` before the `while` loop.
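Putting that suggestion together, a complete script might look like the sketch below. It is wrapped in a hypothetical `select_columns` sub that takes file handles, so it can also be tried on in-memory data. It chomps the header line, so that the last column name (id125A in the example) can match an ID, and it prints an empty string for rows that are shorter than the header instead of warning about undefined values.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };

# Hypothetical helper: copy the columns of a tab-separated table whose header
# names are listed (one per line) in $ids_fh, writing header + rows to $out_fh.
sub select_columns {
    my ( $ids_fh, $data_fh, $out_fh ) = @_;

    chomp( my @columns = <$ids_fh> );

    chomp( my $header_line = <$data_fh> );
    my @header = split /\t/, $header_line;
    my %column_index;
    @column_index{@header} = 0 .. $#header;

    # Drop IDs that are not in the header (e.g. blank lines in file1).
    @columns = grep { exists $column_index{$_} } @columns;
    my @idx = @column_index{@columns};

    say {$out_fh} join "\t", @columns;    # the header row
    while ( my $line = <$data_fh> ) {
        chomp $line;
        my @cells = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        say {$out_fh} join "\t", map { $_ // '' } @cells[@idx];
    }
}

# Usage: perl select_columns.pl file1 file2 > output.txt
if ( @ARGV == 2 ) {
    open my $ids_fh,  '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $data_fh, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    select_columns( $ids_fh, $data_fh, \*STDOUT );
}
```

The column indices are computed once before the loop, so each data line costs only one `split` and one slice, which matters for files with thousands of lines and columns.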

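BTW, a bare `split` is not quite the same as `split /\t/`: it means `split ' ', $_`, which treats any run of whitespace as a single separator and ignores leading whitespace. The two agree only when no field contains a space and no field is empty.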
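To see where the two forms differ, here is a small sketch that splits a few made-up rows both ways (the rows and the `compare_splits` helper are illustrative, not from the thread): one plain row, one with an empty field, and one whose field contains a space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one line both ways so the results can be compared.
sub compare_splits {
    my ($line) = @_;
    local $_ = $line;
    my @on_tab = split /\t/;    # one separator per tab, so interior empty fields are kept
    my @bare   = split;         # same as split ' ', $_: any whitespace run is one separator
    return ( \@on_tab, \@bare );
}

# Made-up rows: plain, with an empty field, with a space inside a field.
for my $line ( "A1\tM\t12", "A2\tF\t\t7", "A3\tnot known\t5" ) {
    my ( $tab, $bare ) = compare_splits($line);
    printf "%s: split /\\t/ gives %d fields, bare split gives %d\n",
        $tab->[0], scalar @$tab, scalar @$bare;
}
```

For these rows, A2 gives 4 fields versus 3 and A3 gives 3 versus 4, so with the bare form the column positions no longer line up with the header.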


@novice, thank you very much, it works :)


Happy to help. You can click the tick to accept it.
