Question: Perl, extract specific columns
4.1 years ago, elhamidihay wrote:

Please help.

I have two files (file1 and file2). I would like to extract the columns from file2 that have their IDs listed in file1. These are big files, with thousands of columns and lines.

file1

id123B
id124A
id125A

file2

Code sex id123B id127 id125A

desired output file:

id123B id125A

Code I have tried:

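    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    # First argument: file with the wanted column IDs, one per line.
    open my $COLUMNS, '<', shift or die $!;
    chomp( my @columns = <$COLUMNS> );

    # Second argument: tab-separated data file with a header line.
    open my $DATA, '<', shift or die $!;
    chomp( my @header = split /\t/, <$DATA> );   # chomp so the last column name has no newline
    my %column_index;
    @column_index{ @header } = 0 .. $#header;

    # Keep only the IDs that occur in the header.
    @columns = grep exists $column_index{$_}, @columns;

    while (<$DATA>) {
        chomp( my @cells = split /\t/ );
        say join "\t", @cells[ @column_index{ @columns } ];
    }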

I think this kind of task is easier to do in R than in Perl. If that is OK with you, use the following R code:

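    # Read the wanted IDs, one per line (the ID file has no header row).
    ids <- readLines("id_file.txt")
    # check.names=FALSE keeps the column names exactly as written in the file.
    column.file <- read.table("columns.txt", header=TRUE, sep="\t", check.names=FALSE)
    # Keep only the IDs that are actual column names, then write those columns.
    ids <- intersect(ids, names(column.file))
    write.table(column.file[, ids, drop=FALSE], file="result.txt", sep="\t", quote=FALSE, row.names=FALSE)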

Here, result.txt contains the columns whose IDs are listed in id_file.txt.

written 4.1 years ago by venu

Perl's PDL and Python's pandas should also handle this kind of task.

written 4.1 years ago by Echo

Thank you for the suggestion and for your help. It's just that R is much slower when dealing with large datasets.

written 4.1 years ago by elhamidihay
4.1 years ago, novice (United States) wrote:

Your code should work fine, but you are not including the headers in the output. Just add say join "\t", @columns; before the while-loop.
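
Putting that together, here is a minimal sketch of the whole script with the header line printed first. The `extract_columns` helper and its filehandle arguments are my own naming, not something from the original post. It assumes tab-separated input and that the IDs in file1 are spelled exactly as in the header of file2, since the hash lookup is case-sensitive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy the wanted columns from $data_fh to $out_fh,
# reading the data one line at a time so large files are not held in memory.
sub extract_columns {
    my ($ids_fh, $data_fh, $out_fh) = @_;

    # Wanted column IDs, one per line.
    chomp( my @columns = <$ids_fh> );

    # Map each header name to its column position.
    my $header_line = <$data_fh>;
    return unless defined $header_line;
    chomp $header_line;
    my @header = split /\t/, $header_line;
    my %column_index;
    @column_index{@header} = 0 .. $#header;

    # Keep only the IDs that occur in the header, in file1 order.
    @columns = grep { exists $column_index{$_} } @columns;
    my @idx = @column_index{@columns};

    # Header line first, then the selected cells of every data line.
    print {$out_fh} join("\t", @columns), "\n";
    while (my $line = <$data_fh>) {
        chomp $line;
        my @cells = split /\t/, $line, -1;    # -1 keeps trailing empty cells
        print {$out_fh} join("\t", map { $_ // '' } @cells[@idx]), "\n";
    }
    return;
}

# Run as a script only when both file names are given.
if (@ARGV == 2) {
    open my $ids_fh,  '<', $ARGV[0] or die "$ARGV[0]: $!";
    open my $data_fh, '<', $ARGV[1] or die "$ARGV[1]: $!";
    extract_columns($ids_fh, $data_fh, \*STDOUT);
}
```

Saved under a name of your choice (say extract.pl), it would be run as: perl extract.pl file1 file2 > output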

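BTW, a bare split (no arguments) splits $_ on runs of whitespace, so it gives the same result as split /\t/ only when no cell is empty or contains a space. :)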
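
If you want to check how the two forms of split behave on your own data, here is a small sketch. The sample line is made up for illustration: it has one empty cell and one cell containing a space.

```perl
use strict;
use warnings;

# Made-up sample line: an empty second cell and a space inside the third.
my $line = "id1\t\tJohn Smith\n";

# Explicit tab split: the empty cell and the embedded space survive.
chomp( my $copy = $line );
my @by_tab = split /\t/, $copy;

# Bare split: works on $_ and splits on runs of whitespace,
# so the empty cell disappears and "John Smith" is cut in two.
$_ = $line;
my @bare = split;

print join('|', @by_tab), "\n";    # prints: id1||John Smith
print join('|', @bare),   "\n";    # prints: id1|John|Smith
```

On files where every cell is non-empty and free of spaces, both forms return the same fields.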


@novice, thank you very much, it works :)

written 4.1 years ago by elhamidihay

Happy to help. You can click the tick to accept it.

written 4.1 years ago by novice