Question: How to create a tab delimited file?
Mimmi Ahlmén (20) wrote 17 months ago:

Hi! I'm doing something wrong here. I have a long text file that looks like this:

AB Ana Biba 1029293.34341

And I want to print out the following to a new tab delimited file:

AB         Ana Biba        1029293.34341

Here's my script. Why doesn't it work?

    my $infile = $ARGV[0];

    open (my $infile, "<", "namconvmars.txt")
    or die "Can't read from $infile: $!";

    my (@group1, @group2, @group3);

    while (<$infile>){
        my @cols = split(/\t/);
        push @group1, @cols[0];
        push @group2, @cols[1];
        push @group3, @cols[2];
        print "@group1\t@group2\t@group3";
    }
    close $infile

Thanks in advance!

perl • 623 views
modified 17 months ago by 5heikki (8.9k) • written 17 months ago by Mimmi Ahlmén (20)

How is this related to bioinformatics?

written 17 months ago by Pierre Lindenbaum (129k)

manuel.belmadani (1.2k), Canada, wrote 17 months ago:

You want to split your input on space, not tab.

e.g.

       # my @cols = split(/\t/); # Change this
       my @cols = split(' ');  # To this.
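
To see why the original split did not separate anything, here is a minimal sketch using the sample line from the question. The helper names fields_by_tab and fields_by_space are hypothetical and exist only for this illustration. With /\t/ the whole line comes back as one field because it contains no tabs; with ' ' Perl splits on runs of whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line from the question: fields are separated by spaces, not tabs.
my $line = "AB Ana Biba 1029293.34341\n";

# Hypothetical helpers: split one line on tabs, or on any whitespace.
sub fields_by_tab   { my ($l) = @_; chomp $l; return split(/\t/, $l); }
sub fields_by_space { my ($l) = @_; chomp $l; return split(' ', $l); }

my @by_tab   = fields_by_tab($line);
my @by_space = fields_by_space($line);

print scalar(@by_tab),   " field(s) when splitting on tab\n";
print scalar(@by_space), " field(s) when splitting on whitespace\n";
```

Note that splitting on whitespace also separates "Ana" from "Biba", so the name needs to be rejoined if it should stay in one column.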
written 17 months ago by manuel.belmadani (1.2k)

Pierre Lindenbaum (129k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote 17 months ago:

BTW, you want:

tr " " "\t" < namconvmars.txt
written 17 months ago by Pierre Lindenbaum (129k)

I haven't seen this command before. Awesome!

written 17 months ago by Robert Sicko (610)

Bill Pearson (860) wrote 17 months ago:

You do not need three "@group"s -- you either need three scalars ($field0, $field1, $field2) or one @group, which you could print with join("\t",@group);

A simpler solution is to:

while (my $line = <$infile>) {   # $infile is the filehandle opened in your script
  chomp($line);
  print join("\t", split(/\s+/, $line)), "\n";
}

or

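$line =~ s/\s+/\t/g;   # /g replaces every run of whitespace, not just the first
print "$line\n";       # assumes $line was chomped, as in the loop above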
modified 17 months ago by genomax (87k) • written 17 months ago by Bill Pearson (860)

JC (11k), Mexico, wrote 17 months ago:

There are a few Perl details you need to understand first:

my $infile = $ARGV[0];

This line takes the first command-line argument after your script name and assigns it to the variable $infile.

open (my $infile, "<", "namconvmars.txt")
or die "Can't read from $infile: $!";

Here you declare $infile a second time (that is what my does) and reuse the variable as a filehandle. So you don't need the first line, my $infile = $ARGV[0], because you never use its value.

my (@group1, @group2, @group3);

while (<$infile>){
    my @cols = split(/\t/);
    push @group1, @cols[0];
    push @group2, @cols[1];
    push @group3, @cols[2];
    print "@group1\t@group2\t@group3";
}
close $infile

In this part I think you want to collect the values, but if your intention is simply to convert each line, you don't need the arrays: just read, modify and print each line. The tricky part is that when you split the line on spaces, the second element gets split too ("Ana Biba" -> ["Ana", "Biba"]), so you need to reconstruct that element. Something like:

#!/usr/bin/perl
use strict;
use warnings;
my $file = "namconvmars.txt";
open (my $infile, "<", $file)
or die "Can't read from $file: $!";

while (<$infile>){
    my @cols = split(/\s+/, $_);  # break line on whitespace (this also drops the trailing newline)
    my $first = shift(@cols);  # grab first element
    my $last  = pop(@cols); # grab last element
    my $mid   = join " ", @cols; # reconstruct middle element
    print join("\t", $first, $mid, $last), "\n";  # end each output row with a newline
}
close $infile;
modified 17 months ago • written 17 months ago by JC (11k)

Thank you so much!

Actually, my file has several rows that look the same:

XX Xxxxx_Xxxx YyyYy
XY Xyxyx_Xyxyx YxYx

So I need to go to the next row after each row. How do I do this?

modified 17 months ago • written 17 months ago by Mimmi Ahlmén (20)

It's complaining about or die "Can't read from $infile: $!";. That makes sense: a variable declared with my is not visible until the statement that declares it has finished, so you can't use $infile in the error message of the same open statement (and even if you could, it would hold a filehandle rather than the file name, which is probably not what you wanted). You weren't seeing this error originally because you had declared $infile before the open statement, so the message referred to that earlier variable.

You probably want to do something like:

my $filename = "namconvmars.txt"; # Or set it via $ARGV
open (my $infile, "<", $filename) or die "Can't read from '$filename' !";

So if for some reason $filename is not readable, you'll see: Can't read from 'namconvmars.txt' ! at tabs.pl line 6.
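
A minimal sketch of that idea, assuming you want to pass the file name on the command line and fall back to namconvmars.txt otherwise. The helper name open_input is hypothetical. The file name lives in its own variable, so it is still available for the error message, and $! adds the system's reason for the failure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or die with a useful message.
sub open_input {
    my ($filename) = @_;
    open(my $fh, "<", $filename)
        or die "Can't read from '$filename': $!\n";
    return $fh;
}

# Take the file name from the command line, or use the default from the question.
my $filename = @ARGV ? $ARGV[0] : "namconvmars.txt";

if (-r $filename) {
    my $infile = open_input($filename);
    print "Opened '$filename' for reading\n";
    close $infile;
} else {
    # The check only keeps this example runnable when the file is missing.
    print "'$filename' is not readable; open_input would die here\n";
}
```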

modified 17 months ago • written 17 months ago by manuel.belmadani (1.2k)

True, I have modified the code to read the file name from another variable.

written 17 months ago by JC (11k)

The while (<$infile>) {} loop reads the file line by line, so every row is processed in turn.
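
As a sketch of what that means for a file with many rows, the example below reads from an in-memory filehandle (an assumption, made so the snippet runs without namconvmars.txt) and prints one tab-delimited row per input line. The helper name to_tab_row is hypothetical; it follows the first/middle/last approach from the answer and the newline printed after each row is what moves the output to the next line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one space-separated line into a tab-delimited row.
# The first and last fields stay separate; everything in between is rejoined.
sub to_tab_row {
    my ($line) = @_;
    chomp $line;
    my @cols = split(' ', $line);
    return "" unless @cols;                 # blank line: nothing to convert
    my $first = shift @cols;
    my $last  = @cols ? pop @cols : undef;  # undef when the line has one field
    my @out   = ($first);
    push @out, join(" ", @cols) if @cols;   # middle fields, if any
    push @out, $last if defined $last;
    return join("\t", @out);
}

# In-memory stand-in for the input file (assumption, for a runnable example).
my $data = "AB Ana Biba 1029293.34341\nXX Xxxxx_Xxxx YyyYy\nXY Xyxyx_Xyxyx YxYx\n";
open(my $infile, "<", \$data) or die "Can't open in-memory data: $!";

# Each pass through the loop handles exactly one row of the input.
while (my $line = <$infile>) {
    print to_tab_row($line), "\n";
}
close $infile;
```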

written 17 months ago by JC (11k)

5heikki (8.9k), Finland, wrote 17 months ago:
awk 'BEGIN{FS=" ";OFS="\t"}{print $1,$2" "$3,$4}' in > out

Edit: a more general solution, where the first and last spaces are replaced with tabs:

awk 'BEGIN{FS=" "}{L=$NF; NF--; sub(" ","\t",$0); print $0"\t"L}' in > out
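
For comparison with the Perl answers in this thread, the same idea (replace only the first and the last space with a tab) can be sketched with two substitutions. This is an illustrative equivalent, not a translation of the awk command, and it assumes fields are separated by single spaces as in the sample line. The sub name first_last_to_tabs is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace the first and the last space of a line with tabs.
sub first_last_to_tabs {
    my ($line) = @_;
    chomp $line;
    $line =~ s/ /\t/;            # first space only (no /g)
    $line =~ s/ (?=[^ ]*$)/\t/;  # last remaining space: no other space follows it
    return $line;
}

print first_last_to_tabs("AB Ana Biba 1029293.34341\n"), "\n";
```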
modified 17 months ago • written 17 months ago by 5heikki (8.9k)

cpad0112 (13k), India, wrote 17 months ago:

With sed:

$ sed 's/\s\+/\t/g' test.txt          
AB  Ana Biba    1029293.34341
written 17 months ago by cpad0112 (13k)