Question: Perl Command: Add A Column With Strand Information
biolab wrote, 7.1 years ago:

Dear all,

I have a BLAST output file and need to add a column with strand information (+ or -). For example:

gene1 contig2 1 69 100 169
gene2 contig20 3 53 250 200

I need to change it to:

gene1 contig2 1 69 100 169 +
gene2 contig20 3 53 250 200 -

Note: the strand is + when column 5 is smaller than column 6 (100 < 169) and - when it is larger (250 > 200).

I am new to Perl programming. My command is:

$ cat a.txt | perl -e 'while (<>){chomp; @array = split(//, $_); if ($array[4]< $array[5]){print"@array\t+\n"} else {print"@array\t-\n"} }'

The output is:

g e n e -   c o n t i g 2   1   6 9   1 0 0   1 6 9
g e n e -   c o n t i g 2 0   3   5 3   2 5 0   2 0 0

Could anyone help me correct the errors and briefly explain them? Thank you very much!

modified 7.1 years ago by SES • written 7.1 years ago by biolab

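Try: replace @array = split(//, $_); with @array = split(/\s/, $_); or simply @array = split;

The empty pattern // matches between every pair of characters, so each letter ends up in its own array element. A bare split instead splits $_ on runs of whitespace.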

modified 7.1 years ago • written 7.1 years ago by Michael Dondrup

Thank you very much for the correction.

written 7.1 years ago by biolab
SES (Vancouver, BC) wrote, 7.1 years ago:

Here's a simpler Perl solution (similar to the Awk solution of Frédéric Mahé):

$ echo -e 'gene1 contig2 1 69 100 169\ngene2 contig20 3 53 250 200' \
| perl -ane 'print join "\t", @F, $F[4] > $F[5] ? "-\n" : "+\n"'
gene1    contig2     1    69    100    169    +
gene2    contig20    3    53    250    200    -

You could perhaps make it more readable by adding explicit loops and variables, but for one-liners I think it's best to use the tools you have and save yourself some typing.

EDIT: Perl's command-line switches are documented in perlrun (type perldoc perlrun at the command line).

  • The -e switch passes the program itself on the command line (the quoted string that follows it). Any remaining arguments are treated as input files; with none, as above, input comes from STDIN.
  • The -n switch makes Perl loop over the input line by line, putting each line in $_ (-p does the same, but also prints $_ after each iteration).
  • The -a tells Perl to autosplit the input and put it into an array called "@F" when used with -n or -p. You can change the delimiter with the -F switch.
modified 7.1 years ago • written 7.1 years ago by SES

Thank you! The shorter perl command is really cool.

written 7.1 years ago by biolab

Hi SES, can I ask you one more question? What does the -ane option stand for? Would you please briefly explain these switches? I googled "perl -ane" but did not find an answer. Thank you very much!

modified 7.1 years ago • written 7.1 years ago by biolab

I updated my post to add an explanation of the command.

written 7.1 years ago by SES

Your explanations are really informative and help me learn perl. Thanks!

written 7.1 years ago by biolab
PoGibas (Vilnius) wrote, 7.1 years ago:

A simple awk solution:

awk '{if ($5>$6) print $0,"-"; else print $0,"+"}' INPUT

echo -e 'gene1 contig2 1 69 100 169\ngene2 contig20 3 53 250 200' |  awk '{if ($5>$6) print $0,"-"; else print $0,"+"}'
gene1 contig2 1 69 100 169 +
gene2 contig20 3 53 250 200 -
modified 7.1 years ago by Giovanni M Dall'Olio • written 7.1 years ago by PoGibas

Hi Pgibas, in awk the if-then-else conditional can optionally be replaced with the ternary operator (shorter, but maybe less clear):

echo -e 'gene1 contig2 1 69 100 169\ngene2 contig20 3 53 250 200' | awk '{print $0,($5 > $6) ? "-" : "+"}'
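If every column should be tab-separated (as in SES's Perl output) rather than only the appended strand being separated by a space, one possible awk variant sets the output field separator OFS and assigns $1 = $1 so that awk rebuilds $0 with it. This is a sketch; the function name add_strand_awk is hypothetical.

```shell
# Sketch: same rule as the ternary one-liner, but with tab-separated output.
# OFS is awk's output field separator; "$1 = $1" forces awk to rebuild $0
# using OFS. add_strand_awk is a hypothetical function name.
add_strand_awk() {
  awk 'BEGIN { OFS = "\t" } { $1 = $1; print $0, ($5 > $6 ? "-" : "+") }' "$@"
}

printf 'gene1 contig2 1 69 100 169\ngene2 contig20 3 53 250 200\n' | add_strand_awk
```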

modified 7.1 years ago • written 7.1 years ago by Frédéric Mahé

Thank you, this is really cool.

written 7.1 years ago by PoGibas

Thank you very much for your solutions!

written 7.1 years ago by biolab
Powered by Biostar version 2.3.0