Question: Perl loop woes
0
gravatar for jt500
12 months ago by
jt5000
jt5000 wrote:

Hi, first time posting here. I have 126 protein alignments based on ortho groups from OrthoMCL, from which I want to build a multi gene phylogeny. The majority of them are single copy per species, but there are many that have doubles. For concatenation purposes, I need 1 copy per species. I came across this script that enabled me to get rid of duplicates (based on the fasta header, not sequence identity) while appending the additional information (accession numbers) from the removed sequences: (emboldened sections are where you enter your file)

perl -ne 'if (/>(.*?)\s+(.*)/){push(@{$hash{$1}},$2) ;}}{open(I, "<","file.fa");
while(<I>){if(/>(.*?)\s+/){ $t = 0; next if $h{$1}; 
$h{$1} = 1 if $hash{$1}; 
$t = 1; 
chomp; 
print $_ . " @{$hash{$1}}\n"}elsif($t==1){print $_} } close I;
' file.fa

This works perfectly fine when applied to one file, however when I try to apply it to all 126 files with a loop:

for file in ./*.fasta; 
do perl -ne 'if (/>(.*?)\s+(.*)/){push(@{$hash{$1}},$2) ;}}{open(I, "<","$file");
while(<I>){if(/>(.*?)\s+/){ $t = 0; next if $h{$1}; 
$h{$1} = 1 if $hash{$1}; 
$t = 1; 
chomp; 
print $_ . " @{$hash{$1}}\n"}elsif($t==1){print $_} } close I;
' "$file" > single/"$file".1; 
done

It gives the me the expected output files (files with same names with ".1" appended), but the files are empty. I'm new to scripting and know even less about perl, so I'm not particularly sure where it's going wrong. Especially when this loop works on other scripts/commands I've used on large amounts of files. Any help is appreciated. JT

ADD COMMENTlink modified 12 months ago by h.mon26k • written 12 months ago by jt5000
5

Start by writing a perl script. One liners are hard to debug, even harder when you mix languages.

ADD REPLYlink written 12 months ago by Jean-Karim Heriche19k
1

I like Perl, but here I have to agree with Perl detractors: these "one-liners" look like a keyboard puked on the screen.

ADD REPLYlink written 12 months ago by h.mon26k

(For a few dollar signs more)

ADD REPLYlink written 12 months ago by jrj.healey12k

The lyrics are a one-liner perl script.

ADD REPLYlink written 12 months ago by h.mon26k

jt500 : For future reference. Please use the formatting bar (especially the code option) to present your post better. I had done it for you but you seem to have re-edited the post back to where it was before.
code_formatting

Thank you!

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax68k
2
gravatar for h.mon
12 months ago by
h.mon26k
Brazil
h.mon26k wrote:

The One problem with your loop is $file is an empty / undefined variable (there may be others, I didn't try running or reading past this first error). Use something like $file = shift; at the beginning of the one liner - see other solutions at perl line-mode oneliner with ARGV. Beware I think these solutions will fail if your file have spaces in their names.

But it would be better if you write a proper Perl script and use:

use warnings;
use strict;

These would have caught the error.

edit: you may be interested in PhyloTreePruner, it may be a better method of removing duplicated paralogs.

ADD COMMENTlink modified 12 months ago • written 12 months ago by h.mon26k
1
gravatar for cpad0112
12 months ago by
cpad011211k
India
cpad011211k wrote:

Try this and let me know if it works. I removed the output folder name. Please add if necessary:

$ for file in *.fa;  do  export file; perl -ne 'if (/>(.*?)\s+(.*)/){push(@{$hash{$1}},$2) ;}}{open(I, "<", $ENV{"file"}); while(<I>){if(/>(.*?)\s+/){ $t = 0; next if $h{$1};  $h{$1} = 1 if $hash{$1};  $t = 1;  chomp; print $_ . " @{$hash{$1}}\n"}elsif($t==1){print $_} } close I;'  $file > $file.1; done
ADD COMMENTlink modified 12 months ago • written 12 months ago by cpad011211k

This has indeed done the trick, thank you very much! So the change was making "file" the environmental variable?

ADD REPLYlink written 12 months ago by jt5000

Whatever perl was doing inside bash loop, can be done other ways easy., if you could post expected input and output.

ADD REPLYlink written 12 months ago by cpad011211k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1497 users visited in the last hour