Question: Perl loop woes
0
gravatar for jt500
6 months ago by
jt5000
jt5000 wrote:

Hi, first time posting here. I have 126 protein alignments based on ortho groups from OrthoMCL, from which I want to build a multi gene phylogeny. The majority of them are single copy per species, but there are many that have doubles. For concatenation purposes, I need 1 copy per species. I came across this script that enabled me to get rid of duplicates (based on the fasta header, not sequence identity) while appending the additional information (accession numbers) from the removed sequences: (emboldened sections are where you enter your file)

perl -ne 'if (/>(.*?)\s+(.*)/){push(@{$hash{$1}},$2) ;}}{open(I, "<","file.fa");
while(<I>){if(/>(.*?)\s+/){ $t = 0; next if $h{$1}; 
$h{$1} = 1 if $hash{$1}; 
$t = 1; 
chomp; 
print $_ . " @{$hash{$1}}\n"}elsif($t==1){print $_} } close I;
' file.fa

This works perfectly fine when applied to one file, however when I try to apply it to all 126 files with a loop:

for file in ./*.fasta; 
do perl -ne 'if (/>(.*?)\s+(.*)/){push(@{$hash{$1}},$2) ;}}{open(I, "<","$file");
while(<I>){if(/>(.*?)\s+/){ $t = 0; next if $h{$1}; 
$h{$1} = 1 if $hash{$1}; 
$t = 1; 
chomp; 
print $_ . " @{$hash{$1}}\n"}elsif($t==1){print $_} } close I;
' "$file" > single/"$file".1; 
done

It gives the me the expected output files (files with same names with ".1" appended), but the files are empty. I'm new to scripting and know even less about perl, so I'm not particularly sure where it's going wrong. Especially when this loop works on other scripts/commands I've used on large amounts of files. Any help is appreciated. JT

ADD COMMENTlink modified 6 months ago by h.mon22k • written 6 months ago by jt5000
5

Start by writing a perl script. One liners are hard to debug, even harder when you mix languages.

ADD REPLYlink written 6 months ago by Jean-Karim Heriche17k
1

I like Perl, but here I have to agree with Perl detractors: these "one-liners" look like a keyboard puked on the screen.

ADD REPLYlink written 6 months ago by h.mon22k

(For a few dollar signs more)

ADD REPLYlink written 6 months ago by jrj.healey9.2k

The lyrics are a one-liner perl script.

ADD REPLYlink written 6 months ago by h.mon22k

jt500 : For future reference. Please use the formatting bar (especially the code option) to present your post better. I had done it for you but you seem to have re-edited the post back to where it was before.
code_formatting

Thank you!

ADD REPLYlink modified 6 months ago • written 6 months ago by genomax59k
2
gravatar for h.mon
6 months ago by
h.mon22k
Brazil
h.mon22k wrote:

The One problem with your loop is $file is an empty / undefined variable (there may be others, I didn't try running or reading past this first error). Use something like $file = shift; at the beginning of the one liner - see other solutions at perl line-mode oneliner with ARGV. Beware I think these solutions will fail if your file have spaces in their names.

But it would be better if you write a proper Perl script and use:

use warnings;
use strict;

These would have caught the error.

edit: you may be interested in PhyloTreePruner, it may be a better method of removing duplicated paralogs.

ADD COMMENTlink modified 6 months ago • written 6 months ago by h.mon22k
1
gravatar for cpad0112
6 months ago by
cpad011210k
India
cpad011210k wrote:

Try this and let me know if it works. I removed the output folder name. Please add if necessary:

$ for file in *.fa;  do  export file; perl -ne 'if (/>(.*?)\s+(.*)/){push(@{$hash{$1}},$2) ;}}{open(I, "<", $ENV{"file"}); while(<I>){if(/>(.*?)\s+/){ $t = 0; next if $h{$1};  $h{$1} = 1 if $hash{$1};  $t = 1;  chomp; print $_ . " @{$hash{$1}}\n"}elsif($t==1){print $_} } close I;'  $file > $file.1; done
ADD COMMENTlink modified 6 months ago • written 6 months ago by cpad011210k

This has indeed done the trick, thank you very much! So the change was making "file" the environmental variable?

ADD REPLYlink written 6 months ago by jt5000

Whatever perl was doing inside bash loop, can be done other ways easy., if you could post expected input and output.

ADD REPLYlink written 6 months ago by cpad011210k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1078 users visited in the last hour