Perl loop woes
2
0
Entering edit mode
5.8 years ago
jt500 • 0

Hi, first time posting here. I have 126 protein alignments based on ortho groups from OrthoMCL, from which I want to build a multi gene phylogeny. The majority of them are single copy per species, but there are many that have doubles. For concatenation purposes, I need 1 copy per species. I came across this script that enabled me to get rid of duplicates (based on the fasta header, not sequence identity) while appending the additional information (accession numbers) from the removed sequences: (emboldened sections are where you enter your file)

perl -ne 'if (/>(.*?)\s+(.*)/){push(@{$hash{$1}},$2) ;}}{open(I, "<","file.fa");
while(<I>){if(/>(.*?)\s+/){ $t = 0; next if $h{$1}; 
$h{$1} = 1 if $hash{$1}; 
$t = 1; 
chomp; 
print $_ . " @{$hash{$1}}\n"}elsif($t==1){print $_} } close I;
' file.fa

This works perfectly fine when applied to one file, however when I try to apply it to all 126 files with a loop:

for file in ./*.fasta; 
do perl -ne 'if (/>(.*?)\s+(.*)/){push(@{$hash{$1}},$2) ;}}{open(I, "<","$file");
while(<I>){if(/>(.*?)\s+/){ $t = 0; next if $h{$1}; 
$h{$1} = 1 if $hash{$1}; 
$t = 1; 
chomp; 
print $_ . " @{$hash{$1}}\n"}elsif($t==1){print $_} } close I;
' "$file" > single/"$file".1; 
done

It gives the me the expected output files (files with same names with ".1" appended), but the files are empty. I'm new to scripting and know even less about perl, so I'm not particularly sure where it's going wrong. Especially when this loop works on other scripts/commands I've used on large amounts of files. Any help is appreciated. JT

script perl for loop ortho groups phylogentics • 3.1k views
ADD COMMENT
5
Entering edit mode

Start by writing a perl script. One liners are hard to debug, even harder when you mix languages.

ADD REPLY
1
Entering edit mode

I like Perl, but here I have to agree with Perl detractors: these "one-liners" look like a keyboard puked on the screen.

ADD REPLY
0
Entering edit mode

(For a few dollar signs more)

ADD REPLY
0
Entering edit mode

The lyrics are a one-liner perl script.

ADD REPLY
0
Entering edit mode

jt500 : For future reference. Please use the formatting bar (especially the code option) to present your post better. I had done it for you but you seem to have re-edited the post back to where it was before.
code_formatting

Thank you!

ADD REPLY
2
Entering edit mode
5.8 years ago
h.mon 35k

The One problem with your loop is $file is an empty / undefined variable (there may be others, I didn't try running or reading past this first error). Use something like $file = shift; at the beginning of the one liner - see other solutions at perl line-mode oneliner with ARGV. Beware I think these solutions will fail if your file have spaces in their names.

But it would be better if you write a proper Perl script and use:

use warnings;
use strict;

These would have caught the error.

edit: you may be interested in PhyloTreePruner, it may be a better method of removing duplicated paralogs.

ADD COMMENT
1
Entering edit mode
5.8 years ago

Try this and let me know if it works. I removed the output folder name. Please add if necessary:

$ for file in *.fa;  do  export file; perl -ne 'if (/>(.*?)\s+(.*)/){push(@{$hash{$1}},$2) ;}}{open(I, "<", $ENV{"file"}); while(<I>){if(/>(.*?)\s+/){ $t = 0; next if $h{$1};  $h{$1} = 1 if $hash{$1};  $t = 1;  chomp; print $_ . " @{$hash{$1}}\n"}elsif($t==1){print $_} } close I;'  $file > $file.1; done
ADD COMMENT
0
Entering edit mode

This has indeed done the trick, thank you very much! So the change was making "file" the environmental variable?

ADD REPLY
0
Entering edit mode

Whatever perl was doing inside bash loop, can be done other ways easy., if you could post expected input and output.

ADD REPLY

Login before adding your answer.

Traffic: 2362 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6