Trouble Replacing Sequence IDs in a Tree File with Taxonomy Strings from a Corresponding Tab-Delimited Taxonomy File
5.1 years ago

I'm relatively new to bioinformatics and programming and need some help running a script that I'm having problems with.

I'm trying to replace sequence IDs in a tree file (Newick format) with organism names. The sequence IDs and their corresponding organism data are stored in a separate tab-delimited file. I found a solution to this problem in a previous post, "How To Replace Sequence Id'S In A Text (Tree) File With Taxonomy Strings From A Corresponding Tab Delimited Taxonomy File", which uses the following script:

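use strict;
use warnings;

# The last argument is the tree file; take it off @ARGV so <> reads only the taxonomy file.
my $treeFile = pop;
# Build an ID => taxonomy lookup from the tab-delimited file.
my %taxonomy = map { /(\S+)\s+(.+)/; $1 => $2 } <>;

# Put the tree file back on @ARGV so the next <> reads it.
push @ARGV, $treeFile;

while ( my $line = <> ) {
    # \Q...\E keeps the "." in each ID from matching any character.
    $line =~ s/\b\Q$_\E\b/$taxonomy{$_}/g for keys %taxonomy;
    print $line;
}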

The script won't run all the way through, however. After adding print "StepX\n"; statements throughout the script, I found that it gets stuck at the my %taxonomy = map { /(\S+)\s+(.+)/; $1 => $2 } <>; line.

I've gone back and reformatted the tab file to mirror the format of the tab file in the post, but the script still gets stuck at the same spot. The tree file and taxonomy file are stored in the directory that Perl runs from. Is there something simple I have missed that causes the script to fail, perhaps in specifying the directory or file name? The script seems to have worked for others, but I'm at a loss as to why it isn't working for me.

Thanks in advance.

For reference, my TSV and tree files look like this:

WP010933552.1   Chlorobaculum;__tepidum
WP011361294.1   Chlorobium;__chlorochromatii
WP006366269.1   Chlorobium;__ferrooxidans
WP012466994.1   Chlorobium;__limicola
WP011745973.1   Chlorobium;__phaeobacteroides
WP012498899.1   Chloroherpeton;__thalassium
WP012509156.1   Pelodictyon;__phaeoclathratiforme
WP012506474.1   Prosthecochloris;__aestuarii
WP014433201.1   Caldilinea;__aerophila
WP013218375.1   Dehalogenimonas;__lykanthroporepellens

and

(WP014737726.1:1.8525851341,((((((((WP027358538.1:0.3690143012,((WP004512544.1:0.1039871466,WP014551809.1:0.0491224567)100:0.3547853057,(WP012469611.1:0.4207143406,(WP012532312.1:0.1128063030,WP015839165.1:0.0280057978)100:0.3010198201)99:0.0913692177)79:0.0631776135)100:1.1860644760,(((((WP005505390.1:0.2412841810,WP027856531.1:0.1524078700)100:0.1071877723,  etc.
5.1 years ago

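I found the answer, for anyone else new to this. Rather than running perl script.pl, I needed to run perl script.pl taxonomyFile.txt phylogeneticTreeFile.txt >outFile.txt on the command line. Without file names on the command line, <> reads from standard input, so the script was simply waiting for input at the map line. In hindsight this seems rather obvious, as the original post gave this command.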
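For anyone who would rather have the script complain than sit waiting, here is a minimal sketch of a variant that checks its arguments first. It is not the script from the original post: the sub names load_taxonomy and relabel_line are made up for illustration, and it assumes exactly two arguments, in the order taxonomy file then tree file.

```perl
use strict;
use warnings;

# Read "ID<whitespace>label" lines into a hash reference.
# Lines that do not match that shape (e.g. blank lines) are skipped.
sub load_taxonomy {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!\n";
    my %taxonomy;
    while ( my $line = <$fh> ) {
        # (.+?)\s*$ drops trailing whitespace, including a stray \r from Windows files
        $taxonomy{$1} = $2 if $line =~ /^(\S+)\s+(.+?)\s*$/;
    }
    close $fh;
    return \%taxonomy;
}

# Replace every known ID in one line of tree text with its label.
sub relabel_line {
    my ( $line, $taxonomy ) = @_;
    for my $id ( keys %$taxonomy ) {
        # \Q...\E quotes the "." in IDs such as WP010933552.1
        $line =~ s/\b\Q$id\E\b/$taxonomy->{$id}/g;
    }
    return $line;
}

if ( @ARGV == 2 ) {
    my ( $taxonomy_file, $tree_file ) = @ARGV;
    my $taxonomy = load_taxonomy($taxonomy_file);
    open my $tree, '<', $tree_file or die "Cannot open '$tree_file': $!\n";
    while ( my $line = <$tree> ) {
        print relabel_line( $line, $taxonomy );
    }
    close $tree;
}
else {
    # Print a hint and fall through instead of waiting on standard input.
    warn "Usage: perl $0 taxonomyFile.txt phylogeneticTreeFile.txt > outFile.txt\n";
}
```

Run with no arguments, this prints the usage line to standard error and returns immediately. Run with both file names, it should behave like the original script, with the relabelled tree going to standard output.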
