Hello Internet...I need help...
I am trying to write a perl script to take a BLASTp .txt output file containing the top 100 hits for each protein sequence in a genome and process some simple information from the file.
Currently the program is adding one to a value in each row of the first column for each time it finds one of the desired hits and one to the second column for each non-hit. It is not recognizing each protein correctly and it is giving me a sum total for the whole file. What I essential want to know for each protein is how many hits have been recorded, how many of those hits are to a particular phylum (Elusimicrobia), and how many are not to that phylum and naturally the gene name in the first column but all consolidated into one row.
PLEASE HELP!!! THANK YOU!!
#!/usr/bin/perl
use warnings;
die "usage: <count_elusi_hits>\n" unless @ARGV == 2;
$inputfile = $ARGV[0]; # blast.out
$outputfile = $ARGV[1]; # hit counts
open(IN, $inputfile) or die " Something went wrong! Check your 1st input file";
open(OUT, ">$outputfile");
%hit_hash = ();
$elusi_hits = 0;
$non_elusi_hits = 0;
print OUT "Elusimicrobia Hit Non-Elusi Hits\n";
while ($line= <IN>){
chomp $line;
@array = split (/\t/, $line); #split at tab, make array
$hit_hash{$array[0]}= $array[1];
#print "$_\n" for keys %hit_hash;
#print "$hit_hash{$array[0]}\n";
#}
for $keys (keys %hit_hash){
if ($hit_hash{$array[0]} =~ /Elusi*|Endomicrobium*|Termite group 1*|TG1*/){
$elusi_hits += +1;
}
else{
$non_elusi_hits += +1;
}
print OUT "$elusi_hits $non_elusi_hits\n";
}
}
close IN;
close OUT;
exit;
Input file looks something like this with ~1000 blasted proteins and roughly 77,000 hit proteins:
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Candidatus Endomicrobium trichonymphae]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) enolase [Candidatus Endomicrobium trichonymphae]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Endomicrobium proavitum]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium RIFOXYA12_FULL_49_49]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium RIFOXYA2_FULL_40_6]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium RIFOXYA2_FULL_50_26]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium RIFOXYA2_FULL_39_19]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium CG1_02_37_114]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium RIFOXYB2_FULL_48_7]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium HGW-Elusimicrobia-1]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium RIFCSPLOWO2_01_FULL_59_12]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Desulfuromonas sp.]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium GWA2_38_7]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Desulfuromonas sp.]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium RIFOXYD2_FULL_34_15]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Geoalkalibacter ferrihydriticus]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Pelobacter carbinolicus]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Geopsychrobacter electrodiphilus]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Desulfuromonas sp.]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Desulfobacteraceae bacterium 4572_35.1]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Pelobacter acetylenicus]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium CG06_land_8_20_14_3_00_38_11]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Brevibacillus laterosporus]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Desulfuromonadales bacterium C00003068]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Brevibacillus laterosporus]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Malonomonas rubra]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Brevibacillus laterosporus]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Elusimicrobia bacterium RIFOXYC2_FULL_34_12]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Idiomarina salinarum]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Pelobacter seleniigenes]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Desulfuromonas sp.]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Desulfuromonas sp.]
Endomicrobium_sp_ds12|_fig|6666666.252603.peg.2_|_scaffold1.1_1490_186_|_Enolase_(EC_4.2.1.11) phosphopyruvate hydratase [Succinivibrio dextrinosolvens]