I'm new to HH-suite and I'm trying to reproduce, with my local installation, the results I get from the web versions of HHblits and HHpred. I suspect that the web server is doing something "hidden" while preparing the input files. I'm currently running HH-suite 3.3.0, and I have installed BLAST 2.2.26, PSIPRED 4.02, and the latest versions of pdb70 and UniRef. My workflow is:
1) Start from a fasta sequence (file: isoform1.txt)
2) Run hhblits:
hhblits -cpu 4 -i isoform1.txt -d Uniref/UniRef30_2020_03 -o out_hhblits.hhr -oa3m out_hhblits.a3m -e 1e-3 -n 3 -p 20 -Z 250 -z 1 -b 1 -B 250
At this step the results from the web server and from my local run already differ, which makes me think I'm missing something, probably in how the input file for hhblits is generated.
3) Run addss.pl:
addss.pl out_hhblits.a3m
4) Run hhmake:
hhmake -i out_hhblits.a3m
5) Run hhsearch:
hhsearch -cpu 4 -i out_hhblits.a3m -d ../../pdb70/pdb70 -o hras_hhpred.hhr -oa3m hras_hhpred.a3m -p 20 -Z 250 -loc -z 1 -b 1 -B 250 -ssm 2 -sc 1 -seq 1 -dbstrlen 10000 -norealign -maxres 32000 -contxt /usr/local/src/hh-suite-master/data/context_data.crf
Local outputs:
hhblits:
Query         sp|P01112|RASH_HUMAN GTPase HRas OS=Homo sapiens OX=9606 GN=HRAS PE=1 SV=1
Match_columns 189
No_of_seqs    1380 out of 7256
Neff          11.8887
Searched_HMMs 28721
Date          Wed Sep 30 10:04:06 2020
Command       hhblits -cpu 4 -i isoform1.txt -d ../../UniRef30_2020_03_hhsuite/UniRef30_2020_03 -o out_hhblits.hhr -oa3m out_hhblits.a3m -e 1e-3 -n 3 -p 20 -Z 250 -z 1 -b 1 -B 250

No Hit                             Prob E-value P-value Score   SS Cols Query HMM Template HMM
1  UniRef100_A0A3M0JH82 Uncharact 100.0 2.6E-49 5.9E-55 252.6  0.0  189     1-189 97-285 (285)
2  UniRef100_A0A061I953 GTPase HR 100.0   6E-49 1.4E-54 250.6  0.0  188     1-188 1-188 (289)
3  UniRef100_UPI0012625CA5 GTPase 100.0   7E-47 1.6E-52 237.7  0.0  189     1-189 131-319 (319)
4  UniRef100_A0A023FX11 Uncharact 100.0 1.3E-46 2.9E-52 238.2  0.0  188     1-189 65-252 (272)
5  UniRef100_A0A022QBX4 Uncharact 100.0 2.7E-45 5.8E-51 238.0  0.0  162     2-164 27-190 (218)
6  UniRef100_A0A015IFZ5 Rsr1p n=3 100.0   3E-44 6.5E-50 235.7  0.0  166     3-168 21-188 (258)
hhsearch:
Query         sp|P01112|RASH_HUMAN GTPase HRas OS=Homo sapiens OX=9606 GN=HRAS PE=1 SV=1
Match_columns 189
No_of_seqs    672 out of 48006
Neff          14.1061
Searched_HMMs 83244
Date          Thu Oct 1 09:58:32 2020
Command       hhsearch -cpu 4 -i hras_hhblits.a3m -d ../../pdb70/pdb70 -o hras_hhpred.hhr -oa3m hras_hhpred.a3m -p 20 -Z 250 -loc -z 1 -b 1 -B 250 -ssm 2 -sc 1 -seq 1 -dbstrlen 10000 -norealign -maxres 32000 -contxt /usr/local/src/hh-suite-master/data/context_data.crf

No Hit                             Prob E-value P-value Score   SS Cols Query HMM Template HMM
1  5XCO_A B-cell lymphoma 6 prote  99.9   2E-22 2.4E-27 121.3 22.7  166     1-166 3-168 (171)
2  5UQW_A GTPase KRas (E.C.3.6.5.  99.9 2.9E-22 3.4E-27 122.8 23.1  166     1-166 21-186 (189)
3  2CE2_X GTPASE HRAS; SIGNALING   99.9 2.8E-22 3.3E-27 120.0 22.1  164     1-164 1-164 (166)
4  6BOF_A GTPase KRas; HYDROLASE,  99.9 4.5E-22 5.4E-27 119.4 22.7  168     2-169 1-168 (168)
5  5E95_A Mb(NS1), GTPase HRas; H  99.9 4.3E-22 5.2E-27 119.2 22.5  165     1-165 3-167 (168)
6  4KLZ_A GTP-binding protein Rit  99.9   1E-21 1.2E-26 118.5 23.3  170     1-170 3-173 (173)
Web outputs:
hhblits:
Query         hras_hhblits
Match_columns 189
No_of_seqs    1 out of 1
Neff          1
Searched_HMMs 20000
Date          Tue Sep 29 14:33:42 2020
Command       hhblits -cpu 8 -i ../results/hras_hhblits.in.a3m -d /cluster/toolkit/production/databases/hhblits/UniRef30 -o /ebio/toolkit_rye/user/toolkit/production/jobs/hras_hhblits/results/hras_hhblits.hhr -oa3m /ebio/toolkit_rye/user/toolkit/production/jobs/hras_hhblits/results/hras_hhblits.a3m -e 1e-3 -n 1 -p 20 -Z 250 -z 1 -b 1 -B 250

No Hit                             Prob E-value P-value Score   SS Cols Query HMM Template HMM
1  UniRef100_A0A061I953 GTPase HR 100.0  7E-130  2E-135 837.4  0.0  188     1-188 1-188 (289)
2  UniRef100_A0A3M0JH82 Uncharact 100.0  1E-126  4E-132 816.7  0.0  189     1-189 97-285 (285)
3  UniRef100_UPI0012625CA5 GTPase 100.0  1E-119  3E-125 777.3  0.0  189     1-189 131-319 (319)
4  UniRef100_A0A023FX11 Uncharact 100.0  4E-118  1E-123 760.8  0.0  188     1-189 65-252 (272)
5  UniRef100_A0A3L7HWJ0 H-RAS (Fr 100.0  9E-116  2E-121 757.3  0.0  184     1-184 155-338 (338)
6  UniRef100_UPI000C29D954 GTPase 100.0  2E-114  4E-120 758.7  0.0  187     1-187 1-187 (394)
hhsearch:
Query         Q_hras
Match_columns 189
No_of_seqs    196 out of 1455
Neff          12.736
Searched_HMMs 52941
Date          Mon Sep 28 11:28:02 2020
Command       hhsearch -cpu 8 -i ../results/full.a3m -d /cluster/toolkit/production/databases/hh-suite/mmcif70/pdb70 -o ../results/hras.hhr -oa3m ../results/hras.a3m -p 20 -Z 250 -loc -z 1 -b 1 -B 250 -ssm 2 -sc 1 -seq 1 -dbstrlen 10000 -norealign -maxres 32000 -contxt /cluster/toolkit/production/bioprogs/tools/hh-suite-build-new/data/context_data.crf

No Hit                             Prob E-value P-value Score   SS Cols Query HMM Template HMM
1  2CE2_X GTPASE HRAS; SIGNALING   99.9 2.6E-22 4.8E-27 120.1 22.2  164     1-164 1-164 (166)
2  6MS9_A GTPase KRas; GTPASE KRA  99.9 4.3E-22 8.2E-27 119.3 23.1  166     1-166 1-166 (169)
3  5XCO_A B-cell lymphoma 6 prote  99.9 4.3E-22 8.1E-27 120.0 22.4  166     1-166 3-168 (171)
4  6MQT_H GTPase KRas; GTPASE KRA  99.9 7.1E-22 1.3E-26 118.3 23.0  165     1-165 2-166 (167)
5  3LVQ_E Arf-GAP with SH3 domain  99.9 2.1E-22   4E-27 141.0 20.7  170     1-174 320-493 (497)
6  6H47_A GTPase KRas, darpin K19  99.9 2.8E-21 5.3E-26 115.9 22.1  165     1-165 4-168 (169)
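Since every .hhr header records the exact command that produced it, one way to narrow down where the two runs diverge is to diff the flags of the local and web command lines. The pasted headers already show, for example, -n 3 locally against -n 1 on the web for hhblits, and different input files. Below is a minimal sketch in POSIX shell; the function names hhr_flags and hhr_compare are my own (hypothetical helpers, not part of HH-suite), and it assumes the "Command" line sits on its own line, as it does in an unmodified .hhr file.

```shell
# hhr_flags FILE
# Print each option from the "Command" header line of an .hhr file,
# one option (with its value, if any) per line, sorted.
# Assumption: a token starting with "-" followed by a letter is an option.
hhr_flags() {
  awk '/^Command/ {
         # $1 is "Command", $2 is the program name; options start at $3
         for (i = 3; i <= NF; i++) {
           if ($i ~ /^-[A-Za-z]/) printf "%s%s", (n++ ? "\n" : ""), $i
           else                   printf " %s", $i
         }
         print ""
         exit
       }' "$1" | sort
}

# hhr_compare LOCAL.hhr WEB.hhr
# Diff the options of two runs; exit status is 0 when they are identical.
hhr_compare() {
  hhr_tmp_a=$(mktemp) || return 2
  hhr_tmp_b=$(mktemp) || { rm -f "$hhr_tmp_a"; return 2; }
  hhr_flags "$1" > "$hhr_tmp_a"
  hhr_flags "$2" > "$hhr_tmp_b"
  hhr_status=0
  diff "$hhr_tmp_a" "$hhr_tmp_b" || hhr_status=$?
  rm -f "$hhr_tmp_a" "$hhr_tmp_b"
  return "$hhr_status"
}
```

Usage would look like: hhr_compare out_hhblits.hhr web_hhblits.hhr, where web_hhblits.hhr is a placeholder name for the result file downloaded from the web server. Lines prefixed with "<" come from the first file and lines prefixed with ">" from the second. Path-valued options such as -i, -d and -o will always differ, so the interesting lines are the remaining ones.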