I'm trying to run a local NCBI BLAST search using the PSSM of a conserved CDD domain (a .smp file in the database; caution: LARGE file).
My database contains nucleotide sequences and the PSSM is for proteins, so I would like to use tblastn for this search. The tblastn executable accepts PSSMs, so no problem there.
But: the matrix in the .smp file is scaled by a factor of 100; according to the specs, the BLAST program should be able to downscale the matrices back to a factor of 1 automatically.
When I run tblastn, however, I get the following error:
BLAST engine error: PSSM has a scaling factor of 100. PSI-BLAST does not accept scaled PSSMs
Now, is there either a way to run tblastn with scaled matrices or a tool to scale them?
And if not, do I scale only the matrix values by 1/100, or the lambda, kappa and h factors (whatever these are) as well?
I am not aware of any tool that can do what you want to do. This is what I would do to eliminate the scaling:
1. Change the scaling factor from 100 to 1
2. Divide all weights in the PSSM by 100
3. Possibly multiply lambda by 100
4. Keep kappa and h unchanged
The part about the scaling factor and the weights should be self-explanatory (and you indeed suggested doing the same yourself).
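To make the weight step concrete, here is a minimal Python sketch. It assumes the PSSM scores have already been extracted from the .smp file into a nested list of integers; the function name `descale_pssm` and the example values are made up for illustration, and rounding to the nearest integer is my assumption, not something I know BLAST requires.

```python
def descale_pssm(scores, scale_factor=100):
    """Divide every PSSM score by scale_factor and round to an integer.

    `scores` is a list of rows (one per position), each a list of integer
    scores. Rounding to the nearest integer is an assumption; note that
    Python's round() sends exact halves to the nearest even integer.
    """
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return [[int(round(s / scale_factor)) for s in row] for row in scores]


# Hypothetical scaled scores, not taken from a real CDD entry.
scaled_rows = [[412, -287], [-96, 1033]]
unscaled_rows = descale_pssm(scaled_rows)
print(unscaled_rows)
```

Writing the result back into the .smp file (and setting the scaling factor field to 1) is not covered here, since that depends on how the file is parsed.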
The reason for possibly having to scale lambda is that it is used in the Karlin-Altschul formula: E = kappa*N*exp(-lambda*S), where N is the size of the search space and S is the alignment score. The big question is whether BLAST corrects S for the scaling factor before plugging it into this formula. If it does, you should not change lambda. If it does not, you need to multiply lambda by 100 when dividing all the weights in the PSSM by 100. I sadly do not know the finer details of BLAST well enough to know which is the case.
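The second case can be checked numerically. The sketch below just evaluates the formula above with made-up values for kappa, N, lambda and S (none of them come from a real PSSM): dividing the score by 100 while multiplying lambda by 100 leaves the product lambda*S, and therefore E, unchanged.

```python
import math


def evalue(kappa, n, lam, score):
    """Karlin-Altschul E-value: E = kappa * N * exp(-lambda * S)."""
    return kappa * n * math.exp(-lam * score)


# Illustrative numbers only.
kappa = 0.041
search_space = 1e9
lambda_scaled = 0.00267   # lambda matching scores scaled by 100
score_scaled = 5000       # a score on the scaled PSSM

# After descaling: score divided by 100, lambda multiplied by 100.
e_scaled = evalue(kappa, search_space, lambda_scaled, score_scaled)
e_descaled = evalue(kappa, search_space, lambda_scaled * 100, score_scaled / 100)

print(e_scaled, e_descaled)
print(math.isclose(e_scaled, e_descaled))
```

If lambda were left untouched while the scores shrank by a factor of 100, the exponent would shrink by the same factor and the E-values would come out far too large.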
The way kappa is used in the formula means that it should not be affected by the scaling factor. The parameter H is the entropy, which is used for calculating the effective length of query sequences and sequences in the database. The formula used is effective_length = max(length-lambda*S/H, 1). Consequently, H should also not be affected by the scaling (since lambda*S should be unchanged).
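The same argument can be written out for the effective length. This is only a direct transcription of the formula above into Python, with made-up parameter values; it is not how BLAST itself implements the length correction.

```python
import math


def effective_length(length, lam, score, h):
    """effective_length = max(length - lambda * S / H, 1)."""
    return max(length - lam * score / h, 1)


# Illustrative numbers only.
h = 0.14
length = 300

# Scaled PSSM (scores x100, small lambda) vs. descaled PSSM.
len_scaled = effective_length(length, 0.00267, 5000, h)
len_descaled = effective_length(length, 0.267, 50, h)

print(len_scaled, len_descaled)
print(math.isclose(len_scaled, len_descaled))
```

Because only the product lambda*S enters the formula, H can stay as it is as long as lambda and the scores are rescaled consistently.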