Question: What are the recommended parameters for local protein alignment, and how important are they?
As far as I understand, BLOSUM62 is the typical choice for the substitution matrix.
The example below uses a gap opening penalty of -4 and a gap extension penalty of -1. Is that a typical choice? (For example, ChatGPT suggests 11 and 1, which are indeed the values used by NCBI blastp: see the bottom of the page https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins and open "Algorithm parameters".)
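To get a feel for how far apart the two settings are, here is a small sketch that compares the cost of a single gap under each. It assumes the BLAST-style affine convention, where a gap of length k costs open + k * extend, and it writes the penalties as positive costs (so "-4, -1" becomes 4 and 1). Some tools count the first gap residue inside the opening penalty instead, so the numbers may shift by one extension for those. The function name `affine_gap_cost` is just an illustration, not taken from any library.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Cost of one gap of `length` residues under an affine scheme.

    Assumed convention (BLAST-style): cost = gap_open + length * gap_extend.
    """
    if length <= 0:
        return 0
    return gap_open + length * gap_extend


# The two settings from the question, written as positive costs.
schemes = {"open 4, extend 1": (4, 1), "open 11, extend 1": (11, 1)}

for name, (gap_open, gap_extend) in schemes.items():
    costs = [affine_gap_cost(k, gap_open, gap_extend) for k in (1, 3, 10)]
    print(name, "-> cost of gaps of length 1, 3, 10:", costs)
```

For scale: identical residues score between 4 and 11 in BLOSUM62, so under this convention a one-residue gap at 11/1 (cost 12) cancels roughly two or three matched positions, while at 4/1 (cost 5) it cancels about one.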
PS (from ChatGPT):
Question: What are the recommended parameters for local protein alignment?
The choice of parameters for local protein alignment depends on the specific algorithm or tool being used. However, I can provide you with some commonly recommended parameters for local protein alignment using the popular tool BLAST (Basic Local Alignment Search Tool) as an example:
Word Size: This parameter specifies the length of the word used as a basis for the initial seed match. The default value is typically 3, but increasing it to 4 or 5 can improve sensitivity at the cost of slower performance.
Expectation (E) Value: The E-value threshold determines the significance level at which alignments are reported. Lower E-values indicate more significant alignments. Commonly used values for local protein alignments range from 0.01 to 0.001, depending on the desired stringency.
Gap Costs: Gap costs determine the penalties assigned for gap opening and gap extension. Typically, there are two types of gap costs: a gap opening cost and a gap extension cost. The values for these parameters depend on the scoring system used and the characteristics of the sequences being aligned. Default values are commonly used, but they can be adjusted based on specific needs.
Scoring Matrix: The scoring matrix is used to assign scores to different residue substitutions during the alignment process. Popular choices include the BLOSUM (Blocks Substitution Matrix) and PAM (Point Accepted Mutation) matrices. The specific matrix and its version depend on the nature of the sequences being aligned.
Question: What are typical choices for the gap costs?
The choice of gap costs for local protein alignment depends on the specific alignment algorithm or tool being used. However, two common types of gap costs are the gap opening cost and the gap extension cost.
In general, the gap opening cost is set higher than the gap extension cost because opening a gap is considered a more significant event than extending an existing gap. This reflects the biological expectation that introducing a gap requires more energy or evolutionary pressure.
For local protein alignment using tools like BLAST or BLASTP, the default gap opening and extension costs are often as follows:
Gap Opening Cost: 11
Gap Extension Cost: 1
These default values work well for many applications, but they can be adjusted based on specific requirements and the nature of the sequences being aligned. Increasing the gap opening cost relative to the extension cost makes the algorithm less likely to introduce long gaps, which can be useful when aligning closely related sequences or when looking for short conserved motifs. Conversely, decreasing the gap opening cost relative to the extension cost allows for longer gaps and can be helpful when aligning more distantly related sequences.
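Not part of the ChatGPT answer: to check how much the gap costs matter in practice, here is a minimal sketch of local alignment scoring with affine gaps (Smith-Waterman with the Gotoh recurrences). To stay short it uses a toy match/mismatch score (+5 / -4) as a stand-in for BLOSUM62, and it assumes a gap of length k costs open + k * extend. The function name `local_score` and the two test sequences are made up for illustration.

```python
NEG_INF = float("-inf")


def local_score(a, b, gap_open, gap_extend, match=5, mismatch=-4):
    """Best local alignment score of a and b with affine gap costs.

    H: best score of an alignment ending at (i, j)
    E: best score ending at (i, j) with a gap that consumes b[j-1]
    F: best score ending at (i, j) with a gap that consumes a[i-1]
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] - gap_extend,
                          H[i][j - 1] - gap_open - gap_extend)
            F[i][j] = max(F[i - 1][j] - gap_extend,
                          H[i - 1][j] - gap_open - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Toy pair: the second sequence lacks the two residues "AA".
seq_a = "MKVLAAGI"
seq_b = "MKVLGI"

print(local_score(seq_a, seq_b, gap_open=4, gap_extend=1))   # 24
print(local_score(seq_a, seq_b, gap_open=11, gap_extend=1))  # 20
```

With 4/1 the best alignment bridges the two-residue deletion (30 for six matches minus a gap cost of 6). With 11/1 the gap costs 13, which is more than the 10 gained from the two matches after it, so the best local alignment is the ungapped "MKVL" block. The parameters change not only the score but also which alignment is reported.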
Some other experiments with alignments: