Question: Differences between parSeqSim and twoSeqSim results
2.2 years ago by
lefthandgergo10 wrote:

Hi!

I am trying to compare 100 peptide sequences to each other using the default settings of `twoSeqSim` and `parSeqSim` from the protr package (local alignment and the BLOSUM62 substitution matrix). However, the two functions give different results. My `CompareAll` function, which calls `twoSeqSim` repeatedly to compare all peptides in a vector, fills the similarity matrix with integer scores. When I run `parSeqSim` on the same peptide set, the values all lie between 0 and 1, so it seems to normalize the results somehow. How does this normalization work? Thanks!

```
library(protr)

# twoSeqSim
CompareAll <- function(pep) {  # pairwise comparisons for every peptide in the vector
  simmtx <- matrix(nrow = length(pep),
                   ncol = length(pep),
                   dimnames = list(pep, pep))
  for (i in seq_along(pep)) {
    for (j in i:length(pep)) {  # upper triangle only; the lower triangle stays NA
      simmtx[i, j] <- twoSeqSim(pep[i], pep[j])@score
    }
  }
  return(simmtx)
}

# parSeqSim
parSeqSim(peptides_tmp)
```
2.2 years ago by
h.mon31k
Brazil
h.mon31k wrote:

The normalization performed by `parSeqSim()` is:

```
if (is.numeric(s12) == FALSE |
    is.numeric(s11) == FALSE |
    is.numeric(s22) == FALSE) {
  sim = 0L
} else if (abs(s11) < .Machine$double.eps |
           abs(s22) < .Machine$double.eps) {
  sim = 0L
} else {
  sim = s12 / sqrt(s11 * s22)
}
```

Here `s11` is the score of sequence1 aligned to itself, `s22` is the score of sequence2 aligned to itself, and `s12` is the score of sequence1 aligned to sequence2.

This means that if any of the scores is non-numeric, or if either `s11` or `s22` is smaller in absolute value than `.Machine$double.eps` (effectively zero), the sequence similarity is set to zero; otherwise it is `s12 / sqrt(s11 * s22)`. A sequence with a positive self-alignment score therefore has a similarity of 1 to itself.
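
If you already have a matrix of raw scores (for example from a function like `CompareAll`), the same rescaling can be applied to it directly. The following is only a sketch in base R: `normalize_scores` is a hypothetical helper, not a protr function, and the scores in `raw` are made up for illustration. It assumes the diagonal of the matrix holds the self-alignment scores.

```r
# Hypothetical helper (not part of protr): rescale raw alignment scores
# as s12 / sqrt(s11 * s22), taking s11 and s22 from the matrix diagonal.
normalize_scores <- function(raw) {
  self  <- diag(raw)                 # self-alignment scores s11, s22, ...
  denom <- sqrt(outer(self, self))   # sqrt(s11 * s22) for every pair
  sim   <- raw / denom
  # Same guard as in the snippet above: a (near-)zero self score gives 0
  zero_self <- abs(self) < .Machine$double.eps
  sim[zero_self, ] <- 0
  sim[, zero_self] <- 0
  sim
}

# Made-up raw scores for three peptides, for illustration only
ids <- c("p1", "p2", "p3")
raw <- matrix(c(20,  5,  0,
                 5, 45, 15,
                 0, 15, 80),
              nrow = 3, byrow = TRUE,
              dimnames = list(ids, ids))

sim <- normalize_scores(raw)
print(round(sim, 3))
```

Cells that are `NA` in the raw matrix (such as an unfilled lower triangle) stay `NA` after rescaling, unless a zero self score forces the whole row or column to 0.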