Question: Normalized Smith-Waterman similarity score based on pairwise amino acids sequence alignments
4.4 years ago by
'260 wrote:

I have 100 protein sequences and I wish to compute similarities between them. What's the most efficient way to get the normalized Smith-Waterman similarity scores?


Given the number of existing alignment tools, the most efficient approach would be to not write anything yourself and just use someone else's tool (most likely available online and easy to find via Google).

I ended up using R, which was reasonably fast given my limited computational resources. I used Biostrings::pairwiseAlignment.

Is there any answer?

2.2 years ago by
'260 wrote:

Here's a naive approach:


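## Load packages (in newer Bioconductor releases pairwiseAlignment() may live in the pwalign package)
library(Biostrings)
library(data.table)

## Load protein sequences (assumed here: one sequence per row, in the first column)
#Protein_Seq <- read.fasta("Protein_sequneces.fa", seqtype="AA", as.string=TRUE)
#Protein_Sequences_Only <- getSequence(Protein_Seq, as.string=TRUE)
protein_dat <- read.table("Protein_sequneces.dat", stringsAsFactors=FALSE)

Smith_Waterman_Scores <- data.frame(Seq1=as.numeric(), Seq2=as.numeric(), Score=as.numeric())

## Perform alignment
## NOTE: type="local" requests a Smith-Waterman (local) alignment. The BLOSUM62
## matrix and the gap penalties are assumptions; the original call was cut off.
for (i in 1:nrow(protein_dat)) {
  for (j in 1:nrow(protein_dat)) {
    t <- pairwiseAlignment(AAString(protein_dat[i,1]), AAString(protein_dat[j,1]),
                           type="local", substitutionMatrix="BLOSUM62",
                           gapOpening=10, gapExtension=0.5)
    ## One row per pair: index of each protein and the raw alignment score
    Smith_Waterman_Scores <- rbind(Smith_Waterman_Scores,
                                   data.frame(Seq1=i, Seq2=j, Score=score(t)))
  }
}

names(Smith_Waterman_Scores)[1] <- "First.Protein"
names(Smith_Waterman_Scores)[2] <- "Second.Protein"
names(Smith_Waterman_Scores)[3] <- "Score"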

### Normalize Smith-Waterman similarity scores:
### Normalized(a,b) = Score(a,b) / (sqrt(Score(a,a)) * sqrt(Score(b,b)))
dt <- data.table(Smith_Waterman_Scores)
## Self-alignment rows (protein vs. itself) hold each protein's self-score
dt.lookup <- dt[First.Protein == Second.Protein]
setkey(dt, "First.Protein")
setkey(dt.lookup, "First.Protein")
## Attach the self-score of the first protein as Score1
colnames(dt.lookup) <- c("First.Protein", "Second.Protein", "Score1")
dt <- dt[dt.lookup]
## Attach the self-score of the second protein as Score2
setkey(dt, "Second.Protein")
colnames(dt.lookup) <- c("First.Protein", "Second.Protein", "Score2")
dt <- dt[dt.lookup][
  , Normalized := Score / (sqrt(Score1) * sqrt(Score2))][
    , .(First.Protein, Second.Protein, Normalized)]
dt <- dt[order(dt$First.Protein),]

Smith_Waterman_Scores <-
Hi, I am confused about the dt.lookup object. It seems that dt.lookup <- dt[First.Protein == Second.Protein] is wrong. Could you please fix it? Thank you.
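
For what it's worth, that line does not look wrong in itself: it keeps only the rows where a protein is aligned against itself, i.e. the self-scores that end up in the denominator of the normalization. Here is a minimal base R sketch of the same idea on a made-up score table. The numbers and the names self_rows and self_score are hypothetical and only for illustration; they are not from the answer above.

```r
# Toy score table (made-up numbers, for illustration only)
scores <- data.frame(
  First.Protein  = c(1, 1, 2, 2),
  Second.Protein = c(1, 2, 1, 2),
  Score          = c(100, 30, 30, 64)
)

# Rows where a protein is aligned against itself: the self-scores.
# Base R analogue of the data.table filter dt[First.Protein == Second.Protein].
self_rows <- scores[scores$First.Protein == scores$Second.Protein, ]

# Named lookup vector: protein index (as a string) -> self-score
self_score <- setNames(self_rows$Score, self_rows$First.Protein)

# Normalized(a, b) = Score(a, b) / sqrt(Score(a, a) * Score(b, b))
scores$Normalized <- unname(
  scores$Score /
    sqrt(self_score[as.character(scores$First.Protein)] *
         self_score[as.character(scores$Second.Protein)])
)

print(scores)
```

With these toy numbers the self-pairs normalize to 1 and the pair (1, 2) to 30 / sqrt(100 * 64) = 0.375. If dt.lookup comes out empty on real data, it may be worth checking that the score table actually contains the i == j rows.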

