Question: Normalized Smith-Waterman similarity score based on pairwise amino acids sequence alignments
0
gravatar for  '
4.4 years ago by
'260
'260 wrote:

I have 100 protein sequences and I wish to compute similarities between them. What's the most efficient way to get the normalized Smith-Waterman similarity scores?

alignment • 2.7k views
ADD COMMENTlink modified 2.2 years ago • written 4.4 years ago by '260
1

Given the number of pre-existing alignment tools, the most efficient method would be to not write anything and just use someone elses (likely online and easily findable via google) tool.

ADD REPLYlink written 4.4 years ago by Devon Ryan96k

I ended up using R which was reasonably fast for my limited computational resources. I used Biostrings::pairwiseAlignment.

ADD REPLYlink written 2.7 years ago by '260

is there any answer?

ADD REPLYlink written 2.2 years ago by y26075704190
0
gravatar for  '
2.2 years ago by
'260
'260 wrote:

Here's a naive approach:

library("Biostrings")
library("seqinr")    


## Load protein sequences
#Protein_Seq <- read.fasta("Protein_sequneces.fa", seqtype="AA", as.string="TRUE")
#Protein_Sequences_Only <- getSequence(Protein_Seq, as.string=TRUE)
data("BLOSUM62")
protein_dat <- read.table("Protein_sequneces.dat")

Smith_Waterman_Scores <- data.frame(Seq1=as.numeric(),
                                    Seq2=as.numeric(), 
                                    Score=as.numeric(), 
                                    stringsAsFactors=FALSE) 

## Perform alignment
for (i in 1:length(protein_dat)) {
  for (j in 1:length(protein_dat))
  {

    t <- pairwiseAlignment(protein_dat[i,], protein_dat[j,],
                                         substitutionMatrix=BLOSUM62,type="local")
    Smith_Waterman_Scores <- rbind(Smith_Waterman_Scores, c(i,j,t@score))

  }
}

names(Smith_Waterman_Scores)[1] <- "First.Protein"
names(Smith_Waterman_Scores)[2] <- "Second.Protein"
names(Smith_Waterman_Scores)[3] <- "Score"



### Normalize Smith-Waterman similarity scores
dt <- data.table(Smith_Waterman_Scores)
dt.lookup <- dt[First.Protein == Second.Protein]
setkey(dt,"First.Protein" )
setkey(dt.lookup,"First.Protein" )
colnames(dt.lookup) <- c("First.Protein","Second.Protein","Score1")
dt <- dt[dt.lookup]
setkey(dt,"Second.Protein" )
setkey(dt.lookup,"Second.Protein")
colnames(dt.lookup) <- c("First.Protein","Second.Protein","Score2")
dt <- dt[dt.lookup][
  , Normalized :=  Score / (sqrt(Score1) * sqrt(Score2))][
    , .(First.Protein, Second.Protein, Normalized)]
dt <- dt[order(dt$First.Protein),]

Smith_Waterman_Scores <- as.data.frame(dt)
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by '260

hi, I am confused about the dt.lookup object. It seems that dt.lookup <- dt[First.Protein == Second.Protein] is the wrong code. Would you please fix it. Thank you

ADD REPLYlink written 17 months ago by colinwxlabc0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 834 users visited in the last hour