Question: Normalized Smith-Waterman similarity score based on pairwise amino acids sequence alignments
0
gravatar for  '
3.1 years ago by
'170
'170 wrote:

I have 100 protein sequences and I wish to compute similarities between them. What's the most efficient way to get the normalized Smith-Waterman similarity scores?

alignment • 1.9k views
ADD COMMENTlink modified 9 months ago • written 3.1 years ago by '170
1

Given the number of pre-existing alignment tools, the most efficient method would be to not write anything and just use someone elses (likely online and easily findable via google) tool.

ADD REPLYlink written 3.1 years ago by Devon Ryan90k

I ended up using R which was reasonably fast for my limited computational resources. I used Biostrings::pairwiseAlignment.

ADD REPLYlink written 16 months ago by '170

is there any answer?

ADD REPLYlink written 9 months ago by y26075704190
0
gravatar for  '
9 months ago by
'170
'170 wrote:

Here's a naive approach:

library("Biostrings")
library("seqinr")    


## Load protein sequences
#Protein_Seq <- read.fasta("Protein_sequneces.fa", seqtype="AA", as.string="TRUE")
#Protein_Sequences_Only <- getSequence(Protein_Seq, as.string=TRUE)
data("BLOSUM62")
protein_dat <- read.table("Protein_sequneces.dat")

Smith_Waterman_Scores <- data.frame(Seq1=as.numeric(),
                                    Seq2=as.numeric(), 
                                    Score=as.numeric(), 
                                    stringsAsFactors=FALSE) 

## Perform alignment
for (i in 1:length(protein_dat)) {
  for (j in 1:length(protein_dat))
  {

    t <- pairwiseAlignment(protein_dat[i,], protein_dat[j,],
                                         substitutionMatrix=BLOSUM62,type="local")
    Smith_Waterman_Scores <- rbind(Smith_Waterman_Scores, c(i,j,t@score))

  }
}

names(Smith_Waterman_Scores)[1] <- "First.Protein"
names(Smith_Waterman_Scores)[2] <- "Second.Protein"
names(Smith_Waterman_Scores)[3] <- "Score"



### Normalize Smith-Waterman similarity scores
dt <- data.table(Smith_Waterman_Scores)
dt.lookup <- dt[First.Protein == Second.Protein]
setkey(dt,"First.Protein" )
setkey(dt.lookup,"First.Protein" )
colnames(dt.lookup) <- c("First.Protein","Second.Protein","Score1")
dt <- dt[dt.lookup]
setkey(dt,"Second.Protein" )
setkey(dt.lookup,"Second.Protein")
colnames(dt.lookup) <- c("First.Protein","Second.Protein","Score2")
dt <- dt[dt.lookup][
  , Normalized :=  Score / (sqrt(Score1) * sqrt(Score2))][
    , .(First.Protein, Second.Protein, Normalized)]
dt <- dt[order(dt$First.Protein),]

Smith_Waterman_Scores <- as.data.frame(dt)
ADD COMMENTlink modified 9 months ago • written 9 months ago by '170

hi, I am confused about the dt.lookup object. It seems that dt.lookup <- dt[First.Protein == Second.Protein] is the wrong code. Would you please fix it. Thank you

ADD REPLYlink written 27 days ago by colinwxlabc0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1089 users visited in the last hour