Question: Calculating Genetic Distances Between Protein Sequences
Sanju90 wrote:

Hi all,

How can I calculate pairwise genetic distances between protein sequences in R? I have already calculated the pairwise sequence identities and stored them in an Excel file, which I imported into R, but I could not generate a distance matrix from it. I used the dist.alignment function, but I got this error:

"Object of class 'alignment' expected"

Please help me solve this problem.

R • 9.3k views
written 8.9 years ago by Sanju90

Try editing your question to include the code you are using, example input, and also the output of sessionInfo(). Also, be sure to read the help for the dist.alignment function.

written 8.9 years ago by Sean Davis26k
Yogesh Pandit500 wrote:

Get your aligned sequences into a file in one of the following formats:

mase, clustal, msf, phylip, fasta

Then you can read this file into an object of class "alignment":

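library(seqinr)
# Read the aligned sequences into an object of class "alignment"
myseqs <- read.alignment("mySeq.fasta", format = "fasta")
# Pairwise distances based on the identity matrix
mat <- dist.alignment(myseqs, matrix = "identity")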

dist.alignment() calculates "pairwise Distances from Aligned Protein or DNA/RNA Sequences". The output (mat) is a lower-triangular distance matrix that looks like this:

       Langur    Baboon     Human       Rat       Cow
Baboon 0.3307189                                        
Human  0.3750000 0.3307189                              
Rat    0.5448624 0.5077524 0.5376453                    
Cow    0.4921255 0.5448624 0.5590170 0.6495191          
Horse  0.7071068 0.7071068 0.7015608 0.7015608 0.7342088
modified 13 months ago by RamRS27k • written 8.9 years ago by Yogesh Pandit500

Just curious: are these distances likelihood-based estimates from particular protein evolution models?

written 8.9 years ago by Vitis2.4k

@y2p Actually, my file is not an alignment file. It is an Excel file that contains sequence identity data, for example:

    1       2       3       4       5       6       7
1                         
2   11.1                      
3   11.1    100.0                 
4   11.1    100.0   100.0             
5   18.2    11.1    11.1    11.1          
6   21.7    11.1    11.1    11.1    100.0     
7   22.2    11.1    11.1    11.1    44.4    44.4  
8   80.0    20.0    20.0    20.0    80.0    80.0    100.0

How can I generate a distance matrix from this Excel file based on the sequence identity data? Which function should I use?

modified 13 months ago by RamRS27k • written 8.9 years ago by Sanju90

@y2p Actually, my file is not an alignment file. It is an Excel file that contains the percentage sequence identity. My aim is to generate a distance matrix from this data, where distance = 100 - sequence identity. Which function should I use for this? Do you have any idea?

written 8.9 years ago by Sanju90
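
A possible way to apply the formula stated above (distance = 100 - sequence identity) is sketched below in base R. This is only a sketch under assumptions: the identities are typed in by hand from the example table in this thread, the table is read as the lower triangle of an 8 x 8 matrix, and the names ident_rows, ident and d are made up for illustration. Note that this is a different distance from the one returned by dist.alignment().

```r
# Percent identities copied from the example table, one vector per row 2..8.
# ASSUMPTION: the table is the lower triangle of a symmetric 8 x 8 matrix.
ident_rows <- list(
  c(11.1),
  c(11.1, 100.0),
  c(11.1, 100.0, 100.0),
  c(18.2, 11.1, 11.1, 11.1),
  c(21.7, 11.1, 11.1, 11.1, 100.0),
  c(22.2, 11.1, 11.1, 11.1, 44.4, 44.4),
  c(80.0, 20.0, 20.0, 20.0, 80.0, 80.0, 100.0)
)

n <- length(ident_rows) + 1
ident <- matrix(NA_real_, n, n, dimnames = list(1:n, 1:n))
diag(ident) <- 100                      # a sequence is 100% identical to itself

# Fill the lower triangle row by row, then mirror it into the upper triangle.
for (i in 2:n) ident[i, 1:(i - 1)] <- ident_rows[[i - 1]]
ident[upper.tri(ident)] <- t(ident)[upper.tri(ident)]

# Apply the formula from the question and convert to a "dist" object.
d <- as.dist(100 - ident)
print(d)
```

The resulting "dist" object can be passed to functions that expect a distance matrix, such as hclust().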

@y2p I have 300 sheets in the Excel file. The sequence identity data is in the 9th column. I imported the sequence identity values from all 300 sheets into R using this code:

library(gdata)
# Read column 9 (sequence identity) from each of the 300 sheets
myfile <- vector("list", 300)
for (i in 1:300) {
    myfile[[i]] <- read.xls("C:/Users/Desktop/mydata.xls", sheet = i, header = FALSE)[, 9]
}
myfile

Next, I need to apply the distance formula and create a matrix. The formula is distance = 100 - sequence identity.

Please help me.

modified 13 months ago by RamRS27k • written 8.9 years ago by Sanju90
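
One way the list produced by the loop above might be turned into a distance matrix is sketched here. The layout of the real sheets is not described in the thread, so this rests on a loud assumption: sheet i holds the percent identities of sequence i against all sequences, in the same order on every sheet. The four vectors below are hypothetical stand-ins for the 300 real ones.

```r
# Hypothetical stand-in for the list built by the read.xls() loop
# (4 "sheets" instead of 300).
# ASSUMPTION: element i holds the percent identities of sequence i against
# sequences 1..n, in the same order on every sheet.
myfile <- list(
  c(100.0,  11.1,  18.2,  80.0),
  c( 11.1, 100.0,  11.1,  20.0),
  c( 18.2,  11.1, 100.0,  80.0),
  c( 80.0,  20.0,  80.0, 100.0)
)

# Every vector must have one entry per sequence, otherwise the layout
# assumption does not hold and the data need a different treatment.
stopifnot(all(lengths(myfile) == length(myfile)))

# Stack the vectors into a square identity matrix, one row per sheet.
ident <- do.call(rbind, myfile)
stopifnot(isSymmetric(ident))

# distance = 100 - sequence identity, stored as a "dist" object.
d <- as.dist(100 - ident)
print(d)
```

If the real vectors have different lengths or a different ordering, the stopifnot() checks will fail, which is a sign that the identities have to be matched to sequence pairs explicitly first.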

@vitis these are just the square roots of the pairwise distances derived from similarity/identity matrices

written 8.9 years ago by Yogesh Pandit500
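
To make the square-root relationship concrete, here is a toy base-R illustration of sqrt(1 - identity) for a single pair of aligned sequences. It is not seqinr's implementation: the two sequences are invented, gaps are not handled, and identity is simply taken as the fraction of matching positions.

```r
# Two hypothetical aligned sequences of equal length (no gaps).
a <- "MKVLAAGIVG"
b <- "MKVLSAGIAG"

# Split into single residues and compare position by position.
aa <- strsplit(a, "")[[1]]
bb <- strsplit(b, "")[[1]]
stopifnot(length(aa) == length(bb))

identity <- mean(aa == bb)   # fraction of identical positions
d <- sqrt(1 - identity)      # square root of (1 - identity)

print(identity)
print(d)
```

Going the other way, squaring a value from the example output recovers 1 - identity: 0.3750000 squared is 0.140625, which corresponds to an identity of about 85.9%.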