Calculate Extent Of Sequence Similarity.
8.4 years ago
matray312 ▴ 10

I have a problem that I was wondering could be solved with one of the techniques or algorithms used in bioinformatics to measure the extent of similarity. Problem statement: we have a sensor (it is like a magnetic compass, with a dial divided into twelve equal zones of 30 degrees each) that outputs, every second, the zone it is pointing at. A typical random output of the sensor might look like this: 30 30 30 30 30 30 120 120 120 120 120 120 60 60 60 60 60 60 330 330 330 330 30 30 30 30 210 210 210 210 210 60 60 60 60 60 60 60 60 60 60 60 60 60 ... and so on. We want to know whether we can calculate a measure of similarity between two 4-minute sequence samples taken at different times of the day. (It would be great if we could state something like: the sequences are similar, and there is a 1-in-a-million chance that we are wrong.)

sequence dna bioinformatics algorithm • 2.1k views
8.4 years ago
cts ★ 1.7k

Perhaps you could do a correlation analysis, which is easiest in R with the cor.test function. This gives you the correlation coefficient, which tells you how well the two outputs are correlated, and a p-value for the null hypothesis that they are not correlated.

Example code shown below:

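# Two short example sensor read-outs (one reading per second)
sensor_output_1 <- c(30,30,30,30,30,30,120,120,120,120)
sensor_output_2 <- c(30,45,30,30,120,120,100,100,120,120)
# One-sided test: is the true correlation greater than 0?
cor.test(sensor_output_1, sensor_output_2, alternative = 'greater')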

Pearson's product-moment correlation

data:  sensor_output_1 and sensor_output_2
t = 2.0324, df = 8, p-value = 0.03829
alternative hypothesis: true correlation is greater than 0
95 percent confidence interval:
0.04607661 1.00000000
sample estimates:
cor
0.5835345
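A minimal sketch of how the same test could be applied to the setting in the question, assuming two 4-minute windows read once per second (240 readings each). The readings below are synthetic, generated with `sample()` only to illustrate the workflow, and `compare_windows()` and its `alpha` threshold are hypothetical names, not part of any package. The `estimate` and `p.value` fields are those of the object that `cor.test` returns.

```r
# Minimal sketch (assumption): two 4-minute windows, one reading per second,
# give 240 readings each. The data are SYNTHETIC, for illustration only;
# replace them with real sensor read-outs.
set.seed(42)

zones <- seq(0, 330, by = 30)   # the twelve 30-degree zones of the dial

# Simulate runs of identical readings: 40 runs x 6 seconds = 240 readings
sample_1 <- rep(sample(zones, 40, replace = TRUE), each = 6)

# Second window: a copy of the first with 48 readings (20%) redrawn at random
sample_2 <- sample_1
changed <- sample(seq_along(sample_2), 48)
sample_2[changed] <- sample(zones, 48, replace = TRUE)

# compare_windows() is a hypothetical helper, not part of any package
compare_windows <- function(a, b, alpha = 1e-6) {
  stopifnot(length(a) == length(b))
  res <- cor.test(a, b, alternative = "greater")
  list(r = unname(res$estimate),          # Pearson correlation coefficient
       p_value = res$p.value,             # one-sided p-value
       similar = res$p.value < alpha)     # alpha = 1e-6 ~ "1 in a million"
}

result <- compare_windows(sample_1, sample_2)
print(result)
```

One caveat: cor.test assumes independent observations, and consecutive readings from a sensor that stays in the same zone for several seconds are not independent, so the p-value is likely to be smaller than it should be. A result such as p < 1e-6 is better read as a rough guide than as a literal 1-in-a-million error rate.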

8.4 years ago
Sudeep ★ 1.7k

A naive approach would be to treat each 4-minute sensor read-out as a long string of text and calculate the similarity between the strings with a string similarity measure, but this is not (entirely) the kind of sequence similarity you are after.
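A minimal sketch of that naive string approach in base R, assuming every reading is a multiple of 30 degrees between 0 and 330 so that it maps to one of twelve letters. `to_string()` and `string_similarity()` are hypothetical helpers; the edit distance itself comes from base R's `adist()` (utils package), which computes the Levenshtein distance, and the result is normalised by the length of the longer string.

```r
# Minimal sketch (assumption): readings are multiples of 30 from 0 to 330,
# so each one maps to a single letter: 0 -> "A", 30 -> "B", ..., 330 -> "L".

# to_string() is a hypothetical helper, not part of any package
to_string <- function(angles) {
  zone <- angles %/% 30 + 1          # zone index 1..12
  paste(LETTERS[zone], collapse = "")
}

# Normalised edit-distance similarity: 1 = identical strings.
# adist() (utils) returns the Levenshtein distance as a matrix.
string_similarity <- function(a, b) {
  s1 <- to_string(a)
  s2 <- to_string(b)
  d <- adist(s1, s2)[1, 1]
  1 - d / max(nchar(s1), nchar(s2))
}

reading_1 <- c(30, 30, 30, 120, 120, 60, 60, 330)
reading_2 <- c(30, 30, 120, 120, 120, 60, 60, 330)
string_similarity(reading_1, reading_2)   # returns 0.875
```

Here the two strings are BBBEECCL and BBEEECCL, which differ by one substitution out of eight characters. It also shows the limitation mentioned above: edit distance counts a change from 30 to 60 the same as a change from 30 to 210.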