I would like to compute the correlation coefficient between two numeric vectors that represent MALDI-TOF data. The two vectors do not always have the same length, which makes the cor() function fail.
The difference is usually 1 or 2 values (out of 100 or 200). I would like to insert an NA into the shorter vector, but my problem is finding where to insert it. The mass vectors are aligned, but sometimes one mass in vector 1 matches two masses in vector 2. By hand, I am able to "find" where the shift occurs, but I don't know how to code this search.
Here is an example:
mass1 is the first vector of masses. mass2 is the second vector.
I want to identify the intersection of the two vectors. However, we are dealing with biological data, so the masses are not exactly the same. We have to allow a delta so that two close masses are considered equivalent. match1 is a boolean vector that tells which masses in vector 1 are found in vector 2. To compute it, I use a "window" so that close masses are treated as equivalent. The window is expressed in ppm (parts per million), because the measurement error grows with the mass. So for a peak at mass 2000, I will allow a window of +/- 2, but for 20000, I will allow a window of +/- 20.
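To make the window rule concrete, here is a minimal sketch of how such a match vector could be computed. The function name `match_ppm`, the default of 1000 ppm (which corresponds to +/- 2 at mass 2000 and +/- 20 at mass 20000) and the two small input vectors are assumptions for illustration, not my actual code or data.

```r
# Hypothetical helper: for each mass in x, TRUE if at least one mass in y
# lies within +/- ppm parts per million of that mass (window based on x).
match_ppm <- function(x, y, ppm = 1000) {
  vapply(x, function(m) any(abs(y - m) <= m * ppm / 1e6), logical(1))
}

# Made-up masses, chosen only to show the two window sizes.
x <- c(2000, 20000)
y <- c(2001.5, 20025)

res <- match_ppm(x, y)
# 2000 is matched (1.5 <= 2), 20000 is not (25 > 20)
```

Note that the window here is computed from the mass in `x` only, which is what makes the result depend on which vector is taken as the reference.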
The problem is that in a few circumstances, a mass in vector 1 is found to have a match in vector 2, but because the corresponding mass in vector 2 is lower, its window is smaller, and the value in the match2 vector is FALSE. That's why, at the end, match1 and match2 do not give the same number of matched masses (but they should). I have tried to solve this problem, but solving it exactly takes too much time to compute. That's why I wanted to just remove or add one value so that both vectors have the same length.
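The asymmetry can be reproduced with two made-up masses, and one possible way around it is to compute the window from the larger of the two masses, so that the test gives the same answer in both directions. This is only a sketch under that assumption: `within_ppm`, the 1000 ppm value and all masses below are hypothetical, and it does not handle the case where one mass matches two masses in the other vector.

```r
ppm <- 1000
a <- 10005  # hypothetical mass from vector 1
b <- 9995   # hypothetical mass from vector 2

# One-sided windows: the answer depends on which mass sets the window.
a_finds_b <- abs(a - b) <= a * ppm / 1e6  # 10 <= 10.005
b_finds_a <- abs(a - b) <= b * ppm / 1e6  # 10 <= 9.995

# Symmetric variant: the window is taken from the larger of the two masses.
within_ppm <- function(m1, m2, ppm = 1000) {
  abs(m1 - m2) <= pmax(m1, m2) * ppm / 1e6
}

# All pairwise comparisons at once; rows are mass1, columns are mass2.
mass1 <- c(9000, 10005)
mass2 <- c(9995, 12000)
close <- outer(mass1, mass2, within_ppm)
match1 <- rowSums(close) > 0
match2 <- colSums(close) > 0
```

For vectors of 100 to 200 masses, the `outer()` matrix has at most a few tens of thousands of cells, so this stays cheap.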
mass1  : ... 3711 3740 3818 3883 ...
match1 : ...    0    1    1    1 ...
mass2  : ... 3687 3747 3769 3817 3883 ...
match2 : ...    0    0    0    1    1 ...
Here you can see that 3818 matches 3817 and 3883 matches 3883. In vector 1, 3740 matches 3747, but the opposite is not true. As a result, vector 1 ends up with one more matched mass than vector 2. This is where the error comes from. I would like to align two vectors like the ones below, so that I can "add" or "remove" one value and get the same length:
matched masses in vector 1 : ... 2114 3245 3740 3818 3883 4254 4785 ...
matched masses in vector 2 : ... 2113 3247 3817 3883 4256 4785 ...
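One way the NA insertion could be sketched is a parallel walk through both sorted vectors, pairing masses that fall within a ppm window and padding with NA otherwise. The function `align_masses`, the symmetric window and the 1000 ppm default are assumptions for illustration. The input uses only the values visible in the example above, and the greedy walk simply pairs the first candidate it meets, so it is not an exact solution.

```r
# Hypothetical helper: walk through two sorted mass vectors in parallel and
# pair values within a symmetric ppm window, padding with NA otherwise.
align_masses <- function(a, b, ppm = 1000) {
  i <- 1
  j <- 1
  out_a <- numeric(0)
  out_b <- numeric(0)
  while (i <= length(a) || j <= length(b)) {
    if (i > length(a)) {            # a is exhausted: b[j] stays unpaired
      take_a <- FALSE; take_b <- TRUE
    } else if (j > length(b)) {     # b is exhausted: a[i] stays unpaired
      take_a <- TRUE; take_b <- FALSE
    } else if (abs(a[i] - b[j]) <= max(a[i], b[j]) * ppm / 1e6) {
      take_a <- TRUE; take_b <- TRUE  # close enough: pair them
    } else {                        # too far apart: emit the smaller one alone
      take_a <- a[i] < b[j]
      take_b <- !take_a
    }
    out_a <- c(out_a, if (take_a) a[i] else NA)
    out_b <- c(out_b, if (take_b) b[j] else NA)
    if (take_a) i <- i + 1
    if (take_b) j <- j + 1
  }
  data.frame(mass1 = out_a, mass2 = out_b)
}

v1 <- c(2114, 3245, 3740, 3818, 3883, 4254, 4785)
v2 <- c(2113, 3247, 3817, 3883, 4256, 4785)
aligned <- align_masses(v1, v2)
# Row 3 is (3740, NA); both columns now have length 7.
```

If the values to correlate are reordered and padded the same way, `cor(x, y, use = "complete.obs")` drops the pairs that contain an NA instead of failing.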
I'm sorry, it's hard to explain!
Would you know how to do this? Or is there a way to compute a correlation coefficient with two vectors of different lengths?
Thanks a lot