I have the following sequence data:
A_True: 45, 92, 134, 156, 199
A_Pred: 23, 44, 45, 46, 88, 156, 187, 188, 189, 210
These numbers are positions in a sequence. Sequence A has a total length of 230. A_True holds the actual positions (ground truth), and A_Pred holds the positions predicted by a model. So A_Pred consists of 10 predicted positions out of 230 possible positions.
I would like to know whether there is a statistical test that can evaluate how close the positions in A_Pred are to those in A_True, i.e. whether the predicted positions lie near the true ones or look randomly picked. The evaluation should take the following into account:
- If the numbers in A_Pred overlap those in A_True, that counts as a perfect match, with a penalty of 4 extra numbers (the second sequence, A_Pred, may predict more numbers than the first, A_True, but there should be some penalty for mismatches).
- 92 (in A_True) and 88 (in A_Pred) are a distance of 4 apart.
- 199 (in A_True) is a distance of 10 from 189 (in A_Pred) and a distance of 11 from both 188 and 210 (in A_Pred).
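
To make the distances in the list above concrete, here is a minimal Python sketch that computes, for each position in A_True, the distance to the closest position in A_Pred. The function name `nearest_distances` is only an illustrative choice, and taking the nearest prediction is one possible reading of "distance" here, not a settled scoring rule.

```python
# Sequence data from the question.
a_true = [45, 92, 134, 156, 199]
a_pred = [23, 44, 45, 46, 88, 156, 187, 188, 189, 210]


def nearest_distances(true_positions, pred_positions):
    """For each true position, return the distance to the closest predicted position."""
    return [min(abs(t - p) for p in pred_positions) for t in true_positions]


# Exact overlaps between the two sets (positions present in both lists).
exact_hits = sorted(set(a_true) & set(a_pred))

distances = nearest_distances(a_true, a_pred)
print("exact hits:", exact_hits)
print("nearest distances:", distances)
```

A distance of 0 marks an exact hit. How to penalise the predictions that are not the nearest match to any true position is left open, since that is part of what I am asking.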
I have more list pairs (B_True, B_Pred, C_True, C_Pred, ...). Is there a statistical test that can serve this purpose?
My other thought is to use the combination count 230C10 (the number of ways to choose 10 positions out of 230) and compute the statistical significance of a random selection containing the numbers in A_True. But a combination-based approach cannot capture whether a prediction is "near" (for example, 88 in A_Pred is near 92 in A_True), because it treats every value as a distinct number. My problem concerns position.
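
For reference, the combination idea can be sketched as a hypergeometric tail probability: the chance that 10 positions drawn uniformly at random from 230 contain at least k of the 5 true positions. This is only a sketch under the assumption that "random" means a uniform draw without replacement; the function name `prob_at_least_k_hits` is made up for illustration. It counts exact hits only, which is exactly the limitation described above.

```python
from math import comb


def prob_at_least_k_hits(n_positions, n_true, n_pred, k):
    """Probability that a uniformly random n_pred-subset of n_positions
    contains at least k of the n_true marked positions (hypergeometric tail)."""
    total = comb(n_positions, n_pred)  # e.g. 230C10 for sequence A
    upper = min(n_true, n_pred)
    favourable = sum(
        comb(n_true, i) * comb(n_positions - n_true, n_pred - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


a_true = [45, 92, 134, 156, 199]
a_pred = [23, 44, 45, 46, 88, 156, 187, 188, 189, 210]

# Number of exact matches between prediction and ground truth.
hits = len(set(a_true) & set(a_pred))

p_value = prob_at_least_k_hits(230, len(a_true), len(a_pred), hits)
print(f"exact hits = {hits}, P(at least {hits} hits by chance) = {p_value:.4g}")
```

Here 88 versus 92 contributes nothing to the count, so a near miss is treated the same as a prediction far from every true position.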