Hi! I'm in need of some assistance.
Let A and B be two sets of sequences, all of the same length, say 30 bp. I want two things: i) the intersection of A and B, i.e. all sequences that are in both A and B. Let's call this set C. ii) For each sequence S in C, all the sequences in the union of A and B (A OR B) that have a Levenshtein distance less than x from S.
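To make the requirement concrete, here is a naive brute-force sketch in Python of the result I'm after. The function names (`levenshtein`, `exact_and_fuzzy`) are hypothetical and only for illustration, and skipping S itself in the neighbour list is an assumption on my part. It compares every S in C against every sequence in the union, so it only pins down the expected output and is far too slow for sets of the size described here.

```python
def levenshtein(s, t):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(s) < len(t):
        s, t = t, s  # keep the inner row as short as possible
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def exact_and_fuzzy(a, b, x):
    """Return {S: neighbours} for every S in the intersection of a and b.

    Neighbours are the sequences in the union of a and b whose Levenshtein
    distance to S is strictly less than x. S itself is skipped (assumption).
    """
    a, b = set(a), set(b)
    c = a & b        # part i): exact intersection
    union = a | b    # part ii): search space for the fuzzy matches
    return {
        s: {t for t in union if t != s and levenshtein(s, t) < x}
        for s in c
    }


if __name__ == "__main__":
    A = ["ACGT", "AAAA", "ACGA"]
    B = ["ACGT", "TTTT", "ACG"]
    for seq, neighbours in exact_and_fuzzy(A, B, x=2).items():
        print(seq, sorted(neighbours))
```

With the toy input above, C contains only "ACGT", and its neighbours for x = 2 are the sequences one edit away from it.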
Do you possibly know of an efficient way to do this? Mind you, in my case A and B are huge (more than 10 million sequences). So far I've approached this problem by creating two tables in MySQL (one for A and one for B) and doing some set operations there. However, the "fuzzy/mismatch" aspect of the problem makes it run extremely slowly. I would really appreciate any directions.
Thanks in advance.