In R or Python, how do I implement an allele sequence "autocomplete" tool given a dictionary of possible allele sequence matches?
That is, given, say, an incomplete allele sequence:
And given a limited number of dictionary values:
Allele 1: CCGATCGATCGTACGATCGGCAAGGTGA
Allele 2: ACTATCTATCGTAAGATCGGCAACGTGG
Allele 3: CCAATCGATCGTACGATCGGCAACGTGA
Allele 4: TCTATCTATCGTAAGATCGGCAACGTGG
The program should return Allele 1 and Allele 3 as possible reconstructions of the original allele sequence, since:
Allele 1: CCGATCGATCGTACGATCGGCAAGGTGA
Allele 3: CCAATCGATCGTACGATCGGCAACGTGA
Ideally, the tool will search through the dictionary values as a tree search rather than exhaustively comparing the incomplete string with each dictionary entry. In other words, once the first letters of alleles 2 and 4 have been compared, those alleles are automatically eliminated from consideration, because their first nucleotide does not match that of the original string.
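To make the tree search idea concrete, here is a minimal Python sketch of the kind of behaviour I have in mind, using a trie (prefix tree) built from nested dicts. It rests on assumptions: the incomplete sequence is not shown above, so I am assuming unknown positions are marked with "N", and the names `build_trie`, `match`, `WILDCARD` and `END` are made up for illustration, not taken from any library.

```python
WILDCARD = "N"  # assumed marker for an unknown nucleotide in the query
END = "$"       # hypothetical key that stores allele names at a terminal node


def build_trie(alleles):
    """Build a nested-dict trie from a {name: sequence} mapping."""
    root = {}
    for name, seq in alleles.items():
        node = root
        for base in seq:
            node = node.setdefault(base, {})
        node.setdefault(END, []).append(name)
    return root


def match(trie, query):
    """Return the names of alleles consistent with the query.

    Known bases must match exactly; WILDCARD matches any base.
    A query shorter than an allele is treated as a prefix of it.
    """
    nodes = [trie]
    for base in query:
        next_nodes = []
        for node in nodes:
            if base == WILDCARD:
                # Follow every branch except the terminal marker.
                next_nodes.extend(c for k, c in node.items() if k != END)
            elif base != END and base in node:
                next_nodes.append(node[base])
        nodes = next_nodes
        if not nodes:
            return []  # every branch has been pruned
    # Collect the names stored at or below the surviving nodes.
    names, stack = [], list(nodes)
    while stack:
        node = stack.pop()
        for key, child in node.items():
            if key == END:
                names.extend(child)
            else:
                stack.append(child)
    return sorted(names)


alleles = {
    "Allele 1": "CCGATCGATCGTACGATCGGCAAGGTGA",
    "Allele 2": "ACTATCTATCGTAAGATCGGCAACGTGG",
    "Allele 3": "CCAATCGATCGTACGATCGGCAACGTGA",
    "Allele 4": "TCTATCTATCGTAAGATCGGCAACGTGG",
}
trie = build_trie(alleles)

# Assumed example query: two unknown positions marked with "N".
print(match(trie, "CCNATCGATCGTACGATCGGCAANGTGA"))  # ['Allele 1', 'Allele 3']
```

With this layout, alleles 2 and 4 are dropped at the very first nucleotide because the root has no matching branch for them under "C", so they are never compared further. A plain prefix such as `match(trie, "CC")` behaves like ordinary autocomplete and returns the same two alleles.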