Finding a genetic sequence with no homology in a target genome
9.8 years ago
Ali ▴ 140

I want to design a probe (a short DNA sequence) with no homology to a target genome. By "no homology" I mean that no region of the target genome matches the probe, even when a few mutations are allowed. The probe length and the number of allowed mutations are parameters.

One solution would be to generate a pool of random DNA sequences, align them to the reference genome allowing mismatches, and look for one that has no alignment.
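The random-pool idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: "mutations" are treated as substitutions only (Hamming distance, no indels), the genome is a plain in-memory string, and the names `min_mismatches` and `find_probe_random` are made up for this example. For a real genome the window scan would be replaced by an aligner that reports hits with mismatches.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def min_mismatches(probe, genome):
    """Smallest Hamming distance between probe and any same-length window of genome."""
    k = len(probe)
    best = k  # worst case: every position differs
    for i in range(len(genome) - k + 1):
        d = sum(a != b for a, b in zip(probe, genome[i:i + k]))
        best = min(best, d)
        if best == 0:
            break
    return best


def find_probe_random(genome, length, max_mismatches, tries=1000, seed=0):
    """Draw random probes; return the first one with no hit on either strand
    within max_mismatches substitutions, or None if every try fails."""
    rng = random.Random(seed)
    rc = reverse_complement(genome)
    for _ in range(tries):
        probe = "".join(rng.choice("ACGT") for _ in range(length))
        closest = min(min_mismatches(probe, genome), min_mismatches(probe, rc))
        if closest > max_mismatches:
            return probe
    return None
```

Each candidate costs one full pass over the genome here, so this only scales to toy inputs; it is meant to pin down what "no alignment allowing mismatches" means, not to be fast.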

Does anybody have a better solution?

alignment sequence • 2.3k views
Could you instead just use one of the standard epitopes (e.g., an HA-tag)?

Great solution with biological insight! Thanks.

You can compute k-mer (n-gram) occurrence statistics for the target sequence and design the probe from k-mers that are never seen, even within the allowed edit distance (nullomers?). This way no randomness or alignment is involved, i.e. it is an exact solution.
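A minimal sketch of the exact-match part of this idea: collect every k-mer that occurs in the target and list the ones that do not. The function names are hypothetical, and the example deliberately ignores the edit-distance requirement and the reverse strand.

```python
from itertools import product


def observed_kmers(genome, k):
    """Set of all k-mers that occur in genome (forward strand only)."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def absent_kmers(genome, k):
    """All k-mers over ACGT that never occur in genome, in lexicographic order."""
    seen = observed_kmers(genome, k)
    candidates = ("".join(p) for p in product("ACGT", repeat=k))
    return [kmer for kmer in candidates if kmer not in seen]
```

Collecting the observed k-mers is a single pass over the sequence, but enumerating all 4^k candidates is only practical for small k.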

Thanks Pavel. I thought about this idea earlier, but even given the k-mers with no occurrence, I suspect that finding the farthest (or even a far enough) sequence is NP-hard. More specifically: given a set of k-mers, we want to find another k-mer whose edit distance to every k-mer in the set is at least x. Am I right?
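Whatever the difficulty of finding such a k-mer, checking a given candidate against the set is straightforward. The sketch below uses the standard Levenshtein dynamic program; `is_far_enough` is a hypothetical helper name, and nothing here addresses the search problem itself.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def is_far_enough(candidate, kmers, x):
    """True if candidate has edit distance at least x to every k-mer in kmers."""
    return all(edit_distance(candidate, kmer) >= x for kmer in kmers)
```

For example, `is_far_enough("AAAA", ["TTTT", "CCCC"], 4)` holds, while `is_far_enough("AAAA", ["AAAT"], 2)` does not.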

I would guess that computing the statistics has linear complexity in the length of the whole sequence; moreover, you will then have position-specific statistics. You can then use dynamic programming, which should yield an optimal solution in time pseudo-polynomial in the probe length. I might be wrong.
