Question

What Are Some Tools/Methods For Identifying Microhomology Between Short Sequences?

12.1 years ago

SES 8.6k

Hi,

I am investigating a specific type of recombination that involves microhomology (2-20 bp) between short direct repeats. My current method is to identify all putative sites of recombination and then use bl2seq to compare the short repeats. This works fine, but BLAST has a minimum word size of 4 for DNA comparisons, so I can only identify putative events with at least 4 bp of microhomology. Are there any string-comparison methods that can identify shorter microhomology (2 or 3 bp of consecutive matching bases between 20 bp sequences)? To simplify things, I am only interested in the ends of the sequences, and I know which ends should match.

Here is an example:

ATCTAGTACGGATCGTACGTT
                  GTTATCTGAGCGAAAGCTAA
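
When the ends that should match are known, the check reduces to finding the longest suffix of the first sequence that is also a prefix of the second. This needs neither an index nor a minimum word size. Below is a minimal Python sketch. It assumes exact, ungapped matches and uses illustrative bounds of 2 to 20 bp, and the function name `end_microhomology` is hypothetical. In Perl, the same loop can compare `substr($left, -$k)` with `substr($right, 0, $k)`:

```python
def end_microhomology(left: str, right: str, min_len: int = 2, max_len: int = 20) -> str:
    """Return the longest suffix of `left` that is also a prefix of `right`.

    Hypothetical helper: exact, ungapped matches only. Returns "" if no
    overlap of at least `min_len` bp exists.
    """
    left, right = left.upper(), right.upper()
    # Try the longest candidate first so the first hit is the maximal overlap.
    upper = min(max_len, len(left), len(right))
    for k in range(upper, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


if __name__ == "__main__":
    a = "ATCTAGTACGGATCGTACGTT"
    b = "GTTATCTGAGCGAAAGCTAA"
    hit = end_microhomology(a, b)
    print(hit, len(hit))  # GTT 3
```

For the example pair, this prints `GTT 3`, the same 3 bp overlap shown in the alignment above.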

I'll be running this comparison thousands of times, so I'd rather not construct an index or database for each one. I have coded this in Perl, so if a pure-Perl solution makes more sense, I'm open to that.
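
Each comparison involves only two ~20 bp strings, so a plain dynamic-programming longest-common-substring check is cheap enough to run per pair. It needs no index or database and has no minimum word size. Below is a minimal Python sketch (the function name is hypothetical; the same nested loop is straightforward to write in Perl). It reports the longest exact, ungapped run shared anywhere in the two sequences, which is a broader search than restricting the comparison to the known ends:

```python
def longest_common_substring(a: str, b: str) -> tuple:
    """Longest exact run shared by `a` and `b` (classic O(len(a) * len(b)) DP).

    Hypothetical helper. Returns (substring, start_in_a, start_in_b),
    or ("", -1, -1) if the sequences share no base at all.
    """
    a, b = a.upper(), b.upper()
    best_len, best_end_a, best_end_b = 0, -1, -1
    prev = [0] * (len(b) + 1)  # DP row for a[:i-1]; prev[j] = run length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one base earlier in both sequences.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end_a, best_end_b = curr[j], i, j
        prev = curr
    if best_len == 0:
        return "", -1, -1
    start_a, start_b = best_end_a - best_len, best_end_b - best_len
    return a[start_a:best_end_a], start_a, start_b
```

On the example pair, this returns `("ATCT", 0, 3)`. That is a 4 bp run located away from the ends being compared, so when the relevant ends are known, restricting the comparison to them avoids such hits.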

Thanks.

recombination homology • 3.8k views

updated 12.1 years ago by Pierre Lindenbaum 161k • written 12.1 years ago by SES 8.6k

Just out of curiosity: if it is not "top secret", why do you want to do this? (Can you really infer homology from such a short alignment?)

12.1 years ago by Manu Prestat 4.1k

No secret. The mechanism I am studying, called illegitimate recombination, is well known and has been described in many species, including yeast and plants. The very name of the mechanism reflects the fact that it involves very short regions of homology and operates outside the normal recombination machinery (i.e., the machinery involving RecA). I'm just trying to better understand the process without introducing any bias into the analyses.

12.1 years ago by SES 8.6k

Note that the probability of two sequences sharing three nucleotides at their ends by chance is quite high (1 in 64 for a given 3 bp overlap if the four bases are equally frequent). Add sequencing errors on top of that, and the matches you are getting may just be random events.

12.1 years ago by Giovanni M Dall'Olio 28k
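
This estimate generalizes to other overlap lengths and base compositions. If both sequences are treated as independent random draws, the chance that k aligned bases all match is q^k, where q is the per-base match probability. Below is a minimal sketch. It assumes i.i.d. bases with a given GC content, a simplifying assumption since real genomic sequence is neither independent nor uniform. The helper name `chance_match_probability` is hypothetical:

```python
def chance_match_probability(k: int, gc: float = 0.5) -> float:
    """Probability that k aligned positions all match by chance.

    Assumes both sequences are independent i.i.d. draws with the given GC
    content (G = C = gc/2, A = T = (1 - gc)/2).
    """
    # Per-base match probability: sum over bases of p(base)^2.
    per_base = 2 * (gc / 2) ** 2 + 2 * ((1 - gc) / 2) ** 2
    return per_base ** k


if __name__ == "__main__":
    for k in (2, 3, 4):
        p = chance_match_probability(k)
        print(f"{k} bp: {p:.4f} (about 1 in {round(1 / p)})")
```

With equal base frequencies, this gives about 1 in 16, 1 in 64 and 1 in 256 for 2, 3 and 4 bp. These figures apply to a single fixed overlap. Testing several overlap lengths per pair, or many pairs, increases the chance of at least one spurious hit. A skewed base composition also raises q above 1/4 (for example, q = 0.29 at 30% GC).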

The matches are not random if you are analyzing nonrandom sites in the genome, and, as I said, I'm studying recombination events. If you are analyzing deletion events that are shared by multiple copies and you know the evolutionary history, or origin, of those events, they are not random at all.

12.1 years ago by SES 8.6k

score 2 · Answer 1 · 2012-03-05

12.1 years ago

Pierre Lindenbaum 161k

Primer3 contains a standalone program named "ntdpal" that could fulfill your needs.

./ntdpal -m 0 -p  ATCTAGTACGGATCGTACGTT GTTATCTGAGCGAAAGCTAA g

ATCTAGTACGGATCGTACGTT                                                 
                  |||                                                 
                  GTTATCTGAGCGAAAGCTAA                                
______________________________________________________________________
|ATCTAGTACGGATCGTACGTT|  |GTTATCTGAGCGAAAGCTAA| g score=3.00 len=3 |18,0|19,1|20,2|

12.1 years ago by Pierre Lindenbaum 161k
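
To run this over thousands of pairs, one option is to call `ntdpal` from a script and extract the score, length and aligned positions from its summary line. The minimal Python sketch below makes several assumptions. The binary is at `./ntdpal`. The `-m 0 -p ... g` arguments from the example command are the ones you want. The summary line always has the `score=... len=... |i,j|...` layout seen in the single example output above. The helper names are hypothetical. The position pairs appear to be 0-based (first sequence, second sequence) indices: `|18,0|` pairs the G at position 18 of the first sequence with the G at position 0 of the second.

```python
import re
import subprocess

# Assumed output layout, inferred only from the single example above.
SUMMARY_RE = re.compile(r"score=(?P<score>-?[\d.]+)\s+len=(?P<len>\d+)\s+\|(?P<pairs>[\d,|]*)")


def parse_ntdpal_summary(output: str):
    """Return (score, length, [(i, j), ...]) from ntdpal -p output, or None if not found."""
    m = SUMMARY_RE.search(output)
    if m is None:
        return None
    # "18,0|19,1|20,2|" -> [(18, 0), (19, 1), (20, 2)]
    pairs = [tuple(int(x) for x in p.split(",")) for p in m.group("pairs").split("|") if p]
    return float(m.group("score")), int(m.group("len")), pairs


def run_ntdpal(seq1: str, seq2: str, binary: str = "./ntdpal") -> str:
    """Run ntdpal with the same arguments as the example command (path is an assumption)."""
    result = subprocess.run([binary, "-m", "0", "-p", seq1, seq2, "g"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Parsing the summary line from the example output gives `(3.0, 3, [(18, 0), (19, 1), (20, 2)])`. If other options change the output format, the regular expression will need adjusting.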

This looks promising. I wish there were some documentation for the program, but the usage seems simple enough that I can start testing it.

12.1 years ago by SES 8.6k