What Are Some Tools/Methods For Identifying Microhomology Between Short Sequences?
12.1 years ago
SES 8.6k

Hi,

I am investigating a specific type of recombination that involves microhomology (2-20 bp) between short direct repeats. My current method involves identifying all putative sites of recombination and using bl2seq to compare the short repeats. This is working fine, but BLAST has a minimum word size of 4 for DNA comparisons, so I can only identify putative events involving matches of 4 bp or longer. Are there any methods for doing string comparison to identify this type of microhomology (2 or 3 bp of consecutive matching bases on 20 bp sequences)? To simplify things, I am only interested in the ends of the sequences, and I know which ends should match.

Here is an example:

ATCTAGTACGGATCGTACGTT
                  GTTATCTGAGCGAAAGCTAA

This is a comparison I'll be doing thousands of times, so I'd rather not construct an index or database for each comparison. I have coded this in Perl, so if it makes more sense to come up with a pure-Perl solution, I'm open to that.

Thanks.

recombination homology • 3.8k views

Just because I'm curious... If it is not "top secret", why do you want to do that? (Can you really infer homology from such a short alignment region?)


No secret. I am studying a well-known mechanism called illegitimate recombination, which has been described in many species, including yeast and plants. The very name of the mechanism reflects the fact that it involves very short regions of homology and operates outside of the normal recombinational machinery (i.e., pathways involving RecA). I'm just trying to better understand the process and not introduce any bias into the analyses.


Note that the probability of two sequences having three nucleotides in common at their extremities is quite high. If you also take sequencing errors into account, the matches you are getting may just be random events.
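To put a rough number on that, here is a minimal sketch of the chance-match arithmetic. It assumes uniform, independent bases (each of A, C, G, T with probability 1/4), which is a simplification and will not hold for nonrandom genomic sites. The function names and the figure of 1000 comparisons are hypothetical, chosen only for illustration.

```python
def chance_end_match(k: int) -> float:
    """Probability that k end bases match by chance.

    Assumption: bases are uniform and independent (p = 1/4 each).
    """
    return 0.25 ** k


def expected_chance_matches(k: int, n_comparisons: int) -> float:
    """Expected number of chance k-bp end matches among n comparisons."""
    return n_comparisons * chance_end_match(k)


print(chance_end_match(2))               # 0.0625   (1 in 16)
print(chance_end_match(3))               # 0.015625 (1 in 64)
print(expected_chance_matches(3, 1000))  # 15.625
```

Under this simple model, a 3 bp end match is expected in about 1 of every 64 unrelated pairs, so with thousands of comparisons some matches would arise by chance alone.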


The matches are not random if you are analyzing nonrandom sites in the genome, and as I said, I'm studying recombination events. If you are analyzing deletion events that are shared by multiple copies and you know the evolutionary history, or origin, of certain events, they are not random at all.

12.1 years ago

Primer3 contains a standalone program named "ntdpal" that could fulfill your needs.

./ntdpal -m 0 -p  ATCTAGTACGGATCGTACGTT GTTATCTGAGCGAAAGCTAA g

ATCTAGTACGGATCGTACGTT                                                 
                  |||                                                 
                  GTTATCTGAGCGAAAGCTAA                                
______________________________________________________________________
|ATCTAGTACGGATCGTACGTT|  |GTTATCTGAGCGAAAGCTAA| g score=3.00 len=3 |18,0|19,1|20,2|
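
Since the question only concerns exact matches at known ends, a plain string comparison may also be enough, with no index or database needed. The following is a minimal sketch, not a tested pipeline: it assumes the microhomology is the longest suffix of the first sequence that is identical to a prefix of the second, with no mismatches or gaps allowed. The function name and the `min_len`/`max_len` parameters are hypothetical. It is written in Python, but the same loop translates directly to Perl with `substr`.

```python
def end_microhomology(left: str, right: str,
                      min_len: int = 2, max_len: int = 20) -> str:
    """Return the longest suffix of `left` that equals a prefix of `right`.

    Assumption: only exact, consecutive matches count (no mismatches, no gaps).
    Returns an empty string if no overlap of at least `min_len` bases exists.
    """
    left, right = left.upper(), right.upper()
    longest = min(max_len, len(left), len(right))
    # Try the longest candidate overlap first and stop at the first hit.
    for k in range(longest, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


# The example pair from the question.
print(end_microhomology("ATCTAGTACGGATCGTACGTT", "GTTATCTGAGCGAAAGCTAA"))  # GTT
```

For the example pair this reports the same 3 bp overlap (GTT) as the ntdpal alignment above. Each comparison costs at most a few dozen short string comparisons, so running it thousands of times should not be a problem.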

This looks promising. I wish there were some documentation for the program, but the usage seems simple enough that I could start testing it.
