Question: kmer alignment with mismatch
Vince (Montreal, Quebec, Canada) wrote 3.1 years ago:

Hi,

I have a list of kmers, 8 to 12 nt in length, and I would like to align them to a larger sequence, returning all ungapped matches with at most 2 mismatches. The search must be exhaustive, i.e. I do not want to miss any match. I wrote a Python script that computes the Hamming distance between the query and every substring of my reference, but it is too slow for many (1000s of) queries on a reference of ~100,000 nt.

What program would you recommend that does this and runs reasonably quickly? I have looked into Bowtie2, but I am unsure whether it was designed to work with such short query sequences.

Thanks for the feedback.

mismatch alignmnt kmer • 2.0k views
Vince (Montreal, Quebec, Canada) wrote 3.1 years ago:

I ended up writing a custom Python script that:

1. Breaks the larger sequence into all kmers of the required sizes and stores them in a set.

2. For each query kmer, computes all possible kmers with up to 2 mismatches and stores them in a set.

3. Intersects the set from step 1 with the set from step 2.

Works extremely quickly as my kmers are quite small (8-12 nt) and the search target is also relatively small (tens of kb).
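
For anyone who wants to try this, here is a minimal sketch of the three steps above. It is not the original script: the function names (reference_kmers, neighbors, find_matches) are made up for illustration, and it assumes an uppercase A/C/G/T alphabet and a single forward strand. The reference index is a dict rather than a plain set so that match positions can be reported as well.

```python
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: plain DNA, no ambiguity codes

def reference_kmers(reference, k):
    """Step 1: map every k-mer of the reference to its 0-based start positions."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index

def neighbors(kmer, max_mismatches=2):
    """Step 2: all sequences within max_mismatches substitutions of kmer."""
    result = {kmer}
    for d in range(1, max_mismatches + 1):
        for positions in combinations(range(len(kmer)), d):
            # At each chosen position, try every base except the original one.
            options = [[b for b in ALPHABET if b != kmer[p]] for p in positions]
            for subs in product(*options):
                chars = list(kmer)
                for p, b in zip(positions, subs):
                    chars[p] = b
                result.add("".join(chars))
    return result

def find_matches(reference, queries, max_mismatches=2):
    """Step 3: intersect each query's neighborhood with the reference k-mers.

    Returns a sorted list of (query, position, matched_reference_kmer) tuples.
    """
    indexes = {}  # one reference index per query length, built on demand
    hits = []
    for query in queries:
        k = len(query)
        if k not in indexes:
            indexes[k] = reference_kmers(reference, k)
        index = indexes[k]
        for variant in neighbors(query, max_mismatches):
            for pos in index.get(variant, ()):
                hits.append((query, pos, variant))
    return sorted(hits)

if __name__ == "__main__":
    # Toy example: the query differs from the reference k-mer at position 0
    # by a single substitution.
    print(find_matches("ACGTACGTAA", ["ACGTACGA"]))
```

A 12-mer has only 1 + 36 + 594 = 631 sequences within 2 mismatches, so each query costs a few hundred dictionary lookups regardless of the reference length. To search the reverse strand as well, the reverse complement of each query would have to be added to the query list.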


Your solution sounds like the best approach. Incidentally, you can also generate the mutant kmers with BBDuk, like this:

bbduk.sh ref=sequences.fasta dump=kmers.fasta k=12 hdist=2 mm=f

The maximum number of mismatches is set with hdist (Hamming distance).

Reply written 3.1 years ago by Brian Bushnell
dariober (WCIP | Glasgow | UK) wrote 3.1 years ago:

Good that you found a solution. For this sort of thing, vmatch is a useful program to know about. It has extensive documentation, and it seems to me to be really well written and maintained. You need to ask for a license key (free for academic use).

In your case the command would probably be:

vmatch -v -e 2 -d -p -showdesc 0 -complete -q query.fa reference.mkvtree

with reference.mkvtree being the indexed reference.
