Find All K-Mers In One Set Of Sequences Not In The Other Set
12.2 years ago
David M ▴ 580

I'd like to identify all k-mers in one set of transcriptomic data that are not in the other set. I am dealing with large amounts of data, but it seems to me that sequence assemblers regularly perform this task, and that it could be easily accomplished with suffix trees. The k I am thinking of using is 32.
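To make the task concrete, here is a minimal in-memory sketch of the comparison being asked for. It is an illustration only: the function names `kmers`, `kmer_set`, and `unique_kmers` are hypothetical, sequences are assumed to be plain strings already read from file, k-mers containing `N` are skipped, and reverse complements are not merged. Holding every 32-mer of a large transcriptome in a Python set is unlikely to scale, which is why a dedicated k-mer counter is the practical choice.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of k-mers in one sequence (skipping any that contain N)."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    }


def kmer_set(seqs: Iterable[str], k: int) -> Set[str]:
    """Union of the k-mers of every sequence in a data set."""
    out: Set[str] = set()
    for s in seqs:
        out |= kmers(s, k)
    return out


def unique_kmers(set_a: Iterable[str], set_b: Iterable[str], k: int) -> Set[str]:
    """k-mers present in set_a but absent from set_b."""
    return kmer_set(set_a, k) - kmer_set(set_b, k)


if __name__ == "__main__":
    # Toy example with k=3; the question itself uses k=32.
    print(unique_kmers(["ACGTT"], ["CGTTA"], 3))
```

With the toy input, only `ACG` occurs in the first set and not in the second.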

Are there any programs which can accomplish this for me? I'd rather not re-invent the wheel. I'd even settle for a program which can give me a list of k-mers present in a single data set.

Cheers!

assembly comparison • 3.0k views
12.2 years ago

I like using Jellyfish for counting k-mers. It's pretty fast.


This worked perfectly, and pretty fast as well. Thanks!

12.2 years ago
SES 8.6k

I would start by looking at Tallymer, which is part of genometools. Tallymer will allow you to create a persistent index from one set and compare the occurrence ratio of k-mers from another set. I do a lot of k-mer work, and this is a great program that is well-documented.
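The index-then-query idea can be sketched in a few lines of Python. This is not Tallymer's interface or its data structure; it is a hypothetical in-memory stand-in (the names `build_index` and `occurrences` are made up for illustration) that counts k-mers from one set and then looks up how often each k-mer of the other set occurs in that index. A count of zero means the k-mer is absent from the indexed set.

```python
from collections import Counter
from typing import Dict, Iterable


def build_index(seqs: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurrence in the indexed data set."""
    index: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]] += 1
    return index


def occurrences(query_seqs: Iterable[str], index: Counter, k: int) -> Dict[str, int]:
    """Map each k-mer of the query set to its count in the index (0 if absent)."""
    result: Dict[str, int] = {}
    for seq in query_seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            result[kmer] = index.get(kmer, 0)
    return result


if __name__ == "__main__":
    index = build_index(["ACGACG"], 3)
    counts = occurrences(["ACGTT"], index, 3)
    # k-mers of the query set that never occur in the indexed set
    absent = sorted(kmer for kmer, n in counts.items() if n == 0)
    print(counts, absent)
```

Keeping counts rather than a plain set costs a little more memory but also lets you filter on abundance, for example to ignore k-mers seen only once, which are often sequencing errors.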

Vmatch is a really powerful and versatile tool for any sequence comparison task. While I have never used it, the Unwords software seems like it might be the right tool for this job. These two software packages are related (same author) though, based on the Unwords publication, the data structures utilized by Unwords are completely different from Vmatch.

To add one more, I occasionally use meryl, which is really fast. It has almost no documentation, and I always have to read the C code to figure out the invocation, which is why I listed it last (not to say it's inferior, but you'll get going a lot faster with the other tools).


I found that Jellyfish (above) worked really well for my purposes, but if I have to perform similar work in the future I'll be sure to look at these tools as well.
