Checking for enrichment of certain kmers
4.5 years ago
ognjen011 ▴ 250

I often have a set of genomic locations, intervals, or probes that have been selected from a larger background set. For example, in ChIP-seq I might have a number of intervals where peaks were called. If I hypothesize that these intervals share a common motif or sequence pattern, is there an existing framework that would test this for me? Specifically, I am looking for something that counts how often a pattern occurs in the hits versus the misses and runs a statistical test on those frequencies.

With exact matches this is not difficult to implement, and I have done so, although speed may be an issue. What I would like to know is whether one can also allow mismatches, indels, or regex-like patterns, so that the search for sequence motifs is truly comprehensive.
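To make the matching modes concrete, here is a minimal Python sketch of two of them: regex-like patterns written with IUPAC ambiguity codes, and a fixed k-mer with a bounded number of mismatches (Hamming distance). The function names and the `max_mismatches` parameter are made up for illustration and do not come from any existing package. Indels are not handled; that would need an edit-distance or alignment-based approach.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(pattern):
    """Compile an IUPAC pattern (e.g. 'CANNTG') into a regex (hypothetical helper)."""
    return re.compile("".join(IUPAC[base] for base in pattern.upper()))

def count_regex_hits(regex, seq):
    """Count matches, including overlapping ones, via a zero-width lookahead."""
    lookahead = re.compile("(?=" + regex.pattern + ")")
    return sum(1 for _ in lookahead.finditer(seq.upper()))

def count_mismatch_hits(kmer, seq, max_mismatches=1):
    """Count windows of seq within max_mismatches (Hamming distance) of kmer."""
    kmer, seq = kmer.upper(), seq.upper()
    k = len(kmer)
    hits = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Substitutions only: compare position by position.
        if sum(a != b for a, b in zip(kmer, window)) <= max_mismatches:
            hits += 1
    return hits
```

For example, `count_regex_hits(iupac_to_regex("CANNTG"), seq)` counts E-box-like sites in `seq`. The mismatch scan is O(n·k) per sequence, so it is likely to be slow on large inputs without further optimization.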

In short: provide two groups of sequences, count subsequences in both, and optionally perform a statistical test, ideally with support for complex matching. Does this exist as a package or tool?
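For reference, the exact-match version of this workflow can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: each k-mer is scored by how many sequences contain it at least once, and enrichment in the hits is tested with a one-sided Fisher's exact test computed directly from the hypergeometric distribution. The function names are hypothetical, and the p-values are not corrected for multiple testing.

```python
from collections import Counter
from math import comb

def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a = hits with the k-mer, b = hits without it,
    c = misses with the k-mer, d = misses without it.
    Returns P(X >= a) under the hypergeometric null.
    """
    n_hits, n_with, total = a + b, a + c, a + b + c + d
    upper = min(n_hits, n_with)
    tail = sum(comb(n_with, x) * comb(total - n_with, n_hits - x)
               for x in range(a, upper + 1))
    return tail / comb(total, n_hits)

def enriched_kmers(hits, misses, k):
    """Return (kmer, hits_with, misses_with, p_value) tuples sorted by p-value."""
    hit_counts = kmer_presence(hits, k)
    miss_counts = kmer_presence(misses, k)
    results = []
    for kmer, a in hit_counts.items():
        c = miss_counts.get(kmer, 0)
        p = fisher_greater(a, len(hits) - a, c, len(misses) - c)
        results.append((kmer, a, c, p))
    return sorted(results, key=lambda r: r[3])
```

Calling `enriched_kmers(hit_seqs, miss_seqs, 6)` returns the 6-mers seen in the hits, most enriched first. With thousands of k-mers tested, a correction such as Bonferroni or Benjamini-Hochberg would be needed before interpreting the p-values.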

Thanks!

sequence • 670 views
4.5 years ago
ATpoint 81k

Isn't that essentially a motif enrichment analysis, as implemented in MEME or HOMER? There, one first scans for de novo enriched motifs and then optionally compares these de novo motifs against a collection of known motifs.


Essentially it is. I didn't know that this was a category of analysis or that those tools existed. Thank you, sir; please post that as an answer.

