Identifying common features between RNA strings
6.4 years ago
dh486

I am a mathematical modeller with very limited bioinformatics experience so please forgive me if my question is entirely obvious and multiple accepted algorithms exist to answer my question.

I have a set, A, of about 250 RNA sequences, each 100 bases long, that are associated with dysfunctional splicing under the action of a certain pathway inhibitor (details unimportant here). As a complementary data set, B, I also have ~10,000 RNA sequences of the same length that show no dysfunctional splicing under the same inhibitor.

I would like to run some form of analysis to identify either structural properties shared by members of A but absent from B, or 'motifs' that are over-represented in one set relative to the other.
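One standard way to frame "over-represented in A relative to B" is a per-k-mer enrichment test: for every k-mer, count how many sequences in each set contain it and ask how surprising the count in A is under a hypergeometric (one-sided Fisher exact) model. The sketch below assumes plain `A/C/G/U` strings and uses only the standard library; the function names are illustrative, not taken from any existing tool. Dedicated discriminative motif finders also accept a control set, so this is only meant to show the underlying idea.

```python
from math import comb


def kmer_presence(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = {}
    for s in seqs:
        for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def hypergeom_sf(x, total, n_with, n_drawn):
    """P(X >= x): drawing n_drawn of total sequences, n_with of which carry the k-mer."""
    upper = min(n_with, n_drawn)
    denom = comb(total, n_drawn)
    return sum(comb(n_with, i) * comb(total - n_with, n_drawn - i)
               for i in range(x, upper + 1)) / denom


def enriched_kmers(set_a, set_b, k=4):
    """Rank k-mers by one-sided hypergeometric p-value for enrichment in set_a.

    Returns (p_value, kmer, hits_in_a, hits_in_b) tuples, smallest p first.
    """
    pres_a = kmer_presence(set_a, k)
    pres_b = kmer_presence(set_b, k)
    n_a, total = len(set_a), len(set_a) + len(set_b)
    results = []
    for kmer in set(pres_a) | set(pres_b):
        a_hits = pres_a.get(kmer, 0)
        n_with = a_hits + pres_b.get(kmer, 0)
        p = hypergeom_sf(a_hits, total, n_with, n_a)
        results.append((p, kmer, a_hits, n_with - a_hits))
    results.sort()
    return results
```

With 4^k candidate k-mers the raw p-values need a multiple-testing correction (Bonferroni or Benjamini–Hochberg) before anything is called significant. For ~10,000 sequences, `scipy.stats.fisher_exact(table, alternative="greater")` on each 2×2 table will be much faster than the exact big-integer sum used here.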

I can think of general approaches: working out the expected distribution of motifs of different lengths in a random sample and comparing it to my data, or training a machine learning model on the data that could hopefully identify patterns that distinguish the sets. However, given how new I am to this area of bioinformatics, I don't want to waste effort when pattern recognition algorithms that are rigorously established and accepted by the bioinformatics community could be adapted to my problem.
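The "random sample" comparison can be made concrete with a shuffle test: count a motif in the real sequences, then count it again in many shuffled copies that keep each sequence's base composition, and report how often the shuffled count is at least as large. This is a minimal sketch under the assumption that a mononucleotide shuffle is an acceptable null; it does not preserve dinucleotide frequencies, which matter for RNA, so a dinucleotide-preserving shuffle would be a stricter background.

```python
import random


def count_occurrences(seqs, motif):
    """Total (possibly overlapping) occurrences of motif across all sequences."""
    k = len(motif)
    return sum(
        1 for s in seqs for i in range(len(s) - k + 1) if s[i:i + k] == motif
    )


def shuffle_pvalue(seqs, motif, n_shuffles=1000, seed=0):
    """Empirical p-value for over-representation of motif versus shuffled
    copies of the same sequences (each keeps its own base composition)."""
    rng = random.Random(seed)
    observed = count_occurrences(seqs, motif)
    at_least = 0
    for _ in range(n_shuffles):
        shuffled = []
        for s in seqs:
            bases = list(s)
            rng.shuffle(bases)
            shuffled.append("".join(bases))
        if count_occurrences(shuffled, motif) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_shuffles + 1)
```

Running this on A and on B separately shows whether a motif is enriched over composition-matched background in each set; the direct A-versus-B comparison is still the more targeted test for this question.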

Alternatively, could anyone suggest a good resource for learning pattern recognition or machine learning techniques? I am proficient in Python, C++, MATLAB, etc., so I would be happy just to hear of algorithmic techniques rather than specific programming advice (although if anyone knows of any useful libraries...).
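A common entry point for the machine learning route is to turn each sequence into a fixed-length vector of k-mer frequencies (the "spectrum" representation) and fit a linear classifier. The learned weights then point at candidate motifs. The sketch below is a hypothetical, dependency-free illustration: plain batch gradient descent on the logistic loss, with the positive class up-weighted because A (~250) is much smaller than B (~10,000). The function names and the `pos_weight` scheme are assumptions for illustration, not a published method.

```python
import math
from itertools import product


def kmer_vector(seq, k=3, alphabet="ACGU"):
    """Fixed-length vector of k-mer frequencies for one sequence."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    vec = [0.0] * len(kmers)
    n = len(seq) - k + 1
    for i in range(n):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other symbols
            vec[j] += 1.0 / n
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Probability that feature vector x belongs to the positive set (A)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, epochs=200, lr=0.5):
    """Batch gradient descent on the logistic loss; returns (weights, bias).

    Positive examples (y == 1) are up-weighted by n_negative / n_positive.
    """
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    n_pos = sum(y)
    pos_weight = (len(y) - n_pos) / max(n_pos, 1)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            if yi == 1:
                err *= pos_weight
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b
```

For the real data set, scikit-learn's `LogisticRegression(class_weight="balanced")` does the same job far faster and adds regularisation; it is worth holding out part of A and B to check that the classifier generalises before reading anything into individual k-mer weights.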

Tags: RNA-Seq, next-gen

There are many tools for motif analysis, MEME for example. You may need to come up with a framework that unifies structural similarity (by fold structure) and sequence similarity; the two are not always closely linked.
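On the structural side, one crude per-sequence feature is how much nested base pairing a sequence can form. The sketch below uses the classic Nussinov dynamic programme (maximise Watson–Crick plus G–U pairs, with a minimum hairpin loop). It is a stand-in chosen for brevity, not what the reply specifically proposes: energy-based folding such as RNAfold from the ViennaRNA package is the usual choice, and the `min_loop=3` default is an assumption.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov DP).

    A pair (i, j) is allowed only if at least min_loop unpaired bases
    lie between them, i.e. j - i - 1 >= min_loop.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1/2: base i or base j left unpaired.
            best = max(dp[i + 1][j], dp[i][j - 1])
            # Case 3: i pairs with j around an inner subsequence.
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)
            # Case 4: split into two independent substructures.
            for k in range(i + 1, j):
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]
```

Because every sequence here is 100 bases long, the pair counts can be compared directly between A and B (for example with a rank-sum test), or appended to the k-mer features so that sequence and structure are scored in one model.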
