Question: Identifying common features between RNA strings
2.8 years ago, dh486 wrote:

I am a mathematical modeller with very limited bioinformatics experience, so please forgive me if the answer is obvious or if several accepted algorithms already exist for this.

I have a set, A, of about 250 RNA sequences, each 100 bases long, that are associated with dysfunctional splicing under the action of a certain pathway inhibitor (details unimportant here). As a complementary data set, B, I also have ~10,000 RNA sequences of the same length that show no dysfunctional splicing under the same inhibitor.

I would like to run some form of analysis to identify either structural properties shared by members of A but absent from B, or 'motifs' that are over-represented in either set.

I can think of general ways to approach this: working out what the distribution of motifs of different lengths should be in a random sample and comparing it to my data, or applying a machine learning algorithm that could hopefully identify shared patterns in the sets. However, given how new I am to this area of bioinformatics, I don't want to waste effort if rigorously established pattern recognition algorithms, already accepted by the bioinformatics community, could be adapted to my problem.
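To make the first idea concrete, here is a minimal sketch of a k-mer enrichment test, assuming the sequences are available as plain Python strings. It uses set B as the background instead of a simulated random sample, counts how many sequences in each set contain each k-mer, and scores over-representation in A with a one-sided hypergeometric test. The function names and the choice of k = 6 are illustrative assumptions, not an established pipeline.

```python
from collections import Counter
from math import exp, lgamma


def kmers_present(seq, k):
    """Return the set of distinct k-mers occurring in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def log_comb(n, r):
    """Natural log of the binomial coefficient C(n, r)."""
    return lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)


def hypergeom_upper_tail(hits_a, size_a, hits_total, size_total):
    """P(X >= hits_a) when size_a sequences are drawn without replacement
    from size_total sequences, hits_total of which contain the k-mer."""
    misses_total = size_total - hits_total
    log_denom = log_comb(size_total, size_a)
    lowest = max(hits_a, size_a - misses_total)  # smallest feasible count
    highest = min(size_a, hits_total)            # largest feasible count
    p = 0.0
    for x in range(lowest, highest + 1):
        p += exp(log_comb(hits_total, x)
                 + log_comb(misses_total, size_a - x)
                 - log_denom)
    return min(p, 1.0)


def enriched_kmers(set_a, set_b, k=6):
    """Rank k-mers by how over-represented they are in set_a versus set_b.

    Each k-mer is counted at most once per sequence, so the counts are
    numbers of sequences, not numbers of occurrences.
    """
    count_a = Counter(km for s in set_a for km in kmers_present(s, k))
    count_b = Counter(km for s in set_b for km in kmers_present(s, k))
    n_a = len(set_a)
    n_total = n_a + len(set_b)
    results = []
    for kmer, in_a in count_a.items():
        in_b = count_b.get(kmer, 0)
        p = hypergeom_upper_tail(in_a, n_a, in_a + in_b, n_total)
        results.append((kmer, in_a, in_b, p))
    results.sort(key=lambda row: row[3])  # smallest p-value first
    return results
```

Because thousands of k-mers are tested at once, the raw p-values would need a multiple-testing correction (for example Bonferroni or Benjamini-Hochberg) before any k-mer is called enriched. This sketch also ignores structure entirely and treats each k-mer independently.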

Alternatively, could anyone suggest a good resource for learning pattern recognition or machine learning techniques? I am proficient in Python, C++, MATLAB, etc., so I would be happy just to hear about algorithms or techniques rather than specific programming advice (although if anyone is aware of any useful libraries out there...)

rna-seq next-gen • 517 views

There are many tools for motif analysis (MEME, for example). You may need to come up with a framework that combines structural similarity (by fold structure) with sequence similarity. These are not always closely linked.

Reply written 2.8 years ago by Joe