Question: Databases To Analyse DNA Characteristics/Patterns
1
gravatar for PoGibas
8.3 years ago by
PoGibas4.8k
Vilnius
PoGibas4.8k wrote:

I have a collection of specific breakpoints and their flanking sequences:

<------150bp-----/breakpoint/-----150bp----->

I want to find out whether there is something in common among those sequences (I am trying to compare them and find some similarity).

I am interested in any database that helps decipher sequence patterns and characteristics (motifs, repeats, physical characteristics, etc.).

I have already tried:

  • MEME motif search;
  • Vienna package for secondary structures;
  • RepeatMasker;
  • Blast2 - for similarity around the breakpoint;
  • EMBOSS tools;
  • UCSC tables for annotated info about repeats, histones, DNase sites, etc.

I am looking forward to any suggestions: TF binding, more secondary structures, more motifs, repeats and specific elements. Especially protein (chromatin remodeling), nuclease, and Ig target sites!

All suggestions are welcome - I am going to try them all.

motif dna • 1.8k views
modified 8.3 years ago by Steve Lianoglou5.0k • written 8.3 years ago by PoGibas4.8k

I used the Vienna Package to get the energetic parameters of my DNA. The secondary structures are cool (maybe too good), but Vienna always reports some kind of structure, and what I am more interested in is a way to measure all possible DNA mechanical characteristics (flexing, bending, etc.). I hope someone can help me with an easy way of doing this, as Vienna was too "fancy".

modified 8.3 years ago • written 8.3 years ago by PoGibas4.8k
8.3 years ago by
Steve Lianoglou5.0k
US
Steve Lianoglou5.0k wrote:

Assuming you have enough examples, how about trying to set this up as a classification problem?

If you're hunting for motifs, you can iterate over all x- to y-mers in the upstream and downstream flanking sequences separately (where x and y are the min and max size of the k-mers you are looking for). These represent the features of your examples. (If you think other features are important, toss them in too.)
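
A minimal sketch of that feature extraction, assuming plain Python strings for the two flanks. The function names (`kmer_counts`, `flank_features`) and the `up:`/`down:` key prefixes are made up for illustration and are not from any library:

```python
from collections import Counter

def kmer_counts(seq, k_min, k_max):
    """Count every overlapping k-mer of length k_min..k_max in seq."""
    seq = seq.upper()
    counts = Counter()
    for k in range(k_min, k_max + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def flank_features(upstream, downstream, k_min=2, k_max=4):
    """Build one sparse feature dict per example, keeping the flanks separate."""
    features = {}
    for side, seq in (("up", upstream), ("down", downstream)):
        for kmer, n in kmer_counts(seq, k_min, k_max).items():
            # Prefix the k-mer with its side so upstream and downstream
            # occurrences become distinct features.
            features[f"{side}:{kmer}"] = n
    return features

# Toy flanks; real ones would be the 150 bp on each side of a breakpoint.
example = flank_features("ACGTAC", "TTTT", k_min=2, k_max=3)
```

Each example then becomes a dict such as `{"up:AC": 2, "down:TT": 3, ...}`, which most classifier libraries can consume after vectorising.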

Your breakpoint regions are your positive set. Pick an equally sized (or larger) set of positions at random (or positions that resemble your breakpoints in whatever way you think matters, but that have no breakpoint), and this will be your negative set.
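
One way to draw such a random negative set, sketched under the assumption that positions are 0-based coordinates on a single sequence of known length. `sample_negative_positions` and `flanks` are hypothetical helpers, and the rule that a negative window must not overlap any breakpoint window is my own choice:

```python
import random

def sample_negative_positions(genome_length, breakpoints, n, flank=150, seed=0):
    """Pick n distinct random positions whose windows avoid all breakpoint windows."""
    rng = random.Random(seed)  # fixed seed so the negative set is reproducible
    chosen = set()
    attempts = 0
    while len(chosen) < n:
        attempts += 1
        if attempts > 1000 * n:
            raise RuntimeError("could not place enough negative positions")
        # Stay far enough from both ends to extract full flanks.
        pos = rng.randrange(flank, genome_length - flank)
        # Reject positions whose +/- flank window would overlap a breakpoint window.
        if all(abs(pos - bp) > 2 * flank for bp in breakpoints):
            chosen.add(pos)
    return sorted(chosen)

def flanks(sequence, pos, flank=150):
    """Return the (upstream, downstream) flanks around pos."""
    return sequence[pos - flank:pos], sequence[pos:pos + flank]
```

Matching the negatives to the positives on things like GC content or repeat annotation would need extra filtering on top of this.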

Run your data through some binary classifier (SVM, penalized logistic regression, boosting, etc.), doing appropriate cross-validation, and see if you can get good prediction accuracy. This will take some time to get right (assuming you can do so).
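
To make the cross-validation step concrete, here is a rough sketch that uses only the standard library. Note that it swaps in a nearest-centroid classifier as a deliberately simple stand-in, not an SVM or penalized regression; in practice you would use a proper machine-learning library. All names are illustrative, and it assumes labels are 0/1, every fold is non-empty, and every training split contains both classes:

```python
import random
from collections import defaultdict

def centroid(examples):
    """Average a list of sparse feature dicts into one centroid dict."""
    total = defaultdict(float)
    for feats in examples:
        for name, value in feats.items():
            total[name] += value
    return {name: value / len(examples) for name, value in total.items()}

def sq_distance(a, b):
    """Squared Euclidean distance between two sparse feature dicts."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))

def cross_validate(examples, labels, n_folds=5, seed=0):
    """Return the mean held-out accuracy over n_folds shuffled folds."""
    order = list(range(len(examples)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        # Fit on the training part only: one centroid per class.
        pos = centroid([examples[i] for i in train if labels[i] == 1])
        neg = centroid([examples[i] for i in train if labels[i] == 0])
        correct = 0
        for i in held_out:
            closer_to_pos = sq_distance(examples[i], pos) < sq_distance(examples[i], neg)
            correct += int(closer_to_pos) == labels[i]
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)
```

The point is the shape of the loop: the model never sees the held-out fold while it is being fitted, so the reported accuracy is an estimate of performance on unseen regions.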

Once you can build a strong classifier, see if you can interrogate it to see which features are relevant.

written 8.3 years ago by Steve Lianoglou5.0k
8.3 years ago by
Philipp Bayer6.7k
Australia/Perth/UWA
Philipp Bayer6.7k wrote:

Have you had a look at protein domain analysis using HMMER?

You could translate your sequences in all 6 possible reading frames into amino acids and then check for protein domains. Domain models are based on hidden Markov models, so you might get different results from what you've already tried.
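
A small sketch of the six-frame translation, assuming the standard genetic code and uppercase A/C/G/T input (any codon containing another character becomes `X`). The helper names are made up for this example:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    """Return translations for frames +1..+3 and -1..-3."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames
```

The six protein strings per region could then be written out as FASTA and searched against a domain database with HMMER.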

written 8.3 years ago by Philipp Bayer6.7k

I'm not sure he specified that these were breakpoints in coding regions(?)

written 8.3 years ago by Steve Lianoglou5.0k

True! This should only work on coding sequences.

written 8.3 years ago by Philipp Bayer6.7k