Question: Minimize Number Of Motifs Describing A Peptide Set
julieN (Cambridge, MA, USA) wrote 9.1 years ago:

I have a collection of peptide sequences that were pulled down by polyclonal antibodies generated against a (longish) specific antigen. They have been hand-curated to segregate the sequences into expected positives and expected negatives. As I have found to be typical for this type of experiment, it looks like different antibodies within the polyclonals are recognizing different parts of the antigen.

We've defined about 20 submotifs from the antigen that cover most of the expected positives with little or no occurrence in the expected negative set, but there is significant overlap among them. My goal is to cover the largest number of expected positives with the fewest motifs. Literature and Google searches on protein motif/antigen/computation pull up tons of papers, but they are mostly on motif discovery. My question is whether this is a known class of problem, and, if so, what is it called?

Qdjm wrote 9.1 years ago:

Not sure what the general class of problem would be; it sounds a bit like a rule-learning problem.

However, you could try solving it with LASSO-regularized logistic regression via the R package glmnet.

  • Make each motif an input feature (i.e. an independent variable); the value of that feature (i.e. x_i) for a particular sequence is the number of occurrences of the motif in the sequence.
  • Make the target value 1 for a positive sequence and 0 for a negative one.
  • The motifs whose regression coefficients come back non-zero from glmnet form the non-redundant set.

glmnet gives you the entire regularization path, which means that for each motif set size it will provide a guess at the best non-redundant set.
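To make the recipe above concrete, here is a minimal sketch in plain Python. It is not glmnet: it swaps in a small proximal-gradient solver for L1-penalized logistic regression, and the peptides, the regex motifs, and the penalty values are all invented for illustration. The helper names (`motif_counts`, `fit_l1_logistic`, `selected_motifs`) are hypothetical as well.

```python
import math
import re

# Hypothetical toy data: peptides and regex motifs are invented for illustration.
positives = ["ARGDSP", "TRGDKL", "SRADEL", "QLDVPA", "MLDVRT"]
negatives = ["AKKTSP", "GGSKKA", "PTTSAQ", "LLKKEE", "VNNQSA"]
motifs = ["RGD", "LDV", "R[GA]D", "KK"]

def motif_counts(sequences, motifs):
    """Feature matrix: x[i][j] = number of occurrences of motif j in sequence i."""
    # Wrapping each motif in a lookahead also counts overlapping occurrences.
    patterns = [re.compile("(?=(?:" + m + "))") for m in motifs]
    return [[sum(1 for _ in p.finditer(s)) for p in patterns] for s in sequences]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink towards zero, clip at zero.
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_logistic(X, y, lam, step=0.1, iters=2000):
    """L1-penalized logistic regression by proximal gradient descent.

    The intercept b is not penalized. This is a stand-in for glmnet,
    which uses a different (coordinate descent) algorithm.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        # Residuals (predicted probability minus label) at the current w, b.
        err = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
               for row, yi in zip(X, y)]
        b -= step * sum(err) / n
        for j in range(p):
            grad = sum(e * row[j] for e, row in zip(err, X)) / n
            w[j] = soft_threshold(w[j] - step * grad, step * lam)
    return w, b

def selected_motifs(w, motifs, tol=1e-9):
    """Motifs whose coefficient is non-zero."""
    return [m for m, wj in zip(motifs, w) if abs(wj) > tol]

X = motif_counts(positives + negatives, motifs)
y = [1] * len(positives) + [0] * len(negatives)

# A crude regularization path: stronger penalties keep fewer motifs.
for lam in (10.0, 0.1, 0.05, 0.01):
    w, b = fit_l1_logistic(X, y, lam)
    print(lam, selected_motifs(w, motifs))
```

Motifs with negative coefficients are associated with the negative set, so for the coverage goal in the question you would probably keep only those with positive coefficients. With real data, glmnet (or another tested implementation) is the better choice; the loop over penalties here only mimics the idea of a path.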


Thank you @qdjm! This is what I was looking for, and would never have found otherwise. I'm still not sure of the name of the problem ("subset selection" maybe), but I've been able to pull out similar literature, and that's what I'm after.

Reply written 9.1 years ago by julieN
Niallhaslam wrote 9.1 years ago:

You could try CompariMotif. It will allow you to compare the two sets of motifs (I'm guessing you've defined them as regular expressions?). If you search your query motif against your database of motifs (i.e. all your motifs), it will compute the overlaps, matches, degeneracy, variants, parents, etc.
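As a rough idea of what an overlap comparison between regex motifs can look like, here is a small Python sketch. It is not CompariMotif and does not reproduce its scoring: it simply compares motifs by which peptides they hit. The function names (`hits`, `pairwise_overlap`) and the example peptides and motifs are made up for illustration.

```python
import re
from itertools import combinations

def hits(motif, peptides):
    """Set of peptides containing at least one match of the regex motif."""
    pattern = re.compile(motif)
    return {p for p in peptides if pattern.search(p)}

def pairwise_overlap(motifs, peptides):
    """For each motif pair: shared peptide count, Jaccard index, subset flag."""
    matched = {m: hits(m, peptides) for m in motifs}
    rows = []
    for a, b in combinations(motifs, 2):
        shared = matched[a] & matched[b]
        union = matched[a] | matched[b]
        jaccard = len(shared) / len(union) if union else 0.0
        # True when every peptide hit by `a` is also hit by `b`,
        # i.e. `b` behaves like a more general "parent" on this data.
        a_within_b = matched[a] <= matched[b]
        rows.append((a, b, len(shared), jaccard, a_within_b))
    return rows

# Hypothetical example data.
example_peptides = ["ARGDSP", "TRGDKL", "SRADEL", "QLDVPA"]
example_motifs = ["RGD", "R[GA]D", "LDV"]

for row in pairwise_overlap(example_motifs, example_peptides):
    print(row)
```

Pairs with a high Jaccard index, or where one motif's hits sit entirely inside another's, are candidates for merging or dropping.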

I've no idea what the overall class of problem is called though.


Thanks @niallhaslam. This is a nicer implementation of what I have produced with my own scripts, and I will use it in the future.

Reply written 9.1 years ago by julieN
karlpersius.manlimos wrote 6.4 years ago:

Is your data available? I really need this for my thesis. Please help me. Thank you.
