Question: How do I get higher scores for my sequences when using the HMMER package?
Asked 4 months ago by howenwy2:

Hi, I have recently started using a package named HMMER to predict binding sites in a sequence. Here is my input data:

CLUSTAL O(1.2.4) multiple sequence alignment

gene1 TTTGAGTGTGTTA 13

gene2 TTTGATCTGGTTA 13

gene3 ATTGAGGTAGTTA 13

gene4 TTTGAGGCTATTG 13

I need to find the motif (T/A)TTGANNNNNTT(G/A) in my genome sequence. gene1 to gene4 are sequences from the same genome, and I need to find this motif in the other genes of that genome.

I can currently find about 38 binding sites in my target genome; however, the scores are low. I hope there is a way to increase them.

written 4 months ago by howenwy2

Why?

The score reflects how similar those sequences are. They cannot be made ‘more similar’. What do you want to gain from higher scores?

written 4 months ago by jrj.healey

Thank you so much for replying. Can I trust scores lower than 10 (or lower than 5)? I got several genes with lower scores, but those genes contain the specific sequence that I need.

written 4 months ago by howenwy2

As jrj.healey said, the score is what it is; you can't change that. Looking at your input data, this is far from unexpected: the "motif" you are looking for is very broad, and you are screening a whole genome with it.

The only thing you can do is provide better (i.e. more specific) input data.
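To put a rough number on how broad the motif is, here is a minimal back-of-the-envelope sketch. It assumes equal base frequencies and independent positions, which real genomes do not satisfy, and the genome length of 5,000,000 bp is a placeholder, not the poster's genome. `WTTGANNNNNTTR` is simply the IUPAC spelling of (T/A)TTGANNNNNTT(G/A).

```python
from fractions import Fraction

# Number of bases allowed by each IUPAC symbol used in the motif.
ALLOWED = {"A": 1, "C": 1, "G": 1, "T": 1, "W": 2, "R": 2, "N": 4}
MOTIF = "WTTGANNNNNTTR"  # W = A/T, R = A/G, N = any base


def match_probability(motif: str) -> Fraction:
    """Chance that one random position matches, assuming equal base frequencies."""
    p = Fraction(1)
    for symbol in motif:
        p *= Fraction(ALLOWED[symbol], 4)
    return p


def expected_chance_hits(motif: str, genome_length: int) -> float:
    """Expected matches on both strands of a random sequence of this length."""
    starts = genome_length - len(motif) + 1
    return float(2 * starts * match_probability(motif))


print(match_probability(MOTIF))  # 1/16384
# genome_length is a placeholder value (assumption), not taken from the question.
print(round(expected_chance_hits(MOTIF, 5_000_000)))  # 610
```

Under these assumptions a few hundred exact matches would turn up by chance in a genome of that size, which is one way to see why individual hits to such a short, degenerate motif cannot carry much statistical weight.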

written 4 months ago by lieven.sterck

Thank you so much for replying. Can I trust scores lower than 10 (or lower than 5)? I got several genes with lower scores, but those genes contain the specific sequence that I need.

written 4 months ago by howenwy2

Gut feeling, I would say no (and I personally wouldn't either), but do check the HMMER docs to see if there is any advice on score interpretation.

Alternatively, you could check how InterPro (InterProScan) deals with this. They use more or less empirically determined thresholds to decide between match and no match.

written 4 months ago by lieven.sterck

You’d have to look at how the score is defined in the docs (I don’t know this off the top of my head) and pick a reasonable-sounding number.

Binding sites are notoriously ‘wonky’ and hard to predict, so you would be justified in considering lower-than-typical scores perhaps.

If the score is anything like an E-value, a value of 5 would imply that about 5 matches of that quality are expected by pure chance alone, so that’s probably too high.

Don’t make the mistake of only using the numbers though. If you detect a match and its genomic context looks valid (it’s in the right place, adjacent to a gene, etc.), then there are grounds to proceed. In short, throw some intuition at the problem; don’t just take those scores blindly.

To answer your original question, you may be able to improve your HMM scores by using an HMM built from more known examples of the binding site. This will allow for a more ‘informed’ HMM and may help to narrow down your hits.
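To illustrate why the training examples cap the achievable score, here is a simplified sketch using a plain position weight matrix built from the four sites in the question. This is not HMMER's algorithm (a profile HMM also models insertions and deletions and uses more sophisticated priors); the flat pseudocount of 1 and the uniform background of 0.25 are assumptions chosen for illustration only.

```python
import math

# The four aligned sites from the question.
SITES = ["TTTGAGTGTGTTA", "TTTGATCTGGTTA", "ATTGAGGTAGTTA", "TTTGAGGCTATTG"]
BASES = "ACGT"


def column_log_odds(sites, pseudocount=1.0, background=0.25):
    """Per-column log2(p_base / background), smoothed with a flat pseudocount."""
    n = len(sites)
    table = []
    for column in zip(*sites):
        scores = {}
        for base in BASES:
            p = (column.count(base) + pseudocount) / (n + 4 * pseudocount)
            scores[base] = math.log2(p / background)
        table.append(scores)
    return table


def score(seq, table):
    """Sum of per-column log-odds; higher means closer to the training sites."""
    return sum(col[base] for base, col in zip(seq, table))


table = column_log_odds(SITES)
# The best-scoring sequence under this model sets the ceiling for any hit.
best = "".join(max(col, key=col.get) for col in table)
print(best, round(score(best, table), 2))
for site in SITES:
    print(site, round(score(site, table), 2))
```

No sequence can score above `best` under this toy model, so the ceiling is a property of the model and its training data rather than of the hits. Adding more genuine sites sharpens the conserved columns and is the only thing that moves that ceiling.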

Alternatively, there isn’t strictly any need to use HMMs for this at all. Since your binding site is pretty well defined, you could just use fuzzy nucleotide matching, e.g. via EMBOSS’s fuzznuc.
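As a rough idea of what exact pattern matching looks like, here is a minimal sketch in plain Python using a regular expression. It is not fuzznuc itself and does not allow mismatches; the function names and the test sequence are made up for illustration.

```python
import re

# (T/A)TTGANNNNNTT(G/A) written as a regular expression.
# Wrapping it in a lookahead lets overlapping sites be reported too.
SITE = re.compile(r"(?=([TA]TTGA[ACGT]{5}TT[GA]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str):
    """Yield (start, strand, site); start is 0-based on the forward strand."""
    seq = seq.upper()
    for m in SITE.finditer(seq):
        yield m.start(), "+", m.group(1)
    rc = reverse_complement(seq)
    for m in SITE.finditer(rc):
        # Convert the reverse-strand offset back to forward-strand coordinates.
        start = len(seq) - m.start() - len(m.group(1))
        yield start, "-", m.group(1)


# Made-up test sequence: the gene1 site with three C's on either side.
example = "CCCTTTGAGTGTGTTACCC"
print(list(find_sites(example)))  # [(3, '+', 'TTTGAGTGTGTTA')]
```

Every hit from a scan like this matches the motif exactly, so there is no score to interpret; the genomic context of each hit still has to be checked by hand.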

written 4 months ago by jrj.healey