Question

Don't get hits on HMMER

9.0 years ago

uguy • 0

Hello everyone,

I'm using HMMER to compare an alignment of all the 3' slicing sequences of one genome against another, very closely related genome. I first built a profile with hmmbuild and then ran nhmmer and nhmmscan. In both cases I got no matches, even though I'm sure there are some homologous sequences.

Internal pipeline statistics summary:
-------------------------------------
Query model(s):                              1  (23 nodes)
Target sequences:                            8  (18238624 residues searched)
Residues passing SSV filter:              6989  (0.000383); expected (0.02)
Residues passing bias filter:             6989  (0.000383); expected (0.02)
Residues passing Vit filter:                 0  (0); expected (0.003)
Residues passing Fwd filter:                 0  (0); expected (3e-05)
Total number of hits:                        0  (0)

I tried changing the thresholds of the different filters, without success:

Internal pipeline statistics summary:
-------------------------------------
Query model(s):                              1  (23 nodes)
Target sequences:                            8  (18238624 residues searched)
Residues passing SSV filter:             20521  (0.00113); expected (0.02)
Residues passing bias filter:            20521  (0.00113); expected (0.02)
Residues passing Vit filter:               256  (1.4e-05); expected (0.005)
Residues passing Fwd filter:                96  (5.26e-06); expected (0.02)
Total number of hits:                        0  (0)

Is there something I forgot to do? Thanks in advance!

HMMER • 1.7k views

updated 21 months ago by Ram 43k • written 9.0 years ago by uguy • 0

score 1 · Answer 1 · 2015-04-23

I don't know what you mean by 3' slicing sequences (typo for the 3' splice site consensus?) but:

HMMER is intended for identifying statistically significant sequence relationships, as a proxy for sequence homology -- i.e. comparison scores that you wouldn't expect to see by chance in a target sequence database of whatever size yours is. But many DNA sequence motifs (and some protein sequence motifs) occur at close to their expected random occurrence frequencies. Such sequences might arise by homology, but I bet that they usually arise de novo, because they can.

To give a simple example, HMMER is not the right tool to identify EcoRI restriction sites (GAATTC) because they occur frequently in random sequence, and generally aren't statistically significant when they occur in genomes.

If you're talking about 3' splice site consensus, same deal, in spades: you'll find YYYYAG all over the place by chance, so HMMER isn't going to call any of them statistically significant.
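To put rough numbers on "all over the place", here is a minimal back-of-the-envelope sketch in Python. It rests on assumptions that are not in the original post: uniform base composition (each base at frequency 0.25), a single strand, and the 8 target sequences treated as one long sequence. The function names are made up for illustration; the residue count is the one from the pipeline summary in the question.

```python
# Number of bases each IUPAC nucleotide symbol can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def match_probability(pattern):
    """Chance that a random window matches, assuming uniform base frequencies."""
    p = 1.0
    for symbol in pattern.upper():
        p *= IUPAC_DEGENERACY[symbol] / 4.0
    return p


def expected_hits(pattern, n_residues):
    """Expected number of matching windows on one strand of random sequence."""
    windows = max(n_residues - len(pattern) + 1, 0)
    return windows * match_probability(pattern)


if __name__ == "__main__":
    n = 18238624  # "residues searched" from the question's output
    for pat in ("GAATTC", "YYYYAG"):
        print(pat, "1 in", round(1 / match_probability(pat)),
              "-> about", round(expected_hits(pat, n)), "chance matches")
```

Under those assumptions GAATTC matches about 1 window in 4096 and YYYYAG about 1 in 256, so a target of this size would be expected to contain thousands to tens of thousands of chance matches. Real genomes are not uniform in composition, so treat the figures as order-of-magnitude only.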

If you're looking to enumerate all matches to simple patterns like GAATTC or YYYYAG, you're better off with a pattern matching program, not a program that's looking for statistically significant comparisons.
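As one illustration of what "a pattern matching program" could look like, here is a small Python sketch that uses only the standard `re` module. It is an assumption-laden example, not a recommendation of a specific tool: the helper names are hypothetical, the input is taken to be a plain uppercase or lowercase DNA string of A/C/G/T, and reading the sequence from FASTA is left out.

```python
import re

# IUPAC nucleotide symbols translated to regex character classes.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def compile_pattern(pattern):
    """Build a regex that also reports overlapping matches (via lookahead)."""
    body = "".join(IUPAC_TO_REGEX[s] for s in pattern.upper())
    return re.compile("(?=(" + body + "))")


def reverse_complement(seq):
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_matches(pattern, seq):
    """Return sorted (start, strand, matched_text) tuples.

    start is 0-based on the forward strand; for '-' hits it is the
    leftmost forward-strand position covered by the match.
    """
    regex = compile_pattern(pattern)
    seq = seq.upper()
    n, k = len(seq), len(pattern)
    hits = [(m.start(), "+", m.group(1)) for m in regex.finditer(seq)]
    for m in regex.finditer(reverse_complement(seq)):
        hits.append((n - m.start() - k, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_matches("YYYYAG", "AACTTCAGGAATTC"):
        print(hit)
```

This enumerates every occurrence on both strands without any notion of statistical significance, which is the point: deciding which matches are real splice sites needs other evidence than the motif itself.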