Not getting any hits with HMMER
10.5 years ago
uguy • 0

Hello everyone,

I'm using HMMER to compare an alignment of all the 3' slicing sequences of one genome against another, closely related genome. First I built a profile with hmmbuild, then I searched with nhmmer and nhmmscan. In both cases I got no matches, although I'm sure there are homologous sequences.

Internal pipeline statistics summary:
-------------------------------------
Query model(s):                              1  (23 nodes)
Target sequences:                            8  (18238624 residues searched)
Residues passing SSV filter:              6989  (0.000383); expected (0.02)
Residues passing bias filter:             6989  (0.000383); expected (0.02)
Residues passing Vit filter:                 0  (0); expected (0.003)
Residues passing Fwd filter:                 0  (0); expected (3e-05)
Total number of hits:                        0  (0)

I tried changing the thresholds of the different filters, without success:

Internal pipeline statistics summary:
-------------------------------------
Query model(s):                              1  (23 nodes)
Target sequences:                            8  (18238624 residues searched)
Residues passing SSV filter:             20521  (0.00113); expected (0.02)
Residues passing bias filter:            20521  (0.00113); expected (0.02)
Residues passing Vit filter:               256  (1.4e-05); expected (0.005)
Residues passing Fwd filter:                96  (5.26e-06); expected (0.02)
Total number of hits:                        0  (0)

Is there something I forgot to do? Thanks in advance!

10.5 years ago

I don't know what you mean by 3' slicing sequences (a typo for the 3' splice site consensus?), but:

HMMER is intended for identifying statistically significant sequence relationships, as a proxy for sequence homology -- i.e. comparison scores that you wouldn't expect to see by chance in a target sequence database of whatever size yours is. But many DNA sequence motifs (and some protein sequence motifs) occur at close to their expected random occurrence frequencies. Such sequences might arise by homology, but I bet that they usually arise de novo, because they can.

To give a simple example, HMMER is not the right tool to identify EcoRI restriction sites (GAATTC) because they occur frequently in random sequence, and generally aren't statistically significant when they occur in genomes.

If you're talking about 3' splice site consensus, same deal, in spades: you'll find YYYYAG all over the place by chance, so HMMER isn't going to call any of them statistically significant.
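To put rough numbers on "all over the place by chance", here is a minimal sketch that estimates how many chance matches to a short motif you would expect in random sequence. It assumes independent bases at equal frequencies (0.25 each), which real genomes do not follow exactly, and it treats the 18238624 residues from the pipeline summary in the question as one stretch of random sequence. The function names and the IUPAC table are illustrative, not part of HMMER.

```python
# Expected number of chance matches to a short motif in random sequence.
# Assumption: bases are independent and equiprobable (A, C, G, T at 0.25 each).

# How many bases each IUPAC symbol can stand for (subset, for illustration).
IUPAC_DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}


def match_probability(motif):
    """Probability that a random window of len(motif) bases matches the motif."""
    p = 1.0
    for symbol in motif:
        p *= IUPAC_DEGENERACY[symbol] / 4.0
    return p


def expected_chance_matches(motif, n_residues):
    """Expected match count over all start positions in n_residues of random sequence."""
    positions = n_residues - len(motif) + 1
    return positions * match_probability(motif)


# Residue count taken from the pipeline summary quoted in the question.
TARGET_RESIDUES = 18238624

for motif in ("GAATTC", "YYYYAG"):
    print(motif, round(expected_chance_matches(motif, TARGET_RESIDUES)))
```

Under those assumptions GAATTC matches a random 6-mer with probability 1/4096 and YYYYAG with probability 1/256, so a target of this size is expected to contain thousands to tens of thousands of chance matches to either one.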

If you're looking to enumerate all matches to simple patterns like GAATTC or YYYYAG, you're better off with a pattern matching program, not a program that's looking for statistically significant comparisons.
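As one possible illustration of the pattern-matching route, the sketch below enumerates every match to an IUPAC-style motif with Python's standard `re` module. The helper names, the small IUPAC table, and the toy sequence are assumptions made up for this example; a dedicated pattern-search tool would do the same job on whole genomes.

```python
import re

# Subset of IUPAC nucleotide codes, expressed as regex character classes.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

# Complement table covering the same subset of symbols.
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(sequence, motif):
    """Return 0-based start positions of all matches on the given strand.

    A lookahead is used so that overlapping matches are all reported.
    """
    body = "".join(IUPAC_TO_REGEX[symbol] for symbol in motif)
    pattern = re.compile("(?=(" + body + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Toy sequence, invented for this example.
toy = "TTGAATTCCCTTTTAGGA"
print(find_motif(toy, "GAATTC"))
print(find_motif(toy, "YYYYAG"))
# To search the opposite strand, scan the reverse complement instead.
print(find_motif(reverse_complement(toy), "YYYYAG"))
```

This reports every occurrence regardless of statistical significance, which is the point: it answers "where are all the matches?" rather than "which matches are unlikely by chance?".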
