Why are local SMART results different from the web results?
7.9 years ago
a.abnousi ▴ 30

I downloaded the SMART HMM profiles, built an HMM database from them, and queried some sequences against it with hmmpfam (from HMMER2). I expected that querying the same sequence on the SMART web service would give similar results, but they are very different. Any hints on what I might be doing wrong?

As an example, I am querying the sequence of U5WTH2_MYCKA. The web interface gives me one transmembrane region from position 20 to 42 and three low-complexity regions (two of them overlapping). It also shows an EccE domain from Pfam.

When I query the same sequence locally, the lowest E-value is for RES_2 at positions 328 to 441, which does not overlap any of the regions reported by the web interface. The local result also lists several other domains. I understand that some of them would be missing from the web result because of the threshold, but I expected at least the highest-ranking ones to be the same.

SMART • 1.3k views
7.9 years ago
a.abnousi ▴ 30

Here is what I figured out, written down for anyone who lands here later.

First, note that SMART "is not intended to be a comprehensive database of domains" (quoted from SMART's latest publication, 2015). It includes about 1200 domains, so for many sequences it will not find a match. U5WTH2_MYCKA is one such sequence.

Second, a local SMART search does not report transmembrane or low-complexity regions. The web results for the example sequence consisted only of such regions (plus the Pfam hit), so none of them can show up locally.

Third, running HMMER2's hmmpfam locally usually returns a long list of "matches", but many of them have a large E-value and should mostly not be considered domains. If you want to imitate the web-based SMART, or you have no idea what a good E-value for a domain is, compare each returned match's E-value against the threshold E-value for that domain, which comes with your SMART download. You will need to do this check manually or write a script for it.

Finally, the web-based SMART does not show overlapping domains. To the best of my understanding, when two domains overlap in the sequence, the one with the lower E-value is reported. This last piece still needs to be verified; I am not sure about it.
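The threshold check and the overlap rule described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it takes hits that have already been parsed out of the hmmpfam output (parsing is not shown), the `Hit` record and the `thresholds` dictionary are hypothetical structures you would fill from your own files, the example values are made up, and the overlap rule is the unverified guess from the answer, not SMART's documented behaviour.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One domain match; a hypothetical record, not an hmmpfam format."""
    domain: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    evalue: float


def filter_by_threshold(hits, thresholds):
    """Keep hits whose E-value is at or below that domain's own cutoff.

    Hits for domains with no known cutoff are dropped.
    """
    return [h for h in hits
            if h.domain in thresholds and h.evalue <= thresholds[h.domain]]


def overlaps(a, b):
    """True if the two hits share at least one sequence position."""
    return a.start <= b.end and b.start <= a.end


def resolve_overlaps(hits):
    """Greedy rule: among overlapping hits, keep the lowest E-value.

    Assumption: this mirrors the guess in the answer, which is unverified.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


# Made-up example values, for illustration only.
hits = [
    Hit("DOMAIN_A", 10, 120, 1e-30),
    Hit("DOMAIN_B", 100, 200, 1e-5),
    Hit("DOMAIN_C", 328, 441, 0.5),
    Hit("DOMAIN_D", 250, 300, 1e-8),
]
thresholds = {"DOMAIN_A": 1e-3, "DOMAIN_B": 1e-3,
              "DOMAIN_C": 1e-2, "DOMAIN_D": 1e-3}

significant = filter_by_threshold(hits, thresholds)
final = resolve_overlaps(significant)
for h in final:
    print(h.domain, h.start, h.end, h.evalue)
```

With these made-up values, DOMAIN_C is dropped by its threshold and DOMAIN_B is dropped because it overlaps the better-scoring DOMAIN_A. Transmembrane and low-complexity regions would still be missing, since they do not come from the HMM search.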
