Question

Selecting templates for homology modelling with MODELLER

5.9 years ago

virangihewage ▴ 10

Hi, I am new to MODELLER and would appreciate your advice on the following. I am modelling a protein with MODELLER and have obtained a list of potential templates for my target. To pick suitable templates from that list, the E-value needs to be considered: according to the MODELLER publication, the lower the E-value, the better the template. But my E-values appear to be negative. Do I have to take the minus sign into account, or should I consider only the magnitude? I am confused about that.

Also, when selecting templates, do I have to consider both the E-value and the sequence identity between template and target? Some of the templates are above 40% sequence identity and some are below it. And when selecting a template from the dendrogram to build models, do I have to consider both the resolution and the sequence identity?

Thank you.

alignment • 2.8k views

ADD COMMENT • link 5.9 years ago by virangihewage ▴ 10

It's been years since I used modeller but isn't the basic premise that you don't work with a template that's at ~40%? You need around 60-70% for template based alignment to work well, no?

Have you looked to see what negative e-values could mean? If they're all in a narrow range and not too negative, it could be an indication that they're all good e-values. From what I see, these e-values are sourced from protein-BLAST, so negative e values are a computational artifact, not really anything of significance.
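For reference, here is a rough sketch of how the sequence identity figures discussed in this thread (25%, 40%, 60-70%) can be computed from a pairwise alignment. It assumes identity is counted over columns where neither sequence has a gap; other tools use other denominators, so numbers may differ. The function name and the two aligned fragments are made up for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        # Skip columns where either sequence has a gap character.
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a.upper() == res_b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Made-up aligned fragments, for illustration only.
target = "MKT-AYIAKQR"
template = "MKSLAYLA-QR"
print(f"{percent_identity(target, template):.1f}% identity")
```

Here 7 of the 9 gap-free columns match, so the two fragments are about 78% identical under this convention.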

ADD REPLY • link 5.9 years ago by Ram 43k

Thank you, Ram, for the comment. These are the E-values that I obtained for the potential templates:

tem1 0.12E-03
tem2 0.50E-02
tem3 0.86E-04
tem4 0.25E-02
tem5 0.33E-03
tem6 0.11E-02
tem7 0.96E-03

The sequence identity of each of these is higher than 25% (as required by the tutorial). The tutorial does not explain anywhere what the E-value means, which is why I asked. In the MODELLER tutorial example, some of the potential templates have negative values for the E-value and some have E-values of zero, and they picked the ones with zero E-values (according to the MODELLER publication, the lower the E-value, the better the template). So that means I should ignore the minus sign and just consider the value, right? According to the tutorial and the publication, template selection is based on 40% sequence identity. I am not sure about the 60-70% for template-based alignment.

ADD REPLY • link 5.9 years ago by virangihewage ▴ 10

Please do not add answers unless you're answering the top level question. This belongs as a reply to my comment. Please see http://biostars.org/t/how-to posts to learn how to use the forum better. I've moved your post to a comment for now.

ADD REPLY • link 5.9 years ago by Ram 43k

Sorry about that, Ram. I made a mistake.

ADD REPLY • link 5.9 years ago by virangihewage ▴ 10

score 1 · Accepted Answer · 2018-06-18

5.9 years ago

Joe 21k

Several points:

The best E-value you can get is 0, so the closer your E-values are to zero, the better the template. You should therefore be picking the templates whose E-values have the most negative exponent (basically).

The E-values won't be negative, just their exponents will. I.e. 1E-10 is not a negative number, it's just much, much smaller than 1.

If you don't know which templates to choose, my advice would be not to use MODELLER. Use a pipeline like I-TASSER, which selects the best templates automatically.

Finally, unless you are modelling something specific, for example the effect of a SNP on a particular protein, and have a very specific template in mind, it doesn't make much sense to select your templates by hand anyway. I wouldn't worry too much about the E-value or the sequence alignment. I'm not 100% sure how MODELLER's algorithm works, but I-TASSER, for example, breaks the sequence up into very small stretches that have well-defined secondary structures in other matches (i.e. it tries to find other proteins with similar short sequences and looks at whether they form helices, sheets, etc.). Done this way, you often end up using multiple templates anyway, so to reiterate: don't get too hung up on your template.
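To make the point about exponents concrete, here is a minimal sketch that parses the E-values posted earlier in this thread and sorts them. It relies only on Python's built-in float(), which reads this kind of scientific notation directly. The variable names are mine, not anything from MODELLER.

```python
# E-values copied from the list posted earlier in this thread.
raw_evalues = {
    "tem1": "0.12E-03",
    "tem2": "0.50E-02",
    "tem3": "0.86E-04",
    "tem4": "0.25E-02",
    "tem5": "0.33E-03",
    "tem6": "0.11E-02",
    "tem7": "0.96E-03",
}

# float() parses the "mantissa E exponent" notation as-is.
evalues = {name: float(text) for name, text in raw_evalues.items()}

# Every value is positive; only the exponent carries a minus sign.
all_positive = all(value > 0 for value in evalues.values())
print("all positive:", all_positive)

# Sort ascending so the E-value closest to zero comes first.
ranked = sorted(evalues.items(), key=lambda item: item[1])

for name, value in ranked:
    print(f"{name}: {value:.2e}")
```

By E-value alone, tem3 (0.86E-04) comes out first and tem2 (0.50E-02) last. None of the numbers is negative.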

ADD COMMENT • link 5.9 years ago by Joe 21k

Thank you for the solution. It really helps a lot.

But isn't it better to consider both a lower E-value and the sequence identity (equal to or greater than 40%) together when selecting templates from the potential templates? The MODELLER tutorial uses the DBAli server to get multiple templates. Which option do I need in order to get the cluster family for the template? Is it "Get a precalculated MSA containing a given chain"? When I followed the tutorial example (selection of multiple templates), the cluster family and the templates I got from DBAli were quite different from those in the tutorial. Any ideas?

Thank you

ADD REPLY • link 5.9 years ago by virangihewage ▴ 10

I'm not sure I really know what you mean.

In short, yes - obviously the closer your protein sequence is to a protein with a solved structure, the more likely your simulation is to be accurate, but threading algorithms are a little cleverer than simply taking the single best match and 'making it all fit'.

There is also a comparatively weak correlation between sequence identity and structure. Some proteins can look almost identical in terms of structure, whilst sharing <20% sequence identity.

There isn't a one-size-fits-all solution. You have to consider what you know about your protein already, and once you have a model, decide whether you think it looks reasonable. You also haven't told us what your protein is, whether there are good structures already, what your aim is and so on.
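If you do want to combine the criteria mentioned in this thread (E-value, sequence identity, resolution), one possible sketch is a simple filter-then-sort. Everything here is an assumption for illustration: the Template class, the rank_templates function, the 40% cut-off (taken from the tutorial figure quoted above) and the candidate values are all hypothetical. This is not how MODELLER itself picks templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    evalue: float      # lower is better
    identity: float    # percent identity to the target, higher is better
    resolution: float  # in angstroms, lower is better


def rank_templates(templates, min_identity=40.0):
    """Drop templates below the identity cut-off, then sort the rest.

    Sort order: E-value first, then higher identity, then better resolution.
    """
    kept = [t for t in templates if t.identity >= min_identity]
    return sorted(kept, key=lambda t: (t.evalue, -t.identity, t.resolution))


# Hypothetical candidates; none of these values come from a real search.
candidates = [
    Template("tplA", evalue=1e-30, identity=45.0, resolution=2.1),
    Template("tplB", evalue=1e-50, identity=38.0, resolution=1.8),
    Template("tplC", evalue=1e-30, identity=52.0, resolution=2.5),
    Template("tplD", evalue=1e-12, identity=41.0, resolution=1.6),
]

ranked_names = [t.name for t in rank_templates(candidates)]
print(ranked_names)
```

Note that tplB is dropped by the cut-off despite having the best E-value, which is exactly the kind of trade-off a fixed threshold hides. Treat the ordering as a starting point to inspect, not a verdict.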

ADD REPLY • link 5.9 years ago by Joe 21k

The protein that I have to model is all-trans 13,14 retinol reductase (retinol saturase), which is 610 amino acids long. I have almost completed modelling it with MODELLER based on a single template, but I think it would be better to use multiple templates rather than a single one. I have started doing that, but the MODELLER tutorial is not very descriptive. Following MODELLER, I picked 1s3eA as the single template, but to obtain multiple templates I have to use the DBAli server to identify the cluster family and the multiple templates. When I submitted the tutorial's data to the DBAli server, the cluster family I got was quite different from the one shown in the tutorial for their modelled protein. That's the point that I am still trying to solve.

ADD REPLY • link 5.9 years ago by virangihewage ▴ 10

My suggestion is to use Phyre2 and I-TASSER as well, even if you get a model you like from MODELLER. It's always good to be able to compare them. Part of the reason I recommend I-TASSER is that it is usually the best server in the CASP tests too.

Both of those tools only require you to submit a sequence, and they do everything else for you. Worth doing anyway in my opinion.

ADD REPLY • link 5.9 years ago by Joe 21k

Thank you for your time.

ADD REPLY • link 5.9 years ago by virangihewage ▴ 10

Hi, may I know what makes a model look reasonable? For my undergrad project, I'm doing very basic modelling on GPR120. I've made 7 models from different servers (I-TASSER, SWISS-MODEL, ModWeb, M4T, Phyre2, RaptorX); the highest sequence identity, 25%, came only from SWISS-MODEL. At the end I have to choose one of my models. I'm looking for a rationale: how can I choose one of my models as the final model?

ADD REPLY • link 5.1 years ago by H.P • 0