Question

Bioinformatics basics-Protein sequence and its interpretation



4.0 years ago

sherlokhomes.sharma • 0

Hi, I have a few questions:
1) How do I identify motifs in a protein sequence? Yes, I know that if I paste the sequence into a bioinformatics tool it will give me a result, but how do I interpret it?
2) The same question for protein domains.
3) What is the difference between conserved domains and domains?
4) How do I interpret a multiple sequence alignment? What information can I get from it?

Thanks a lot. Kind regards, Srihari

alignment sequence

updated 3.8 years ago by jared.andrews07 ★ 16k • written 4.0 years ago by sherlokhomes.sharma • 0

score 2 · Answer 1 · 2020-06-30

You seem to be lacking a fair amount of general information here, so I'm going to do my best to provide some information and resources to help you learn.

1. What are protein structural motifs?

I'm assuming you mean structural motifs here, rather than DNA-binding sequence motifs. A very short lesson on protein structure can be found here. That course gives a nice summary and some examples:

Structural motifs are short segments of protein 3D structure, which are spatially close but not necessarily adjacent in the sequence. Structural motifs may be conserved in a large number of different proteins. Their role may be structural or functional.

An example of a structural motif that generally performs a structural role is a beta-turn. A beta-turn consists of four consecutive residues where the polypeptide chain folds back on itself by nearly 180 degrees.

An example of a structural motif that has an important functional role is the helix-turn-helix motif, which can bind DNA. This is a structural feature that is difficult to identify from the amino acid sequence alone.
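To illustrate the point that structural motifs are defined by 3D geometry rather than by sequence, here is a minimal Python sketch that flags candidate beta-turns from C-alpha coordinates. It assumes the commonly used geometric criterion that the C-alpha atoms of residues i and i+3 are closer than about 7 Å; real tools (e.g. DSSP) use more detailed, hydrogen-bond-based definitions. The function name and the coordinates are made up for illustration and do not come from any real structure.

```python
import math


def candidate_beta_turns(ca_coords, cutoff=7.0):
    """Return 1-based start positions i where the C-alpha distance between
    residues i and i+3 is below `cutoff` angstroms (an assumed, simplified
    beta-turn criterion; hydrogen bonds and backbone angles are ignored).
    """
    starts = []
    for i in range(len(ca_coords) - 3):
        if math.dist(ca_coords[i], ca_coords[i + 3]) < cutoff:
            starts.append(i + 1)
    return starts


if __name__ == "__main__":
    # Made-up toy C-alpha coordinates (angstroms): a strand that folds back
    # on itself like a hairpin. Not taken from any real protein.
    toy_ca = [
        (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
        (13.4, 2.75, 0.0),
        (11.4, 5.5, 0.0), (7.6, 5.5, 0.0), (3.8, 5.5, 0.0), (0.0, 5.5, 0.0),
    ]
    print(candidate_beta_turns(toy_ca))  # [3, 4]
```

Note that the residue identities never enter the calculation: the same amino acid sequence could fold into a turn in one protein and not in another. The two overlapping windows reported here both describe the same bend; a real annotation would merge or choose between them.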

There are many structural protein motifs; simple descriptions of the most common ones are available on Wikipedia. More in-depth information is easy to find by searching for a recent review.
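Short sequence motifs, by contrast, can be found by pattern matching. Databases such as PROSITE describe them with a small pattern language; for example, N-{P}-[ST]-{P} is the N-glycosylation site pattern (N, then anything but P, then S or T, then anything but P). The sketch below translates a simplified subset of that syntax into a Python regular expression and reports every match, including overlapping ones. It is an illustration under stated assumptions, not a full PROSITE parser: the function names are hypothetical, and the example sequence is invented.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a regex string.

    Supported: x (any residue), [ST] (any of), {P} (none of),
    (n) or (n,m) repetition, '<' N-terminal and '>' C-terminal anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Optional repetition suffix, e.g. x(2) or x(2,4)
        count = ""
        rep = re.fullmatch(r"(.+)\((\d+(?:,\d+)?)\)", element)
        if rep:
            element, count = rep.group(1), "{" + rep.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # single residue or [ABC]; same meaning in regex
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (1-based start, matched segment) for every match, overlaps included."""
    # A zero-width lookahead lets finditer report overlapping matches.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "MKNGSAVNPTLNKTW"  # invented sequence for illustration
    print(prosite_to_regex("N-{P}-[ST]-{P}."))           # N[^P][ST][^P]
    print(find_motif(toy_sequence, "N-{P}-[ST]-{P}."))   # [(3, 'NGSA'), (12, 'NKTW')]
```

Interpreting a hit is the harder part: a match only says the pattern occurs at that position. Short, permissive patterns like this one match frequently by chance, so a hit is a hint to be checked against other evidence, not proof of function.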

2. What are protein domains?

This is actually a bit of a loaded question, as a protein "domain" can be defined several ways. But in essence:

Structural domains (the units of fold) are independently stable tertiary structures of proteins. They are distinct functional and/or structural units and can evolve, exist and function independently.

There are many different protein domains, and entire databases (for example Pfam, InterPro, SMART and CATH) are devoted to cataloging them.
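On interpreting domain search results: tools that search a sequence against domain databases, such as HMMER's hmmscan against Pfam or InterProScan, typically report each hit with a domain name, start/end coordinates on your protein and an E-value. Reading that output often comes down to keeping the statistically significant hits and listing them in sequence order as a "domain architecture". The sketch below does this under simplifying assumptions: the DomainHit type, the hit records and the 1e-5 E-value cutoff are hypothetical, and overlapping hits are resolved by simply keeping the one with the better E-value.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    name: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    evalue: float   # per-domain E-value reported by the search tool


def resolve_architecture(hits: List[DomainHit], max_evalue: float = 1e-5) -> List[DomainHit]:
    """Keep significant hits, drop weaker hits that overlap a stronger one,
    and return the survivors in sequence order."""
    significant = sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)
    chosen: List[DomainHit] = []
    for hit in significant:
        # Accept a hit only if it does not overlap any already-accepted (stronger) hit.
        if all(hit.end < kept.start or hit.start > kept.end for kept in chosen):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h.start)


def format_architecture(hits: List[DomainHit]) -> str:
    return " | ".join(f"{h.name}({h.start}-{h.end})" for h in hits)


if __name__ == "__main__":
    # Hypothetical hits for illustration only, not real tool output.
    hits = [
        DomainHit("DomainA", 5, 90, 1e-30),
        DomainHit("DomainA_like", 20, 95, 1e-8),   # overlaps DomainA, weaker
        DomainHit("DomainB", 120, 210, 3e-12),
        DomainHit("DomainC", 230, 260, 0.2),       # not significant
    ]
    print(format_architecture(resolve_architecture(hits)))
    # DomainA(5-90) | DomainB(120-210)
```

Real tools apply their own reporting thresholds (Pfam, for example, uses curated per-family gathering thresholds), so treat the cutoff above as a placeholder rather than a recommendation.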

3. What's the difference between a domain and a "conserved" domain?

Generally, "conserved" domains are identified by comparing the amino acid sequences of proteins with similar functions across multiple organisms (i.e., by multiple sequence alignment). This reveals recurring sequence segments that likely contribute, at least in part, to the overall protein function. Such domains can be useful for predicting the potential function of a novel protein for which you only know the sequence.
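One simple way to find conserved positions in an alignment is to compute the Shannon entropy of each column: a column where every sequence has the same residue has entropy 0, and the value grows as the residues become more varied. The Python sketch below flags low-entropy columns. The entropy and gap-fraction thresholds are arbitrary assumptions, the toy alignment is invented, and dedicated tools use more sophisticated scores (for example, ones that account for similarity between amino acids).

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the non-gap residues in one alignment column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    total = len(residues)
    return sum((c / total) * math.log2(total / c) for c in Counter(residues).values())


def conserved_columns(alignment, max_entropy=0.5, max_gap_fraction=0.5):
    """Return 1-based positions of low-entropy (conserved) alignment columns.

    Columns with more than `max_gap_fraction` gaps are skipped.
    Both thresholds are arbitrary illustrative choices.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    positions = []
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment]
        if column.count("-") / len(column) > max_gap_fraction:
            continue
        if column_entropy(column) <= max_entropy:
            positions.append(j + 1)
    return positions


if __name__ == "__main__":
    toy_alignment = ["MKV-LGH", "MRVALGH", "MKVSLGY", "MEV-IGH"]  # invented
    print(conserved_columns(toy_alignment))  # [1, 3, 6]
```

In the toy alignment, columns 1, 3 and 6 (M, V and G) are identical in every sequence, while column 2 (K, R, K, E) is variable and scores 1.5 bits.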

Conserved domains differ from three-dimensional structural domains in that they are defined and identified by protein sequence rather than by geometric relationships in the folded structure.

4. What does a multiple sequence alignment tell me?

I touched on this above, but multiple sequence alignments are used in several areas. They're invaluable for inferring evolutionary relationships between organisms (at both the DNA and protein levels), and they can be used to predict the function of a novel protein, to classify similar proteins into "families", and more. At its core, a multiple sequence alignment tells you how similar the sequences in a group of proteins are, position by position, and helps you identify conserved regions (which are likely to be functional units, such as domains).
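As a concrete example of "how similar the sequences are", a common first look at an alignment is a pairwise percent-identity table. The sketch below assumes a dict of already-aligned sequences (gaps written as "-") and counts identity only over columns where neither sequence has a gap. That denominator is one of several conventions in use, so numbers from different tools may not match exactly; the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def identity_table(alignment):
    """Pairwise percent identity for a dict of {name: aligned_sequence}."""
    return {
        (p, q): percent_identity(alignment[p], alignment[q])
        for p, q in combinations(alignment, 2)
    }


if __name__ == "__main__":
    toy = {"seq1": "MKV-LGH", "seq2": "MRVALGH", "seq3": "MKVSLGY", "seq4": "MEV-IGH"}
    for (p, q), pid in identity_table(toy).items():
        print(f"{p} vs {q}: {pid:.1f}%")
```

High pairwise identity suggests closely related proteins, while the specific positions that stay identical even between the most distant pairs are the ones worth examining as candidate functional sites.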

I'm not a protein expert, so somebody else will likely chime in with more nuanced explanations, but that should help you get on the right track.