Question: How to determine the C-terminal and N-terminal regions in the proteome of a non-model organism?
2.3 years ago by
Brasil, Ilhéus, UESC
euduca0 wrote:

Hello everyone, I have been looking for some time for a way to determine (using bioinformatics) the C-terminal and N-terminal regions of the proteins of a non-model organism (in this case a plant). The proteome has more than 30,000 proteins.

I read some questions here in the group, but found nothing that could help me with this. One approach I came up with was to use InterProScan to find functional signatures (InterPro signature or database annotation) whose description contains the term N-terminal or C-terminal.

For any protein that has a signature containing the term N-terminal, I take the end position of that signature and consider the N-terminal region to run from the start of the protein to that position. (If more than one signature contains the term N-terminal, the highest end position is used.) Thus, if a signature containing the term N-terminal ends at 600, the N-terminal region is considered to be 1 to 600.

For any protein that has a signature containing the term C-terminal, I take the start position of that signature and consider the C-terminal region to run from that position to the end of the protein. (If more than one signature contains the term C-terminal, the lowest start position is used.) Therefore, if a signature containing the term C-terminal begins at 400, the C-terminal region begins at 400 and ends at the end of the sequence.
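To make the rule above concrete, here is a minimal Python sketch of it. It assumes the InterProScan hits for one protein have already been parsed into `(description, start, end)` tuples with 1-based inclusive coordinates; the function name `terminal_regions` and the regular expressions are illustrative choices, not part of InterProScan.

```python
import re

# Hypothetical patterns for spotting the terms in a signature description.
N_TERM = re.compile(r"\bN[- ]termin", re.IGNORECASE)
C_TERM = re.compile(r"\bC[- ]termin", re.IGNORECASE)


def terminal_regions(hits, protein_length):
    """Apply the signature-based rule to one protein.

    hits: iterable of (description, start, end), 1-based inclusive.
    Returns (n_region, c_region); each is a (start, end) tuple,
    or None when no matching signature was found.
    """
    hits = list(hits)
    # End positions of every signature described as N-terminal.
    n_ends = [end for desc, start, end in hits if N_TERM.search(desc)]
    # Start positions of every signature described as C-terminal.
    c_starts = [start for desc, start, end in hits if C_TERM.search(desc)]

    # N-terminal region: residue 1 up to the highest N-terminal signature end.
    n_region = (1, max(n_ends)) if n_ends else None
    # C-terminal region: lowest C-terminal signature start up to the last residue.
    c_region = (min(c_starts), protein_length) if c_starts else None
    return n_region, c_region
```

Note that nothing in this rule stops the two regions from overlapping: with an N-terminal signature ending at 600 and a C-terminal signature starting at 400, residues 400 to 600 are counted in both regions.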

However, I am not very convinced by this approach, because the results of my analysis (I am counting how many of certain specific amino acids occur in each region) are very different from what I have seen in scientific papers on proteomes that deal with the same subject. Those researchers, however, used well-characterized model organisms.

My organism has more than 30,000 proteins, but only about 2,000 of them had a signature containing the term N-terminal or C-terminal.

I do not know whether the way I determined the approximate size of the N- and C-terminal regions is correct, since the regions I get can be very large (secretion signatures, for example, are often only <40 aa at the N-terminal end) or very small.

A colleague suggested that I either use a length-dependent window (for example, analyze 100 residues at the N and C ends for proteins above 1000, and 200 for proteins above 2000) or use a fixed value for each region, for example 100 residues at each end.
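The fixed-window option could be sketched in Python as follows. This is only an illustration under assumptions: the function names `terminal_windows` and `residue_counts` are made up here, and capping the window at half the protein length (so the two ends never overlap in short proteins) is a design choice that was not part of the colleague's suggestion.

```python
from collections import Counter


def terminal_windows(sequence, window=100):
    """Return (n_term, c_term) substrings of at most `window` residues each.

    Assumption: for short proteins the window is capped at half the
    sequence length, so the two ends never share residues.
    """
    size = min(window, len(sequence) // 2)
    n_term = sequence[:size]
    # Slice from an explicit index; sequence[-0:] would return everything.
    c_term = sequence[len(sequence) - size:]
    return n_term, c_term


def residue_counts(region):
    """Count how often each amino acid letter occurs in a region."""
    return Counter(region)
```

A length-dependent window would only change how `window` is chosen per protein before calling `terminal_windows`; the per-region counts can then be compared between the N and C ends.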

Can anyone tell me if there is a way to use bioinformatics to help me solve this problem?

2.3 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

Maybe this helps: ? Then it's not really a bioinformatics problem, is it?
