Question

How to determine the C-terminal and N-terminal regions of the proteins in a proteome of a non-model organism?

1

7.4 years ago

euduca ▴ 10

Hello everyone, I have been looking for some time for a way to determine (using bioinformatics) the C-terminal and N-terminal regions of the proteins of a non-model organism (in this case a plant). The proteome has more than 30,000 proteins.

I read some questions here on the forum, but found nothing that could help me with this. One approach I came up with was to use InterProScan to find functional signatures (InterPro signature or database annotation) whose description contains the term N-terminal or C-terminal.

For any protein with a signature containing the term N-terminal, I take the end position of that signature and treat everything from the start of the protein to that position as the N-terminal region. (If more than one signature contains the term N-terminal, the highest end position is used.) Thus, if a signature containing the term N-terminal ends at 600, the N-terminal region runs from 1 to 600.

For any protein with a signature containing the term C-terminal, I take the start position of that signature and treat everything from that position to the end of the protein as the C-terminal region. (If more than one signature contains the term C-terminal, the lowest start position is used.) Therefore, if a signature containing the term C-terminal begins at 400, the C-terminal region runs from 400 to the end of the sequence.
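The rule described above can be written down as a short Python sketch. This is only an illustration under stated assumptions: the signature hits are assumed to be already parsed into `(protein_id, description, start, end)` tuples with 1-based inclusive coordinates, and the function name `terminal_regions`, the example protein IDs, and the example descriptions are all hypothetical.

```python
def terminal_regions(hits, lengths):
    """Derive N- and C-terminal regions from signature descriptions.

    hits    -- iterable of (protein_id, description, start, end),
               1-based inclusive coordinates (assumed input layout)
    lengths -- dict mapping protein_id to sequence length
    Returns a dict: protein_id -> {"N": (1, end) or None,
                                   "C": (start, length) or None}
    """
    n_end = {}    # highest end position of any "N-terminal" signature
    c_start = {}  # lowest start position of any "C-terminal" signature
    for pid, desc, start, end in hits:
        text = desc.lower()
        if "n-terminal" in text:
            n_end[pid] = max(end, n_end.get(pid, 0))
        if "c-terminal" in text:
            c_start[pid] = min(start, c_start.get(pid, start))

    regions = {}
    for pid in set(n_end) | set(c_start):
        regions[pid] = {
            "N": (1, n_end[pid]) if pid in n_end else None,
            "C": (c_start[pid], lengths[pid]) if pid in c_start else None,
        }
    return regions


# Made-up hits that mirror the 600 and 400 examples in the text.
example_hits = [
    ("protA", "Hypothetical kinase, N-terminal domain", 20, 450),
    ("protA", "Hypothetical N-terminal repeat", 300, 600),
    ("protB", "Hypothetical C-terminal domain", 400, 520),
    ("protB", "Hypothetical transferase, C-terminal", 700, 800),
]
example_lengths = {"protA": 900, "protB": 850}
example_regions = terminal_regions(example_hits, example_lengths)
```

Note that with this rule a protein carrying both kinds of signature can get overlapping regions (for instance an N-terminal region ending at 600 and a C-terminal region starting at 400), so it may be worth checking for that case before counting residues.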

However, I am not very convinced by this approach, because the results of my analyses (I am counting how many of certain specific amino acids occur in each region) are very different from what I have seen in scientific papers on proteomes dealing with the same subject. Those researchers, however, used well-characterized model organisms.

My study organism has more than 30,000 proteins, but only about 2,000 of them had a signature containing the term N-terminal or C-terminal.

I do not know if the way I determined the approximate size of the N and C regions in the proteins is correct, since I can end up with very large regions or very small ones (secretion signatures, for example, often cover only <40 aa in the N-terminal portion).

A colleague suggested two alternatives: either use a length-dependent depth (choose a value, e.g. analyze 100 residues at the N and C ends for proteins longer than 1000 residues, 200 for proteins longer than 2000), or use a fixed value for each region, for example 100 residues at each end.
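Both variants of this suggestion can be sketched in a few lines of Python. Several things here are assumptions and not part of the suggestion itself: the window size for proteins of 1000 residues or fewer (`default=50`), the cap at half the protein length so that the two windows never overlap, the function names, and the choice of lysine and arginine (`"KR"`) as the residues to count.

```python
def end_window(length, tiers=((2000, 200), (1000, 100)), default=50):
    """Pick how many residues to analyze at each end of a protein.

    tiers   -- (minimum length, window size) pairs, longest first;
               the values follow the colleague's suggestion
    default -- window for shorter proteins (assumed placeholder value)
    The window is capped at half the length so the N- and C-terminal
    windows cannot overlap (also an assumption).
    """
    for min_length, size in tiers:
        if length > min_length:
            return min(size, length // 2)
    return min(default, length // 2)


def terminal_windows(seq, size):
    """Return the first and last `size` residues of a sequence."""
    return seq[:size], seq[len(seq) - size:]


def count_residues(region, residues="KR"):
    """Count occurrences of the given one-letter residues in a region."""
    return sum(region.count(r) for r in residues)


# Toy sequence of 1502 residues, so the 100-residue tier applies.
example_seq = "MKKK" + "A" * 1496 + "RR"
size = end_window(len(example_seq))
n_region, c_region = terminal_windows(example_seq, size)
n_count = count_residues(n_region)
c_count = count_residues(c_region)

# Fixed-window variant: 100 residues at each end regardless of length.
fixed_size = end_window(len(example_seq), tiers=(), default=100)
```

Passing an empty `tiers` tuple turns the same function into the fixed-window variant, which makes it easy to run both schemes on the same proteome and compare the residue counts.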

Can anyone tell me if there is a way to use bioinformatics to help me solve this problem?

c-terminal n-terminal interproscan proteins • 8.2k views

updated 7.4 years ago by Michael 56k • written 7.4 years ago by euduca ▴ 10

score 0 · Answer 1 · 2018-01-31

0

7.4 years ago

Michael 56k

Maybe this helps: https://en.wikipedia.org/wiki/N-terminus ? Then it's not really a bioinformatics problem, is it?
