How to calculate the mass of whole proteins and subregions?
5.8 years ago

I am working on the automatic download of proteins to calculate their mass and the mass of the different subregions. I was wondering if there was a tool to help me with this or would I have to program it from scratch?

As output I can get a FASTA file from NCBI or a GenBank flat file (as well as other formats). The FASTA file contains no information about the regions. The relevant part of the GenBank file looks like this:

**            ##Evidence-Data-END##
FEATURES             Location/Qualifiers
     source          1..230
                     /organism="Mus musculus"
     Protein         1..230
                     /product="endothelial cell-specific chemotaxis regulator"
                     /note="endothelial cell-specific molecule 2; apoptosis
                     regulator through modulating IAP expression"
     Region          134..228
                     /note="Endothelial cell-specific chemotaxis regulator;
     CDS             1..230
                     /gene_synonym="1110006O17Rik; ARIA"
        1 mlrdisleah glgstltpll ahqlpqgrvr gyssqptttq tsqeilqkss qvslvsnqpv
       61 tprsstmdkq slslpdlmsf qpqkhtlgpg tgtperssss ssssssrrge asldatpspe
      121 ttslqtkkmt illtilptpt sesvltvaaf gvisfivilv vvviilvsvv slrfkcrknk
      181 esedpqkpgs sglsescsta ngekdsitli smrninvnns kgsmsaekil


So in theory I can extract the regions from this file with some text mining and take the sequence from the FASTA file. Since that would take some time, I figured I would post and see if anyone had a better solution.
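The text-mining approach described above could look roughly like the following. This is a minimal sketch, not a full GenBank parser: the function name `parse_regions` is made up for illustration, and it assumes that `Region` features have simple `start..end` locations and that sequence lines start with a residue count followed by blocks of letters, as in the excerpt.

```python
import re

# Assumption: Region features use simple "start..end" locations (optionally
# with < or > for partial ends); joins and complements are not handled.
REGION_RE = re.compile(r"^\s+Region\s+<?(\d+)\.\.>?(\d+)\s*$")
# Assumption: sequence lines look like "  61 tprsstmdkq slslpdlmsf ...".
SEQ_RE = re.compile(r"^\s*\d+((?:\s+[A-Za-z]+)+)\s*$")


def parse_regions(flat_text):
    """Return (sequence, [(start, end, subsequence), ...]) from flat-file text."""
    regions = []
    chunks = []
    for line in flat_text.splitlines():
        match = REGION_RE.match(line)
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
            continue
        match = SEQ_RE.match(line)
        if match:
            # Drop the spaces between the 10-residue blocks.
            chunks.append("".join(match.group(1).split()))
    sequence = "".join(chunks).upper()
    # Flat-file coordinates are 1-based and inclusive.
    return sequence, [(s, e, sequence[s - 1:e]) for s, e in regions]
```

Each returned subsequence can then go into whatever MW function is already used for the whole protein.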

R proteins Amino Acids ExPASy • 1.4k views

If you don't have to work with these files, you could use Ensembl's API to extract this kind of information from the database. I think the protein molecular weight is available. You can also compute the mass of any peptide as the sum of the masses of its amino-acid residues plus one water molecule. There are also plenty of online tools for this.
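The residue-sum calculation can be sketched as below. The table holds standard average residue masses in Da; the function name `peptide_mass` is just illustrative, and the sketch assumes an unmodified peptide made of the 20 standard amino acids. For comparison with mass-spec peaks you may want monoisotopic masses instead, which would need a different table.

```python
# Average mass of one water molecule (Da), added once per peptide chain.
WATER = 18.01528

# Average residue masses (Da), i.e. amino acid minus water.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}


def peptide_mass(sequence):
    """Average mass of an unmodified peptide: sum of residues plus one water."""
    seq = sequence.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError("unsupported residue codes: " + ", ".join(sorted(unknown)))
    if not seq:
        return 0.0
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER
```

The same function works for a subregion: slice the sequence first, then pass the slice in.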


I do not have to work with these files, no. The key point is automation. I just need to be able to feed in a list of protein names and receive the MW of the whole protein and all its subregions. It doesn't matter what I use to achieve that, since it will just be used for comparison against MALDI output. I will take a look at that API, thanks! I do already use an MW compute tool in R. The problem is getting the mass of the subregions!
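For the automated download step, one option is NCBI's E-utilities `efetch` endpoint, which can return a protein record as a GenPept flat file. This is a rough sketch: it assumes you already have accessions rather than protein names (names would first need to be resolved, for example with `esearch`), and the helper names `efetch_url` and `fetch_flat_file` are made up here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build an efetch URL asking for a protein record as a GenPept flat file."""
    params = {"db": "protein", "id": accession, "rettype": "gp", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_flat_file(accession):
    """Download the flat file text for one accession (needs network access)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")
```

When looping over a long list, keep the request rate within NCBI's usage guidelines.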


For MW calculations in an automated setting, I can recommend the EMBOSS suite of tools.


You can extract the sequences of the regions and compute the masses yourself using a table of masses of amino-acid residues or using the mw() function of the R package Peptides.


That's what I have been doing for the whole protein sequence. Indeed, it should not be too hard to extract the regions from the GenBank file. I was just wondering if there was some automated way to extract or identify the regions of the protein. I guess I will do it myself. Thanks!

