Python script to calculate properties of amino acids
2.7 years ago
anasjamshed ▴ 120

I want to write a Python script that takes a sequence of N amino acids as input and outputs an N×6 matrix containing six features for each amino acid.

I have written this script:

# Import libraries
import numpy as np
import pandas as pd
from Bio.SeqUtils.ProtParam import ProteinAnalysis
from Bio.SeqUtils.ProtParam import ProtParamData
from quantiprot.metrics.aaindex import get_aa2volume, get_aa2hydropathy
from quantiprot.metrics.basic import average

# Read the amino acid sequence
my_seq = input("Enter Your Sequence : ")
SEQ = my_seq.upper()

# Analysis
analysed_seq = ProteinAnalysis(SEQ)
MW = analysed_seq.molecular_weight()
Gravity = analysed_seq.gravy()  # GRAVY: grand average of hydropathy
aa_composition = analysed_seq.count_amino_acids()
aa_percentage = analysed_seq.get_amino_acids_percent()
HP = analysed_seq.protein_scale(window=7, param_dict=ProtParamData.kd)
Sec_Str = analysed_seq.secondary_structure_fraction()

# Amino acid composition
print("\n", "AA count:", aa_composition, "\n")

# Molecular weight
print("Molecular Weight : ", MW)

# GRAVY
print("\n", "Gravity:", "\n\n", Gravity)

# Hydrophobicity using the kd scale
print("\n", "Hydrophobicity:", "\n\n", HP)
# kd → Kyte & Doolittle Index of Hydrophobicity
# Flex → Normalized average flexibility parameters (B-values)
# hw → Hopp & Wood Index of Hydrophilicity
# em → Emini Surface fractional probability

# Amino acid percentages (the original printed aa_composition here by mistake)
print("\n", "Percentage of Amino Acids in Protein:", "\n\n", aa_percentage)

# Volume, using the quantiprot library
# (vol is a per-amino-acid lookup table, not values for this sequence)
vol = get_aa2volume(analysed_seq)
print("\n", "Volume of amino acids in protein:", "\n\n", str(vol))

The features are:

  1. Computed volume
  2. Hydrophobicity
  3. Polarity
  4. Relative surface accessibility (RSA)
  5. Secondary structure (SS)
  6. Type

I have calculated volume, hydrophobicity, and SS, but I don't know how to merge them into one matrix.

Please help me

python • 6.7k views

Could you provide an example showing what those six features look like? If they are all lists of the same length, you can easily stack them into a 2D array with numpy.vstack.
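A minimal sketch of the stacking idea, assuming each feature is already a list with one value per residue. The three feature lists and their numbers are placeholders made up for illustration, not real measurements.

```python
import numpy as np

# Hypothetical per-residue features for a 4-residue sequence
# (placeholder numbers, not real measurements).
volume = [1.0, 1.6, 2.78, 5.89]
hydrophobicity = [1.8, -0.8, -3.5, 2.8]
polarity = [0.0, 1.0, 1.0, 0.0]

# vstack gives one row per feature: shape (3, N)
by_feature = np.vstack([volume, hydrophobicity, polarity])

# Transpose to get one row per residue: shape (N, 3)
by_residue = by_feature.T

print(by_residue.shape)
```

With six feature lists instead of three, the transposed array would have the N×6 shape asked for in the question.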


First, I have to calculate polarity, relative surface accessibility (RSA), and type, but I am not sure which library I can use.


They are not all lists.


Can anyone help me please?

2.7 years ago
c_u ▴ 520

To me this sounds like a data-presentation question. You could do some simple string manipulation to output the fields for each protein as a single tab-separated string, with each row ending in a newline. Such a tab-separated file can be viewed easily in Excel and similar tools.

Something like this:

outputs = []  # put this before you begin analyzing the sequences

# run this for each of your N amino acids
# (str() is needed because the features are not strings)
features = str(vol) + "\t" + str(HP) + "\t" + ....[your other features]... + "\n"
outputs.append(features)

# put this after you are done with all the sequences
with open('output', 'w') as f:
    for i in range(len(outputs)):
        f.write(outputs[i])
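As an alternative to joining strings by hand, here is a minimal sketch that uses the standard-library csv module to write tab-separated rows. The row contents and the helper name write_tsv are hypothetical; an in-memory buffer stands in for the output file so the example runs anywhere.

```python
import csv
import io

# Hypothetical rows: one list of already-computed features per sequence
# (the numbers are placeholders).
rows = [
    ["seq1", 795.9429, 1.4],
    ["seq2", 512.3, -0.2],
]

def write_tsv(rows, handle):
    """Write each row as one tab-separated line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# An in-memory buffer is used here; a file opened with
# open('output', 'w', newline='') can be passed in the same way.
buffer = io.StringIO()
write_tsv(rows, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

The writer converts numbers to strings itself, so no explicit str() calls are needed.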

outputs.append()=HP
^
SyntaxError: cannot assign to function call


It is giving an error.


Oh yes, the problem was in the use of append; I had made an error there. I have changed the code, so hopefully it should work now.


I have tried this again:

outputs=[] 
#Analysis
analysed_seq = ProteinAnalysis(SEQ)
MW=  analysed_seq.molecular_weight()
Gravity = analysed_seq.gravy()
aa_composition = analysed_seq.count_amino_acids()
aa_percentage=analysed_seq.get_amino_acids_percent()
HP = analysed_seq.protein_scale(window=7, param_dict=ProtParamData.kd)
Sec_Str = analysed_seq.secondary_structure_fraction()
features=Sec_Str+"\t"+HP+"\t"+MW+"\n"
outputs.append(features) 

but it's giving me this error:

---> 22 features=Sec_Str+"\t"+HP+"\t"+MW+"\n"
     23 outputs.append(features)
     24 

TypeError: can only concatenate tuple (not "str") to tuple

Well, that means some of the 3 items that you are concatenating are tuples, and some other items are strings. It might be best to convert everything to strings. You should first figure out the type of each of the 3 items, and then convert the ones that are not strings. Calling str() on each item is one way to do it.
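A minimal sketch of that two-step check, using stand-in values shaped like the three items in the failing line. The numbers are placeholders; in the real script they would come from ProteinAnalysis.

```python
# Stand-in values shaped like the ones in the failing line
# (numbers are placeholders).
Sec_Str = (0.43, 0.14, 0.57)   # tuple
HP = [1.4]                     # list
MW = 795.9429                  # float

# Step 1: find out what each item actually is.
for name, value in [("Sec_Str", Sec_Str), ("HP", HP), ("MW", MW)]:
    print(name, type(value).__name__)

# Step 2: convert every item to a string before joining with tabs.
features = "\t".join(str(item) for item in (Sec_Str, HP, MW)) + "\n"
print(features)
```

Joining with "\t".join avoids the + operator entirely, so a tuple or list can never end up on the left-hand side of a concatenation.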


Also, don't forget to upvote comments and answers that were helpful in any way. It is a way to show appreciation for the time that another person spent trying to help you :)


I am trying this now:

#Importing Libraries
import numpy as np
import pandas as pd
from Bio.SeqUtils.ProtParam import ProteinAnalysis
from Bio.SeqUtils.ProtParam import ProtParamData
from quantiprot.metrics.aaindex import get_aa2volume, get_aa2hydropathy
from quantiprot.metrics.basic import average

#Input amino acid
my_seq = input("Enter Your Sequence : ")
SEQ = my_seq.upper()

outputs=[] 
#Analysis
analysed_seq = ProteinAnalysis(SEQ)
MW=  analysed_seq.molecular_weight()
Gravity = analysed_seq.gravy()
aa_composition = analysed_seq.count_amino_acids()
aa_percentage=analysed_seq.get_amino_acids_percent()
HP = analysed_seq.protein_scale(window=7, param_dict=ProtParamData.kd)
Sec_Str = analysed_seq.secondary_structure_fraction()
#Volume Calculation by using quantiprot library
vol =  get_aa2volume(analysed_seq)
features="Composition:"+str(aa_composition)+" Hydrophobicity:"+str(HP)+" Volume:"+str(vol.mapping)+" Secondary Structure:"+str(Sec_Str)+" Molecular Weight:"+str(MW)
outputs.append(features) 

properties = np.array(outputs)
print(properties)

and it gives me this result:

Enter Your Sequence: ASDFMLL
["Composition:{'A': 1, 'C': 0, 'D': 1, 'E': 0, 'F': 1, 'G': 0, 'H': 0, 'I': 0, 'K': 0, 'L': 2, 'M': 1, 'N': 0, 'P': 0, 'Q': 0, 'R': 0, 'S': 1, 'T': 0, 'V': 0, 'W': 0, 'Y': 0} Hydrophobicity:[1.4000000000000001] Volume:{'A': 1.0, 'C': 2.43, 'D': 2.78, 'E': 3.78, 'F': 5.89, 'G': 0.0, 'H': 4.66, 'I': 4.0, 'K': 4.77, 'L': 4.0, 'M': 4.43, 'N': 2.95, 'P': 2.72, 'Q': 3.95, 'R': 6.13, 'S': 1.6, 'T': 2.6, 'V': 3.0, 'W': 8.08, 'Y': 6.47} Secondary Structure:(0.42857142857142855, 0.14285714285714285, 0.5714285714285714) Molecular Weight:795.9429"]
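The result above is a single string of whole-protein summaries, and the Volume entry is a lookup table for all 20 amino acids rather than values for this sequence. One possible way to move toward the N×6 matrix from the question is to look up each residue in per-amino-acid tables. The sketch below does this for two of the six features only. The volume table is copied from the printed output above; the Kyte & Doolittle hydropathy values are hard-coded so the example needs no libraries, and the function name per_residue_matrix is hypothetical.

```python
# Volume lookup table copied from the printed output above.
VOLUME = {'A': 1.0, 'C': 2.43, 'D': 2.78, 'E': 3.78, 'F': 5.89, 'G': 0.0,
          'H': 4.66, 'I': 4.0, 'K': 4.77, 'L': 4.0, 'M': 4.43, 'N': 2.95,
          'P': 2.72, 'Q': 3.95, 'R': 6.13, 'S': 1.6, 'T': 2.6, 'V': 3.0,
          'W': 8.08, 'Y': 6.47}

# Kyte & Doolittle hydropathy values, hard-coded for self-containment.
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
      'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
      'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
      'Y': -1.3, 'V': 4.2}

def per_residue_matrix(seq):
    """Return one [volume, hydropathy] row per residue of seq."""
    return [[VOLUME[aa], KD[aa]] for aa in seq.upper()]

matrix = per_residue_matrix("ASDFMLL")
for aa, row in zip("ASDFMLL", matrix):
    print(aa, row)
```

Each additional feature that can be expressed as a per-amino-acid table would add one more column in the same way.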

but now this script:

with open('output', 'w') as f:
    for i in len(outputs):
        f.write(outputs[i])

gives me an error:

---> 30     for i in len(outputs):
     31         f.write(outputs[i])

TypeError: 'int' object is not iterable

One way to resolve that error is to replace "for i in len(outputs):" with "for i in range(len(outputs)):". len() returns a single int, which is not iterable, whereas range() produces the sequence of indices that the loop needs.
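A short sketch of the corrected loop, alongside the variant that iterates over the list directly. The list contents are placeholders, and in-memory buffers stand in for the output file.

```python
import io

outputs = ["first line\n", "second line\n"]  # placeholder strings

# Option 1: iterate over indices with range(len(...)).
by_index = io.StringIO()
for i in range(len(outputs)):
    by_index.write(outputs[i])

# Option 2: iterate over the list itself; no index is needed.
direct = io.StringIO()
for line in outputs:
    direct.write(line)

print(by_index.getvalue() == direct.getvalue())
```

Both loops write the same text; the second form is usually preferred when the index itself is not used.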


Please accept the answer if it was helpful. That encourages us to help you in the future as well.


I want to calculate Polarity, Relative surface accessibility (RSA), and Type but I am not sure which library I can use?


Well, since that is a separate issue, I would suggest that you ask a new question for it, after accepting this answer.


Going through your profile, it looks like you have not accepted any answer to any of your posts. Accepting answers is crucial for this community. Do you know how to do that? Here is an explanation by one of the BioStars moderators - How to Use Biostars, Part-I: Questions, Answers, Comments and Replies
