Question: How to extract features from protein sequence for classification
6 weeks ago by
lacolombemarouane0 wrote:

Hello. How can I get features such as strand, domain, helix, turn, chain, mass, glycosylation, active site, binding site, and so on from a protein sequence? I want to do a classification based on these features, but I have no idea what they mean and I could not find a dataset. All I have is a protein sequence file. Is there a Python library, or are there articles, that could help me? Cordially

sequencing sequence gene • 128 views
6 weeks ago by
julien.fouret.fr20 wrote:

You can fetch the following URL:


For example, consider human IFIT2; its UniProt ID is P09913. You'll need to fetch:

Of note, the lines you are interested in start with "FT".

Be aware that these coordinates are for the canonical protein according to the UniProt system; please check whether your protein has isoforms and act accordingly.
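The "FT" lines mentioned above can be pulled out of a downloaded entry with a few lines of Python. This is only a minimal sketch: it assumes the newer flat-file layout in which a feature line looks like `FT   HELIX           52..64` and qualifiers such as `/note="..."` follow on indented continuation lines. The function name `parse_ft_lines` is hypothetical, and the text of the entry is assumed to be already loaded into a string.

```python
import re

# Start of a feature in the assumed layout, e.g.
# "FT   HELIX           52..64" or "FT   ACT_SITE        37".
FEATURE_START = re.compile(r"^FT   (\S+)\s+(\S+)\s*$")


def parse_ft_lines(flat_text):
    """Return a list of {"type", "location", "qualifiers"} dicts from FT lines."""
    features = []
    for line in flat_text.splitlines():
        if not line.startswith("FT"):
            continue  # only the feature table is of interest here
        start = FEATURE_START.match(line)
        if start:
            features.append({
                "type": start.group(1),
                "location": start.group(2),
                "qualifiers": [],
            })
        elif features:
            # Continuation line such as: FT                   /note="..."
            features[-1]["qualifiers"].append(line[2:].strip())
    return features
```

Each returned dictionary keeps the location as a string (for example `"52..64"`), so the canonical-versus-isoform check still has to be done before the coordinates are used.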

Or, better, you can use the EBI API, but you'll still have the "canonical" issue.

Example extracted from :

                requestURL = ""+subject_id+"?types=INIT_MET%2CDISULFID%2CCROSSLNK%2CACT_SITE%2CMETAL%2CBINDING"
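The request above is simply a base URL, the accession, and a `types` filter whose commas are percent-encoded as `%2C`. A small sketch of building it with the standard library follows; `build_features_url` and its `base_url` parameter are hypothetical names, and the base URL itself is left for you to supply from the API documentation.

```python
from urllib.parse import quote, urlencode

# Feature types taken from the request shown above.
FEATURE_TYPES = ("INIT_MET", "DISULFID", "CROSSLNK", "ACT_SITE", "METAL", "BINDING")


def build_features_url(base_url, subject_id, types=FEATURE_TYPES):
    """Append the accession and a comma-separated `types` filter to base_url."""
    query = urlencode({"types": ",".join(types)})  # commas become %2C
    return base_url + quote(subject_id) + "?" + query
```

Letting `urlencode` do the escaping avoids hand-writing `%2C` when the list of feature types changes.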

If you only have protein sequences, you'll first need to run a BLAST search against the Swiss-Prot database. Then take the best hit and fetch its UniProt ID (using the ID mapping tools from UniProt). To match the coordinates of your protein with those of the protein linked to that UniProt ID, the best approach is a global pairwise alignment (see needle).

It would be smart to restrict your BLAST database to a single species, the one with the best feature annotation...
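Once a global alignment is available, transferring feature coordinates from the annotated protein to your own sequence is a matter of walking the two aligned rows in parallel. The sketch below assumes you already have both gapped rows as strings (for example, copied from the aligner's output) and that `-` marks a gap; `map_positions` is a hypothetical helper, not part of any library.

```python
def map_positions(aligned_ref, aligned_query, gap="-"):
    """Map 1-based reference positions to 1-based query positions.

    Both arguments are the two rows of one global alignment (equal length).
    Reference residues aligned to a gap in the query map to None.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    ref_pos = query_pos = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if query_char != gap:
            query_pos += 1
        if ref_char != gap:
            ref_pos += 1
            mapping[ref_pos] = query_pos if query_char != gap else None
    return mapping
```

A feature annotated at reference position `p` then lands at `mapping[p]` in your sequence, or has no counterpart if that value is `None`.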

Reply written 6 weeks ago by julien.fouret.fr20
6 weeks ago by
skjobs012360 wrote:

You can use UniProt, Pfam, or other online resources for this kind of classification. If you have a larger data set, download the Pfam repository, run BLAST with your query sequences against it, and classify according to your needs.
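For the BLAST-based route, a simple way to assign each query to a class is to keep its highest-scoring hit. The sketch below assumes the search was run with BLAST+ tabular output (`-outfmt 6`) in its default column order and that the results are already read into a string; `best_hits` is a hypothetical helper name.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return {query id: subject id}, keeping the highest-bitscore hit per query."""
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (row["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}
```

The subject identifier of the best hit can then serve as the class label, or as the key for looking up the annotated features of that entry.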
