Question: How to extract features from protein sequence for classification
9 months ago by
lacolombemarouane0 wrote:

Hello. How can I get features such as strand, domain, helix, turn, chain, mass, glycosylation, active site, binding site, and so on from a protein sequence? I want to do a classification based on these features, but I have no idea what they mean and I could not find a dataset. All I have is a protein sequence file. Is there a Python library, or are there articles, that could help me? Cordially

sequencing sequence gene • 540 views
9 months ago by
julien.fouret.fr20 wrote:

You can fetch the following URL:

"https://www.uniprot.org/uniprot/"+canonicalUniprotID+".txt"

For example, consider human IFIT2, whose UniProt ID is P09913. You'll need to fetch: https://www.uniprot.org/uniprot/P09913.txt The lines you are interested in start with "FT".
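To illustrate picking out the "FT" lines, here is a minimal sketch in Python. It assumes the current UniProt flat-file layout, where a feature header looks like `FT   DOMAIN          10..50` and qualifiers follow on indented lines; older releases used a different column layout that this does not handle. The function name `parse_ft_lines` and the `SAMPLE` text are made up for illustration and are not real UniProt data.

```python
import re

# Matches a feature header line such as "FT   DOMAIN          10..50".
# Qualifier lines ("FT                   /note=...") have more leading
# spaces after "FT" and therefore do not match.
FT_HEADER = re.compile(r"^FT   (\S+)\s+(\S+)\s*$")


def parse_ft_lines(flat_text):
    """Return a list of (feature_type, start, end) tuples from FT lines."""
    features = []
    for line in flat_text.splitlines():
        match = FT_HEADER.match(line)
        if not match:
            continue  # qualifier line or a non-FT line
        ftype, location = match.groups()
        # Locations look like "10..50" or a single position "37".
        # Uncertain or isoform-specific locations are skipped in this sketch.
        parts = location.split("..")
        if not all(part.isdigit() for part in parts):
            continue
        features.append((ftype, int(parts[0]), int(parts[-1])))
    return features


# Hypothetical flat-file excerpt, invented for this example.
SAMPLE = """ID   EXAMPLE_HUMAN  Reviewed;  100 AA.
FT   CHAIN           1..100
FT                   /note="Hypothetical example protein"
FT   DOMAIN          10..50
FT   ACT_SITE        37
FT                   /note="Made-up active site"
SQ   SEQUENCE   100 AA;
"""

for feature in parse_ft_lines(SAMPLE):
    print(feature)
```

For a real entry you would download the text first (for instance with `urllib.request.urlopen`) and pass the decoded body to `parse_ft_lines`.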

Be aware that these coordinates are for the canonical protein according to the UniProt system; please check whether your protein has isoforms and act accordingly.

Or, better, you can use the EBI API, but you'll still have the "canonical" issue.

Example extracted from https://fouret.me/gitea/jfouret/gwAlign/src/branch/master/scripts/gwAlign-Unify :

requestURL = "https://www.ebi.ac.uk/proteins/api/features/"+subject_id+"?types=INIT_MET%2CDISULFID%2CCROSSLNK%2CACT_SITE%2CMETAL%2CBINDING"
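As a rough sketch of how that request could be built and its result read in Python, using only the standard library: the helper names (`build_features_url`, `fetch_features`, `feature_spans`) are hypothetical, and the JSON layout (a top-level "features" list whose items carry "type", "begin" and "end") is my assumption about the response, so check it against a real reply before relying on it. The `EXAMPLE_PAYLOAD` below is invented, not data for any real protein.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ROOT = "https://www.ebi.ac.uk/proteins/api/features/"


def build_features_url(accession, feature_types):
    """Build the request URL; urlencode turns the commas into %2C."""
    query = urlencode({"types": ",".join(feature_types)})
    return API_ROOT + accession + "?" + query


def fetch_features(accession, feature_types):
    """Download the features as JSON (needs network access)."""
    request = Request(
        build_features_url(accession, feature_types),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def feature_spans(payload):
    """Extract (type, begin, end) from an already parsed response.

    Assumes the layout described above; begin/end are returned as found.
    """
    return [
        (feature["type"], feature["begin"], feature["end"])
        for feature in payload.get("features", [])
    ]


# Invented payload in the assumed layout, for illustration only.
EXAMPLE_PAYLOAD = {
    "accession": "X00000",
    "features": [
        {"type": "ACT_SITE", "begin": "37", "end": "37"},
        {"type": "DISULFID", "begin": "12", "end": "48"},
    ],
}

print(build_features_url("P09913", ["ACT_SITE", "BINDING"]))
print(feature_spans(EXAMPLE_PAYLOAD))
```

With network access, `feature_spans(fetch_features("P09913", ["ACT_SITE", "BINDING"]))` would be the typical call.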

If you only have protein sequences, you'll first need to run BLAST against the Swiss-Prot database. Then take the best hit and fetch its UniProt ID (using the ID mapping tools from UniProt). To match the coordinates of your protein with those of the UniProt entry, the best approach is a global pairwise alignment (see needle).

It would be smart to restrict your BLAST database to a single species, the one that shows the best annotation in terms of features...
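Once you have the two aligned sequences from a global alignment (for example copied from needle's output), transferring coordinates is mostly bookkeeping. A minimal sketch, assuming both aligned strings have the same length and use "-" for gaps; the function names and the toy sequences are invented for illustration:

```python
def map_positions(aligned_ref, aligned_query, gap="-"):
    """Map 1-based reference positions to 1-based query positions.

    Only columns where both sequences have a residue are mapped.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_pos = query_pos = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != gap:
            ref_pos += 1
        if query_char != gap:
            query_pos += 1
        if ref_char != gap and query_char != gap:
            mapping[ref_pos] = query_pos
    return mapping


def transfer_feature(mapping, start, end):
    """Return the query span covered by a reference feature, or None
    if no residue of the feature is aligned to the query."""
    hits = [mapping[pos] for pos in range(start, end + 1) if pos in mapping]
    return (min(hits), max(hits)) if hits else None


# Toy alignment: the reference stands in for the UniProt entry,
# the query for your own protein.
ref_aln = "MKT-AYIAK"
query_aln = "MKTLAY--K"

positions = map_positions(ref_aln, query_aln)
print(transfer_feature(positions, 4, 5))
print(transfer_feature(positions, 6, 7))
```

A feature that falls entirely in a region missing from your protein comes back as `None`, which is a useful signal that the annotation should not be transferred.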

9 months ago by
skjobs012370 wrote:

You can use UniProt, Pfam, or other online resources for such classification. If you have a larger data set, download the Pfam repository, run BLAST with your query sequences against it, and classify according to your needs.
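If you run BLAST+ with tabular output (`-outfmt 6`), each line has 12 tab-separated columns, with the query ID first, the subject ID second and the bit score last. A minimal sketch for keeping the best-scoring hit per query, which could then serve as the class label; the function name `best_hits` and the sample rows (IDs and scores) are made up:

```python
def best_hits(tabular_text):
    """Return {query_id: (subject_id, bitscore)} for the top hit per query.

    Expects the default 12-column BLAST+ tabular layout (-outfmt 6).
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        bitscore = float(fields[11])
        if query_id not in best or bitscore > best[query_id][1]:
            best[query_id] = (subject_id, bitscore)
    return best


# Invented rows in the 12-column layout, for illustration only.
rows = [
    ["query1", "famA", "45.0", "200", "110", "3", "1", "200", "5", "204", "1e-50", "180"],
    ["query1", "famB", "30.0", "150", "105", "4", "10", "160", "2", "151", "1e-10", "60.5"],
    ["query2", "famB", "52.0", "180", "86", "2", "1", "180", "1", "180", "1e-60", "210"],
]
sample_output = "\n".join("\t".join(row) for row in rows)

for query, (subject, score) in best_hits(sample_output).items():
    print(query, subject, score)
```

You may also want an e-value or score cutoff before trusting a best hit; that threshold is a judgment call and is left out here.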
