How to extract features from protein sequence for classification
4.4 years ago

Hello, how can I get features such as strand, domain, helix, turn, chain, mass, glycosylation, active site, binding site, etc. from a protein sequence? I want to do a classification based on these features, but I have no idea what they mean and I could not find a dataset; all I have is a protein sequence file. Is there a Python library, or are there articles, that could help me? Cordially

4.4 years ago

You can fetch the following URL:

"https://www.uniprot.org/uniprot/"+canonicalUniprotID+".txt"

For example, take human IFIT2: its UniProt ID is P09913, so you would fetch https://www.uniprot.org/uniprot/P09913.txt. Note that the lines you are interested in start with "FT".

Be aware that these coordinates refer to the canonical protein sequence as defined by UniProt. Check whether your protein has isoforms and act accordingly.
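A minimal Python sketch of the flat-file route, using only the standard library. It assumes the current UniProt REST endpoint https://rest.uniprot.org/uniprotkb/<ID>.txt (the www.uniprot.org/uniprot/ address above is the older form) and the fixed-column FT layout: feature key in columns 6-20, location or qualifier starting at column 22. fetch_uniprot_txt and parse_ft_lines are hypothetical helper names, not part of any library. Locations that are not a plain range (for example "?" positions) keep their raw string, with start/end set to None.

```python
import re
import urllib.request


def fetch_uniprot_txt(accession: str) -> str:
    """Download the UniProt flat-file entry for one accession."""
    # Assumption: current REST endpoint; adjust if UniProt changes it again.
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.txt"
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_ft_lines(flat_text: str) -> list[dict]:
    """Collect FT (feature table) records: key, raw location, start/end, qualifiers."""
    features = []
    for line in flat_text.splitlines():
        if not line.startswith("FT"):
            continue
        key = line[5:20].strip()   # feature key occupies columns 6-20
        rest = line[21:].strip()   # location or qualifier starts at column 22
        if key:
            # New feature, e.g. "FT   HELIX           12..20"
            match = re.fullmatch(r"[<>?]?(\d+)(?:\.\.[<>?]?(\d+))?", rest)
            start = int(match.group(1)) if match else None
            end = int(match.group(2) or match.group(1)) if match else None
            features.append({"key": key, "location": rest, "start": start,
                             "end": end, "qualifiers": []})
        elif features and rest.startswith("/"):
            # New qualifier such as /note="..." for the most recent feature
            features[-1]["qualifiers"].append(rest)
        elif features and features[-1]["qualifiers"]:
            # Wrapped qualifier value: continue the previous qualifier
            features[-1]["qualifiers"][-1] += " " + rest
    return features
```

Usage would be something like `feats = parse_ft_lines(fetch_uniprot_txt("P09913"))`, after which you can filter on `f["key"]` (HELIX, STRAND, BINDING, ...).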

Better still, you can use the EBI Proteins API, but you will still have the "canonical" issue.

Example extracted from https://fouret.me/gitea/jfouret/gwAlign/src/branch/master/scripts/gwAlign-Unify :

    # subject_id holds the UniProt accession to query, e.g. "P09913"
    requestURL = "https://www.ebi.ac.uk/proteins/api/features/"+subject_id+"?types=INIT_MET%2CDISULFID%2CCROSSLNK%2CACT_SITE%2CMETAL%2CBINDING"
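A standard-library sketch of the same request. It assumes the Proteins API returns JSON when asked through the Accept header, and that the response has a top-level "features" list whose items carry a "type" field; features_url, fetch_features and count_feature_types are hypothetical names. Encoding the type list with urlencode produces the same %2C separators as the requestURL line above. Per-type counts are one simple way to turn these annotations into classification features.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Feature types from the gwAlign example; adjust to the features you need.
DEFAULT_TYPES = ("INIT_MET", "DISULFID", "CROSSLNK", "ACT_SITE", "METAL", "BINDING")


def features_url(accession: str, types=DEFAULT_TYPES) -> str:
    """Build the Proteins API features URL; urlencode turns ',' into %2C."""
    query = urllib.parse.urlencode({"types": ",".join(types)})
    return f"https://www.ebi.ac.uk/proteins/api/features/{accession}?{query}"


def fetch_features(accession: str, types=DEFAULT_TYPES) -> dict:
    """Request the entry as JSON (assumed response format)."""
    request = urllib.request.Request(features_url(accession, types),
                                     headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def count_feature_types(entry: dict) -> Counter:
    """Count features per type, assuming entry["features"] is a list of dicts with "type"."""
    return Counter(feature["type"] for feature in entry.get("features", []))
```

For example, `count_feature_types(fetch_features("P09913"))` would give a Counter keyed by feature type; missing types simply count as 0.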

If you only have protein sequences, you will first need to run BLAST against the Swiss-Prot database. Then take the best hit and get its UniProt ID (e.g. with UniProt's ID mapping tool). To match the coordinates of your protein with those of the UniProt entry, the best approach is a global pairwise alignment (see EMBOSS needle).

It would be smart to restrict your BLAST database to a single species, ideally the one with the best feature annotation.
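Once you have the global alignment (for example the two gapped sequences from needle output), transferring coordinates is a walk along the alignment columns. A minimal sketch, assuming both aligned strings are the same length and use "-" for gaps; map_positions and transfer_feature are hypothetical helpers. A UniProt feature whose residues all fall opposite gaps in your query is reported as None.

```python
def map_positions(aligned_query: str, aligned_target: str) -> dict[int, int]:
    """Map 1-based target (UniProt) positions to 1-based query positions.

    Target residues aligned to a gap in the query get no entry.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    q_pos = t_pos = 0
    for q_char, t_char in zip(aligned_query, aligned_target):
        if q_char != "-":
            q_pos += 1
        if t_char != "-":
            t_pos += 1
        if q_char != "-" and t_char != "-":
            mapping[t_pos] = q_pos
    return mapping


def transfer_feature(start: int, end: int, mapping: dict[int, int]):
    """Project a target feature [start, end] onto the query, or None if nothing aligns."""
    hits = [mapping[p] for p in range(start, end + 1) if p in mapping]
    return (min(hits), max(hits)) if hits else None
```

The feature coordinates from the FT lines or the Proteins API can be passed straight to transfer_feature with the mapping built from your alignment.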

4.4 years ago
skjobs ▴ 190

You can use UniProt, Pfam or other online resources for such a classification. If you have a larger dataset, download the Pfam database, search your query sequences against it (Pfam is a library of profile HMMs, so this is normally done with HMMER's hmmscan rather than BLAST) and classify according to your needs.
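Whatever the source (UniProt feature keys, Pfam domain hits), the annotations still have to become a fixed-length vector before a classifier can use them. A minimal sketch under that assumption: each annotation is a hypothetical (label, start, end) tuple with 1-based inclusive coordinates, and vocabulary fixes the column order. Counts plus residue-coverage fractions are just one possible encoding, not a standard one.

```python
def feature_vector(annotations, vocabulary, seq_length):
    """Encode (label, start, end) annotations as per-label counts plus coverage fractions.

    Labels can be feature keys (HELIX, BINDING, ...) or Pfam accessions;
    labels not in vocabulary are ignored.
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    counts = {label: 0 for label in vocabulary}
    covered = {label: set() for label in vocabulary}
    for label, start, end in annotations:
        if label not in counts:
            continue  # outside the chosen vocabulary
        counts[label] += 1
        covered[label].update(range(start, end + 1))  # overlapping hits counted once
    return ([counts[label] for label in vocabulary]
            + [len(covered[label]) / seq_length for label in vocabulary])
```

For a whole dataset, build one vector per protein with the same vocabulary and stack them into a matrix for your classifier.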
