Hello, how could I get features like strand, domain, helix, turn, chain, mass, glycosylation, active site, binding site, ... from a protein sequence? I want to do a classification based on these features, but I have no idea what these features mean and I could not find a dataset; all I have is a protein sequence file. Is there any Python library, or are there articles, that can help me? Cordially
You can fetch the UniProt entry of your protein as a text file, using its UniProt ID.
For example, consider human IFIT2, whose UniProt ID is P09913. You'll need to fetch https://www.uniprot.org/uniprot/P09913.txt
Note that the lines you are interested in start with "FT" (the feature table).
Be aware that these coordinates refer to the canonical protein sequence as defined by UniProt, so please check whether your protein has isoforms and act accordingly.
Alternatively, and arguably better, you can use the EBI Proteins API, but you will still have the "canonical" issue.
Here is an example, extracted from https://fouret.me/gitea/jfouret/gwAlign/src/branch/master/scripts/gwAlign-Unify , that builds the request URL for a UniProt accession stored in subject_id:
requestURL = "https://www.ebi.ac.uk/proteins/api/features/"+subject_id+"?types=INIT_MET%2CDISULFID%2CCROSSLNK%2CACT_SITE%2CMETAL%2CBINDING"
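To make the one-liner runnable, here is a small standard-library sketch that builds the same URL for any list of feature types, asks for JSON, and extracts the type and positions of each feature. It assumes the response is a JSON object with a features list whose items carry type, begin and end fields; the helper names are hypothetical. Codes such as HELIX, STRAND, TURN, DOMAIN, CHAIN or CARBOHYD (glycosylation) are assumed to be accepted by the types parameter in the same way as those in the snippet; check the API documentation for the exact list.

```python
import json
import urllib.parse
import urllib.request

# Base of the EBI Proteins API features endpoint used in the snippet.
FEATURES_URL = "https://www.ebi.ac.uk/proteins/api/features/"


def build_features_url(subject_id: str, types) -> str:
    """Build the same URL as the snippet; urlencode turns ',' into '%2C'."""
    query = urllib.parse.urlencode({"types": ",".join(types)})
    return FEATURES_URL + subject_id + "?" + query


def fetch_features(subject_id: str, types) -> dict:
    """Request the features of one accession, asking for JSON via the Accept header."""
    request = urllib.request.Request(
        build_features_url(subject_id, types),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def summarize(payload: dict):
    """Return (type, begin, end) tuples, keeping positions exactly as returned."""
    return [
        (f["type"], f.get("begin"), f.get("end"))
        for f in payload.get("features", [])
    ]
```

For example, summarize(fetch_features("P09913", ["HELIX", "STRAND", "TURN"])) would list the secondary-structure features annotated on the canonical IFIT2 entry.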