Hi,
How can I make a Python function that, for each protein in a FASTA file, searches for the GO terms in UniProt?
What is the script that I need to use?
Thank you!!!
You need Biopython. First, BLAST your sequences against a UniProt-derived database (for example NCBI's swissprot database) using the qblast() function in Bio.Blast.NCBIWWW. See the instructions here: http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec95
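A minimal sketch of this first step. It assumes Biopython is installed and that NCBI's swissprot database is an acceptable stand-in for UniProt. The input file name proteins.fasta and the per-protein output naming scheme are hypothetical.

```python
from Bio import SeqIO
from Bio.Blast import NCBIWWW


def read_fasta(path):
    """Return (id, sequence) pairs for every record in a FASTA file."""
    return [(rec.id, str(rec.seq)) for rec in SeqIO.parse(path, "fasta")]


def blast_fasta_remotely(fasta_path, database="swissprot", hitlist_size=10):
    """Run blastp at NCBI for each protein and save one XML result file per protein."""
    for protein_id, sequence in read_fasta(fasta_path):
        # One remote request per protein; this is slow and NCBI limits heavy use.
        handle = NCBIWWW.qblast("blastp", database, sequence, hitlist_size=hitlist_size)
        # Hypothetical naming scheme; "|" in IDs such as sp|P12345|X is replaced.
        out_path = protein_id.replace("|", "_") + "_blast.xml"
        with open(out_path, "w") as out:
            out.write(handle.read())
        handle.close()


if __name__ == "__main__":
    blast_fasta_remotely("proteins.fasta")  # hypothetical input file
```

Each call to qblast() waits for NCBI to finish the search, so this is only practical for a modest number of proteins.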
Second, filter the BLAST results. For example, a hit can be considered a good ortholog only when the aligned region covers more than 60% of the length of the query protein.
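A sketch of the 60% filter, assuming the XML files written by qblast() are parsed with Bio.Blast.NCBIXML. Here "matched length" is interpreted as the span of the query covered by each hit's first HSP; that interpretation, and the helper names, are assumptions rather than a fixed rule.

```python
from Bio.Blast import NCBIXML

MIN_QUERY_COVERAGE = 0.60  # threshold suggested above


def query_coverage(query_start, query_end, query_length):
    """Fraction of the query covered by an HSP (1-based, inclusive coordinates)."""
    return (query_end - query_start + 1) / query_length


def good_hits(xml_path, min_coverage=MIN_QUERY_COVERAGE):
    """Return (title, e-value, coverage) for hits covering more than min_coverage of the query."""
    with open(xml_path) as handle:
        record = NCBIXML.read(handle)  # one query per file
    keep = []
    for alignment in record.alignments:
        hsp = alignment.hsps[0]  # HSPs are typically listed best-scoring first
        cov = query_coverage(hsp.query_start, hsp.query_end, record.query_length)
        if cov > min_coverage:
            keep.append((alignment.title, hsp.expect, cov))
    return keep
```

Because BLAST orders hits by score, the first entry returned (if any) is the best hit that passes the filter.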
Third, retrieve the GO information for the best BLAST hit. You can use Biopython for this as well:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec152
You will get a variable named "record" as shown in the example. If it is a Bio.SwissProt record, its "cross_references" attribute holds the database cross-references, including the GO terms (you can use dir(record) to list all of its attributes).
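A minimal sketch of this step, following the ExPASy/SwissProt pattern from the Biopython tutorial. It assumes you already have the UniProt accession of the best hit (extracting it from the BLAST hit ID is left out), and that GO cross-references come as tuples like ("GO", "GO:0005737", "C:cytoplasm", evidence). The function names are hypothetical.

```python
from Bio import ExPASy, SwissProt


def go_terms_from_record(record):
    """Extract (GO id, term) pairs from a Bio.SwissProt record's cross-references."""
    return [(ref[1], ref[2]) for ref in record.cross_references if ref[0] == "GO"]


def fetch_go_terms(accession):
    """Download one UniProtKB entry over the network and return its GO annotations."""
    with ExPASy.get_sprot_raw(accession) as handle:
        record = SwissProt.read(handle)
    return go_terms_from_record(record)


if __name__ == "__main__":
    for go_id, term in fetch_go_terms("P69905"):  # example accession
        print(go_id, term)
```

Keeping the extraction in a separate function means the same code works on records read from a downloaded UniProt file.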
However, if you need to process a large set of proteins, I highly recommend downloading the UniProt database and running both BLAST and the annotation locally.
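A rough sketch of the local route, assuming NCBI BLAST+ (makeblastdb, blastp) is on your PATH and that uniprot_sprot.fasta and uniprot_sprot.dat have been downloaded from UniProt. File names, the e-value cutoff and the helper names are assumptions; the tabular columns include qlen, qstart and qend so the 60% coverage filter can still be applied.

```python
import subprocess

from Bio import SwissProt

# Tabular BLAST+ output with the columns needed for coverage filtering.
OUTFMT = "6 qseqid sseqid qlen qstart qend evalue"


def blastp_command(query_fasta, db_name, out_path, threads=4):
    """Build a BLAST+ blastp command line (assumed e-value cutoff of 1e-5)."""
    return [
        "blastp", "-query", query_fasta, "-db", db_name,
        "-outfmt", OUTFMT, "-evalue", "1e-5",
        "-num_threads", str(threads), "-out", out_path,
    ]


def build_go_index(dat_path):
    """Map each primary accession in a local UniProt .dat file to its GO ids."""
    index = {}
    with open(dat_path) as handle:
        for record in SwissProt.parse(handle):  # slow for the full file, but offline
            go = [ref[1] for ref in record.cross_references if ref[0] == "GO"]
            index[record.accessions[0]] = go
    return index


if __name__ == "__main__":
    subprocess.run(["makeblastdb", "-in", "uniprot_sprot.fasta", "-dbtype", "prot",
                    "-out", "uniprot_sprot"], check=True)
    subprocess.run(blastp_command("proteins.fasta", "uniprot_sprot", "hits.tsv"), check=True)
    go_index = build_go_index("uniprot_sprot.dat")
    # sseqid values like sp|P12345|NAME_HUMAN can be split on "|" to look up go_index.
```

Building the GO index once and looking hits up in a dictionary avoids one network request per protein.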