Question: Retrieve GO Terms Using UniProt BLAST Results (Together With gene_association.goa_uniprot.gz)
5.6 years ago by
shzhang10 wrote:

Hi all!

I got some differentially expressed (DE) genes from a non-model RNA-seq project, and I'd like to assign GO ids to some of them.

I ran a blastx search of these DE genes against UniProtKB/Swiss-Prot with an E-value cut-off of 1e-5 and retained one best match (-max_target_seqs 1). The output of the blastx search is in XML format.

Then I downloaded the gene_association.goa_uniprot.gz file.

I have two questions:

  • Is it necessary to run a second blastx search against the UniProtKB/TrEMBL database for the DE genes that had no hits in UniProtKB/Swiss-Prot? (Swiss-Prot entries are curated, whereas TrEMBL entries are automatically annotated.)

  • I don't know how to use the blastx XML (or maybe tabular) result to retrieve GO ids from the goa_uniprot dataset. Is there a script for this purpose?


Kind regards,


modified 3.8 years ago by Biostar • written 5.6 years ago by shzhang10
5.6 years ago by
Vienna, Austria
André Rendeiro50 wrote:


Answering your second question directly: yes, I do have a script for that. I used it once to annotate the Ciona genome for an inter-species comparison.

I didn't write it; the author is Laurent Manchon. Here's a link to a gist: split_xml_blast_output.awk

You might also want to have a look at Blast2GO, which automates the annotation of BLAST results with GO terms.

modified 5.6 years ago • written 5.6 years ago by André Rendeiro50

Hi André, thanks for your reply. The script is very useful for splitting BLAST XML results. Maybe I didn't express my question clearly: how does this script relate to retrieving GO ids from the gene_association.goa_uniprot.gz dataset?

I tried Blast2GO, but it's very slow, so I'd like to do it locally.


modified 5.6 years ago • written 5.6 years ago by shzhang10
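
For anyone wanting to do the lookup locally, here is a minimal Python sketch of one way it could work. It is not the script linked above. It assumes the blastx search is rerun (or converted) to tabular output (-outfmt 6), that subject ids look like sp|P12345|ABC_HUMAN, and that the GOA file follows the tab-separated GAF layout, with the UniProt accession in column 2, the qualifier in column 4 and the GO id in column 5. The function names and the file names in the usage note are made up for illustration.

```python
import gzip
from collections import defaultdict


def accession_from_sseqid(sseqid):
    """Extract the accession from an id such as 'sp|P12345|ABC_HUMAN'.

    Ids without the 'db|accession|name' layout are returned unchanged.
    """
    parts = sseqid.split("|")
    return parts[1] if len(parts) >= 3 else sseqid


def read_best_hits(blast_lines):
    """Map each query id to the accession of its first-listed hit."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Tabular columns 1 and 2 are the query id and the subject id.
        best.setdefault(fields[0], accession_from_sseqid(fields[1]))
    return best


def collect_go_ids(gaf_lines, wanted):
    """Stream GAF lines and keep GO ids for the wanted accessions only."""
    go = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!"):  # GAF header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        accession, qualifier, go_id = fields[1], fields[3], fields[4]
        if "NOT" in qualifier.split("|"):  # skip negated annotations
            continue
        if accession in wanted:
            go[accession].add(go_id)
    return go


def annotate(blast_path, gaf_path):
    """Return {query id: sorted list of GO ids of its best hit}."""
    with open(blast_path) as handle:
        best = read_best_hits(handle)
    # The GOA file is large, so it is read line by line, never loaded whole.
    with gzip.open(gaf_path, "rt") as handle:
        go = collect_go_ids(handle, set(best.values()))
    return {query: sorted(go.get(acc, ())) for query, acc in best.items()}
```

Something like annotate("de_genes.blastx.tsv", "gene_association.goa_uniprot.gz") would then give one list of GO ids per DE gene, empty for genes whose best hit has no annotation. Only the accessions that actually appear in the BLAST result are kept in memory, which should keep the pass over the GOA file manageable.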
