Retrieve GO terms using UniProt BLAST results (together with gene_association.goa_uniprot.gz)
7.7 years ago
shzhang ▴ 20

Hi all!

I have a set of differentially expressed (DE) genes from an RNA-seq project on a non-model organism, and I'd like to assign GO IDs to some of them.

I ran a blastx search of these DE genes against UniProtKB/Swiss-Prot with an E-value cut-off of 1e-5 and kept only the single best match (-max_target_seqs 1). The blastx output is in XML format.

Then I downloaded gene_association.goa_uniprot.gz.

I have two questions:

• For the DE genes that had no hits in UniProtKB/Swiss-Prot, is it necessary to run a second blastx search against UniProtKB/TrEMBL? (Swiss-Prot entries are manually curated, whereas TrEMBL entries are automatically annotated.)

• I don't know how to use the blastx XML (or perhaps tabular) output to retrieve GO IDs from the goa_uniprot dataset. Is there a script for this purpose?

Thanks.

Kind regards,

Senhao

go uniprot • 3.7k views
7.7 years ago

Hey,

To answer your second question directly: yes, I do have a script for that. I used it once to annotate the Ciona genome for an inter-species comparison.

I didn't write it; the author is Laurent Manchon. Here's a link to a gist: split_xml_blast_output.awk

You might also want to have a look at Blast2GO, which automates the annotation of BLAST results with GO terms.


Hi Andre, thanks for your reply. The script is very useful for splitting BLAST XML results, but maybe I didn't express my question clearly. What I'm wondering is how this script relates to retrieving GO IDs from the gene_association.goa_uniprot.gz dataset.

I tried Blast2GO, but it is very slow, so I'd like to do the annotation locally.

Thanks.
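
For the local route, a minimal Python sketch of the lookup described above might look like the following. It is not the gist mentioned in the answer, just one possible approach under a few assumptions: the BLAST XML uses the standard `Iteration` / `Hit` elements, the Swiss-Prot hit identifier looks like `sp|P12345|NAME_SPECIES` (otherwise it falls back to `Hit_accession`), and the GOA file is in GAF format with the accession in column 2, the qualifier in column 4 and the GO ID in column 5. The function names and file names are made up for illustration.

```python
import gzip
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Accession inside a UniProt-style identifier such as "sp|P12345|AATM_RABIT".
UNIPROT_ID = re.compile(r"\b(?:sp|tr)\|([A-Z0-9]+)")


def best_hits_from_blast_xml(xml_source):
    """Map each query name to the UniProt accession of its first reported hit."""
    hits = {}
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query_def = elem.findtext("Iteration_query-def") or ""
        # Use the first word of the query definition, else the query ID.
        query = (query_def.split()[0] if query_def.strip()
                 else (elem.findtext("Iteration_query-ID") or ""))
        hit = elem.find("Iteration_hits/Hit")  # first hit only
        if hit is not None:
            text = (hit.findtext("Hit_id") or "") + " " + (hit.findtext("Hit_def") or "")
            match = UNIPROT_ID.search(text)
            if match:
                accession = match.group(1)
            else:
                # Assumption: Hit_accession holds the accession, maybe with ".version".
                accession = (hit.findtext("Hit_accession") or "").split(".")[0]
            if accession:
                hits[query] = accession
        elem.clear()  # free memory; BLAST XML files can be large
    return hits


def go_terms_for(accessions, gaf_lines):
    """Collect GO IDs per accession from an iterable of GAF lines."""
    wanted = set(accessions)
    go_terms = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!"):  # GAF header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5 or cols[0] != "UniProtKB" or cols[1] not in wanted:
            continue
        if "NOT" in cols[3].split("|"):  # skip negated annotations
            continue
        go_terms[cols[1]].add(cols[4])
    return go_terms


def annotate(blast_xml_path, gaf_gz_path):
    """Return {query: sorted list of GO IDs} for every query with a hit."""
    hits = best_hits_from_blast_xml(blast_xml_path)
    with gzip.open(gaf_gz_path, "rt") as handle:
        go_terms = go_terms_for(hits.values(), handle)
    return {query: sorted(go_terms.get(acc, ())) for query, acc in hits.items()}
```

Something like `annotate("blastx.xml", "gene_association.goa_uniprot.gz")` (both paths are placeholders) would then give one list of GO IDs per DE gene. The GOA file is streamed line by line and only the accessions that actually appear in the BLAST hits are kept, so memory use should stay modest even though the file is very large; a single pass over it will still take a while.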