How to obtain more GO terms for transcripts
4.0 years ago
wangdp123 ▴ 250

Hi there,

I am annotating genes assembled from RNA-Seq data of a non-model species. I ran InterProScan on the protein sequences predicted by TransDecoder to obtain GO terms. However, only a limited number of genes have a predicted ORF, and only a proportion of those ORFs receive GO terms, so the resulting GO annotation is very sparse.

To rescue the genes that were discarded by TransDecoder, I tried running InterProScan directly on the nucleotide sequences, but this does not work: the built-in getorf step produces far more protein sequences than InterProScan can process in a reasonable amount of time.

This is an exploratory study, so I would like to obtain as many GO terms as possible.

Could anybody help me with this?

Many thanks,



Gene ontology de novo assembly • 1.2k views
4.0 years ago
Whoknows ▴ 870

Dear Tom,

The approach you have started with is quite reasonable for de novo RNA-Seq. From what I have seen in these kinds of studies, far more GO terms are generated than are actually needed to draw real biological conclusions.

I think it doesn't matter how many GO terms you find; what matters is how many are significantly enriched in your data, especially among the differentially expressed genes (DEGs). The key part of an RNA-Seq study is the downstream analysis of DEGs, so rather than focusing on the number of GO terms, rely on the significantly enriched ones.
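To illustrate what "significantly enriched" can mean here, below is a minimal sketch of a one-sided hypergeometric test for GO terms among DEGs. It is only one common way to do this, not necessarily what any particular tool implements. The function names, gene IDs and term labels are hypothetical, and the background is assumed to be the set of transcripts that received at least one annotation.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term_background, n_deg, n_term_deg):
    """One-sided p-value P(X >= n_term_deg) for drawing n_deg genes
    from n_background genes, n_term_background of which carry the term."""
    upper = min(n_term_background, n_deg)
    total = comb(n_background, n_deg)
    tail = sum(
        comb(n_term_background, k)
        * comb(n_background - n_term_background, n_deg - k)
        for k in range(n_term_deg, upper + 1)
    )
    return tail / total


def go_enrichment(term_to_genes, background, degs):
    """Return {term: p-value} for every term in term_to_genes.

    Genes outside the background are ignored (an assumption of this sketch).
    """
    background = set(background)
    degs = set(degs) & background
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        hits = annotated & degs
        results[term] = hypergeom_enrichment_p(
            len(background), len(annotated), len(degs), len(hits)
        )
    return results


# Toy data: hypothetical gene IDs and term labels, not real annotations.
background = [f"g{i}" for i in range(1, 11)]
degs = ["g1", "g2", "g3", "g4", "g5"]
term_to_genes = {
    "term_A": ["g1", "g2", "g3", "g4", "g5"],   # all annotated genes are DEGs
    "term_B": ["g6", "g7", "g8", "g9", "g10"],  # no annotated gene is a DEG
}

pvalues = go_enrichment(term_to_genes, background, degs)
for term, p in sorted(pvalues.items(), key=lambda item: item[1]):
    print(term, p)
```

With many terms tested at once, the raw p-values would normally need a multiple-testing correction before calling anything significant.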

I hope this post helps you: Pipeline suggestions for GO & pathway analysis for species without reference genome

