I am annotating genes assembled from RNA-Seq data for a non-model species, and I have used InterProScan to obtain GO terms from the protein sequences predicted by TransDecoder. However, only a limited number of genes have a predicted ORF, and only a fraction of those ORFs receive GO terms, so the resulting GO annotation is very poor.
To rescue the genes that were discarded by TransDecoder, I tried running InterProScan directly on the nucleotide sequences, but this does not work: the built-in getorf step produces far more protein sequences than InterProScan can process in a reasonable amount of time.
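One workaround I am considering is to run getorf myself and shrink its output before passing the proteins to InterProScan, for example by keeping only the longest ORF per transcript. Below is a minimal sketch of that idea, not a tested pipeline. It assumes the getorf FASTA headers look like `transcriptID_N [start - end]`, so that the transcript ID can be recovered by dropping the final `_N` suffix. The function names and the `min_len` cutoff are my own placeholders, and keeping a single ORF per transcript will of course miss some true proteins.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def longest_orf_per_transcript(records, min_len=100):
    """Keep the longest ORF of each transcript, skipping ORFs below min_len.

    Assumption: the first word of each header is '<transcriptID>_<N>',
    as I believe getorf writes it; adjust the parsing if your headers differ.
    min_len is in amino acids and is an arbitrary placeholder value.
    """
    best = {}
    for header, seq in records:
        orf_id = header.split()[0]
        transcript_id = orf_id.rsplit("_", 1)[0]
        if len(seq) < min_len:
            continue
        if transcript_id not in best or len(seq) > len(best[transcript_id][1]):
            best[transcript_id] = (header, seq)
    return best


def write_fasta(best, handle):
    """Write the selected ORFs back out in FASTA format."""
    for header, seq in best.values():
        handle.write(f">{header}\n{seq}\n")
```

With a getorf output file (the file names here are hypothetical), this would be used roughly as `best = longest_orf_per_transcript(read_fasta(open("getorf_out.fasta")))` followed by `write_fasta(best, open("filtered.fasta", "w"))`, and the filtered file would then be the protein input to InterProScan.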
Since this is an exploratory study, I would like to obtain as many GO terms as possible.
Could anybody help me with this?