How to obtain more GO terms for transcripts
4.0 years ago
wangdp123

Hi there,

I am working on the annotation of genes assembled from RNA-Seq data from a non-model species, and I have tried to use InterProScan to get GO terms from the protein sequences predicted by TransDecoder. Only a limited number of genes have a predicted ORF, and only a proportion of those ORFs have annotated GO terms, so overall GO term coverage ends up very poor.
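To measure how much of the assembly actually receives GO terms, it helps to fold the InterProScan hits back onto transcripts. This is a minimal sketch, assuming InterProScan was run with `--goterms` and TSV output, where column 14 lists GO IDs separated by `|` (some versions append a source in parentheses, so the IDs are pulled out with a regex). It also assumes TransDecoder-style protein IDs of the form `<transcript>.p<N>`. The function names and sample IDs are hypothetical.

```python
import re
from collections import defaultdict

# GO IDs are always "GO:" followed by seven digits; a regex copes with
# optional suffixes such as "GO:0005515(InterPro)".
GO_ID = re.compile(r"GO:\d{7}")


def protein_to_transcript(protein_id: str) -> str:
    """Assumed TransDecoder naming: '<transcript>.p<N>' -> '<transcript>'."""
    return re.sub(r"\.p\d+$", "", protein_id)


def go_terms_per_transcript(tsv_lines):
    """Union of GO IDs per transcript from InterProScan TSV lines.

    Transcripts whose hits carry no GO IDs (column 14 is '-') are dropped.
    """
    go = defaultdict(set)
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 14:
            continue  # line has no GO column at all
        go[protein_to_transcript(fields[0])].update(GO_ID.findall(fields[13]))
    return {tx: terms for tx, terms in go.items() if terms}


# Hypothetical two-line excerpt: one hit with GO terms, one without.
example = [
    "TRINITY_DN1_c0_g1_i1.p1\tmd5\t300\tPfam\tPF00001\tdesc\t10\t200\t1e-20\tT\t01-01-2020\tIPR000001\tdesc\tGO:0004930|GO:0007186\t-",
    "TRINITY_DN2_c0_g1_i1.p1\tmd5\t150\tPfam\tPF00002\tdesc\t5\t100\t1e-10\tT\t01-01-2020\t-\t-\t-\t-",
]
result = go_terms_per_transcript(example)
print({tx: sorted(terms) for tx, terms in result.items()})
# {'TRINITY_DN1_c0_g1_i1': ['GO:0004930', 'GO:0007186']}
```

Dividing `len(result)` by the total number of assembled transcripts gives the fraction that ends up with at least one GO term.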

To rescue the genes discarded by TransDecoder, I tried running InterProScan directly on the nucleotide sequences, but it doesn't work: the built-in getorf tool produces far more candidate proteins than InterProScan can process in a reasonable amount of time.
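One way to keep the protein set manageable is to send InterProScan at most one candidate per transcript instead of every ORF getorf reports. Below is a minimal sketch of that heuristic. It is a simple longest-ORF rule swapped in for illustration, not what TransDecoder or getorf do internally. It translates all six frames with the standard genetic code, keeps the longest stretch starting at Met, and drops it if it is shorter than `min_aa`. ORFs that run off the 3' end without a stop codon are kept, which may suit fragmented de novo contigs; stretches lacking a start Met are not. The 100-residue default is an assumption to tune.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate one frame; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf(nt: str, min_aa: int = 100):
    """Longest Met-initiated ORF over six frames, or None if shorter than min_aa."""
    nt = nt.upper()
    rc = nt.translate(COMPLEMENT)[::-1]  # reverse complement
    best = ""
    for strand in (nt, rc):
        for frame in range(3):
            # Split the translated frame at stop codons; the final segment
            # may lack a stop (3'-incomplete ORF) and is still considered.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best if len(best) >= min_aa else None


print(longest_orf("ATGAAATTTGGGTAA", min_aa=3))  # MKFG
```

Applied only to the transcripts TransDecoder left without a prediction, this yields at most one extra protein per transcript to add to the InterProScan input.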

In fact, this is an exploratory study, and I would like to obtain as many GO terms as possible.

Could anybody help me with this?

Many thanks,



4.0 years ago
Whoknows

Dear Tom,

The approach you have started with is a good one for de novo RNA-Seq. In my experience with these kinds of studies, far more GO terms are generated than are actually needed to support real biological findings.

I don't think the number of GO terms you find matters much; what matters is how many are significantly enriched in your data, especially among the differentially expressed genes (DEGs). The key part of an RNA-Seq study is the downstream analysis of the DEGs, so rather than worrying about the total number of GO terms, you should rely on the significant ones.
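To make "significant" concrete: enrichment of a GO term among DEGs is commonly tested with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), followed by multiple-testing correction across all tested terms. This is a minimal sketch with hypothetical names: `N` annotated background genes, `K` of them carrying the term, `n` annotated DEGs, and `k` of those carrying the term. The p-values for all terms are then adjusted with Benjamini-Hochberg. Dedicated tools such as goseq add refinements (for example, correcting for transcript-length bias), so treat this as an illustration only.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) = sum_{i=k}^{min(n,K)} C(K,i) C(N-K,n-i) / C(N,n).

    Chance of seeing at least k term-annotated genes among n DEGs when
    K of the N background genes carry the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers: 10 background genes, 5 with the term, 2 DEGs, both with it.
print(round(hypergeom_pvalue(2, 2, 5, 10), 4))  # 0.2222
```

Terms whose adjusted p-value falls below the chosen FDR threshold are the "significantly seen" ones; the raw count of GO terms assigned across the assembly does not enter the test except through the background.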

I hope this post helps: Pipeline suggestions for GO & pathway analysis for species without reference genome


