I have a basic question about which test/reference sets can be used for GO enrichment analysis. All of the studies I come across ask whether certain gene subsets are enriched for a GO term. Is it appropriate to ask instead whether a transcript subset is enriched? Or would that bias the statistics for or against genes with multiple isoforms?
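To make the concern concrete, here is a minimal sketch of the one-sided hypergeometric test that enrichment tools commonly use, run once with genes and once with transcripts as the counting unit. All numbers are invented for illustration, and the transcript-level case assumes that every annotated gene has three isoforms which all inherit the term and are all called differentially expressed together. That is an assumption of the toy example, not something I have observed in my data.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """One-sided enrichment p-value P(X >= k).

    N: size of the reference set, K: reference items carrying the GO term,
    n: size of the test (DE) set, k: test items carrying the GO term.
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

# Toy numbers, invented purely for illustration.
# Gene level: 1000 genes, 20 carry the term, 50 are DE, 3 DE genes carry the term.
p_gene = hypergeom_pvalue(k=3, n=50, K=20, N=1000)

# Transcript level under the assumption above: the 20 annotated genes become
# 60 transcripts, and the 3 annotated DE genes become 9 DE transcripts.
p_tx = hypergeom_pvalue(k=9, n=56, K=60, N=1040)

print(f"gene-level p = {p_gene:.3g}, transcript-level p = {p_tx:.3g}")
```

In this toy setup the same underlying signal gets a smaller p-value at the transcript level, because isoforms of one gene are counted as if they were independent observations. Whether real data behave this way is exactly what I am unsure about.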
I ask because I am working with a non-model organism (i.e. I need to do my own GO annotation) and would like to know whether the genes/transcripts that are differentially expressed between two conditions are enriched for specific GO terms. I have a draft genome, a draft transcriptome (annotated using Blast2GO), and mRNA-Seq data. However, I find several cases where a gene with multiple isoforms has different GO terms associated with each isoform.
My specific questions:
- Is it appropriate to do transcript-level GO enrichment analysis?
- Are there any references to studies that have done this successfully?
- Alternatively, I could run a gene-level analysis if someone could suggest how to "collapse" the different isoforms of a gene into a single sequence for use as input for Blast2GO :)
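For the last point, one alternative I can think of is to collapse the annotations rather than the sequences: annotate every transcript as I do now, then give each gene the union of the GO terms found on any of its isoforms. The sketch below shows that idea only. The transcript and gene IDs are hypothetical, the two input dictionaries are assumed to have been parsed already from the annotation export and from a transcript-to-gene table, and I do not know whether taking the union is statistically preferable to, say, keeping only the longest isoform.

```python
from collections import defaultdict

def collapse_go_to_genes(transcript_go, transcript_to_gene):
    """Return {gene_id: set of GO terms}, the union over all isoforms of the gene."""
    gene_go = defaultdict(set)
    for transcript_id, terms in transcript_go.items():
        gene_go[transcript_to_gene[transcript_id]].update(terms)
    return dict(gene_go)

# Hypothetical IDs; real ones would come from the transcriptome annotation.
transcript_go = {
    "geneA.t1": {"GO:0006412", "GO:0005524"},
    "geneA.t2": {"GO:0006412", "GO:0003677"},  # isoform with a different term
    "geneB.t1": {"GO:0003677"},
}
transcript_to_gene = {
    "geneA.t1": "geneA",
    "geneA.t2": "geneA",
    "geneB.t1": "geneB",
}

gene_go = collapse_go_to_genes(transcript_go, transcript_to_gene)
for gene_id in sorted(gene_go):
    print(gene_id, sorted(gene_go[gene_id]))
```

The resulting gene-to-GO mapping could then be used as the reference set, with the differentially expressed genes as the test set.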