I have RNA-seq data that I am testing for differential gene expression. I am now annotating my genes and running over-representation and pathway analyses, but I have run into some problems.
My species is bovine, so my first thought was to annotate with the org.Bt.eg.db package from Bioconductor and then use the GOstats and KEGG.db packages for over-representation and pathway analysis. When I converted my Ensembl gene IDs to Entrez IDs (which GOstats requires) using the annotation package, 398 of my genes were not assigned any Entrez ID, and 457 were assigned more than one (sometimes as many as 5 different Entrez IDs). When I checked the genes with more than one Entrez ID, I found that most of those Entrez IDs were unrelated to the actual Ensembl gene. I also took some of the genes that had no Entrez ID, ran BLAST against the NCBI database, and found that several of them actually do have an associated Entrez ID.
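To make the three outcomes above concrete (no Entrez ID, exactly one, several), here is a minimal Python sketch that sorts an Ensembl-to-Entrez mapping into those groups. It assumes the mapping has already been exported from whichever source was used (org.Bt.eg.db or BioMart) into a plain dict of lists; the function name `summarize_mapping` and the example IDs are hypothetical placeholders, not real Ensembl or Entrez identifiers.

```python
def summarize_mapping(gene_ids, ensembl_to_entrez):
    """Split gene IDs into unmapped, uniquely mapped, and multi-mapped groups.

    ensembl_to_entrez is assumed to be {ensembl_id: [entrez_id, ...]},
    exported beforehand from the annotation source (hypothetical format).
    """
    unmapped, unique, multi = [], {}, {}
    for gene in gene_ids:
        # Deduplicate targets, since some exports repeat the same Entrez ID.
        targets = sorted(set(ensembl_to_entrez.get(gene, ())))
        if not targets:
            unmapped.append(gene)
        elif len(targets) == 1:
            unique[gene] = targets[0]
        else:
            multi[gene] = targets
    return unmapped, unique, multi


if __name__ == "__main__":
    # Hypothetical placeholder IDs, for illustration only.
    genes = ["ens_1", "ens_2", "ens_3"]
    mapping = {"ens_1": ["101"], "ens_2": ["102", "103", "102"]}
    unmapped, unique, multi = summarize_mapping(genes, mapping)
    print(len(unmapped), "unmapped;", len(unique), "unique;", len(multi), "multi-mapped")
```

Running this on the same gene list with two different mapping sources would show how many genes fall into each group for each source.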
I then tried converting my genes with BioMart, which left ~1,200 genes with no corresponding Entrez ID.
My question is: is there a way (or a package) to perform over-representation testing and pathway analysis without using Entrez IDs as input? Most packages require Entrez IDs, and I am feeling lost about what to do.
I found that topGO can run over-representation tests with Ensembl IDs, but I am not sure how to proceed from there. Any help would be appreciated. In case it matters, I am using Ensembl release 84 of the Bos taurus genome.
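The over-representation test itself does not depend on Entrez IDs. It only needs a background (universe) gene list, the set of DE genes, and a gene-to-term mapping, all keyed by the same identifier. As an illustration, here is a minimal Python sketch of a one-sided hypergeometric test (the same basic test behind GOstats' `hyperGTest`) keyed directly by Ensembl gene IDs. It assumes a gene-to-GO mapping keyed by Ensembl ID is available (for example, exported from BioMart). The function names are hypothetical, and the sketch ignores the GO graph structure that topGO exploits and applies no multiple-testing correction.

```python
from math import comb


def hypergeom_upper_tail(k, universe_size, annotated, selected):
    """P(X >= k) for X ~ Hypergeometric(universe_size, annotated, selected)."""
    upper = min(annotated, selected)
    hits = sum(
        comb(annotated, i) * comb(universe_size - annotated, selected - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(universe_size, selected)


def over_representation(de_genes, universe_genes, gene_to_terms):
    """Return {term: (de_in_term, term_size, p_value)} for every term
    annotated to at least one universe gene.

    Gene IDs are treated as opaque strings, so Ensembl IDs work as-is.
    """
    universe = set(universe_genes)
    de = set(de_genes) & universe  # only DE genes inside the universe count
    term_to_genes = {}
    for gene in universe:
        for term in gene_to_terms.get(gene, ()):
            term_to_genes.setdefault(term, set()).add(gene)

    results = {}
    for term, genes in term_to_genes.items():
        k = len(genes & de)
        p = hypergeom_upper_tail(k, len(universe), len(genes), len(de))
        results[term] = (k, len(genes), p)
    return results


if __name__ == "__main__":
    # Hypothetical toy data: 10 background genes, 3 DE genes, two made-up terms.
    universe = [f"gene{i}" for i in range(10)]
    annotations = {"gene0": ["termA"], "gene1": ["termA"],
                   "gene2": ["termA"], "gene3": ["termB"]}
    result = over_representation(["gene0", "gene1", "gene2"], universe, annotations)
    for term, (k, size, p) in sorted(result.items()):
        print(term, k, size, p)
```

In the toy data, all 3 DE genes fall in termA, which has 3 members out of 10 background genes, so its p-value is 1/120. One design choice worth checking against real tools is the universe: here it is every supplied background gene, whereas many tools restrict it to genes that carry at least one annotation.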