Question

GO Enrichment of RNA-Seq Data

3

12.7 years ago

Assa Yeroslaviz ★ 1.8k

Hi,

does anyone have experience with the goseq package in R?

I am trying to run a GO enrichment test on a Drosophila RNA-seq data set, but unfortunately I have run into some problems with this package. First, the package contains almost no Drosophila gene IDs. Second, the package accepts only Entrez IDs.

I have the FlyBase IDs (FBgnXXXXXXX) and would like to convert them to Entrez IDs as easily as possible.

Do you have any ideas as to how to do this?

Are there any better options/R packages for running a GO enrichment test?

Thanks

A.

gene enrichment rna identifiers • 9.9k views

ADD COMMENT • link updated 12.7 years ago by seidel 11k • written 12.7 years ago by Assa Yeroslaviz ★ 1.8k

Ram · Answer 1 · 2011-08-02

4

12.7 years ago

Chris Evelo 10k

Please be aware that while goseq is very useful for expression-based analysis where you want to normalize for transcript length, it does not do the over-representation analysis itself. Also see this question: Bioconductor Goseq - Overrepresented P-Values

The vignette says:

goseq will work with any method for determining differential expression and as such differential expression analysis is outside the scope of this document, but in order to facilitate ease of use, we will make use of the edgeR package to calculate differentially expressed (DE) genes in all the case studies in this document.

So it normally is edgeR that does the actual enrichment analysis.

For the mapping you could also use BridgeDB with the Drosophila database available from the PathVisio download page, or with any of the supported mapping services. BridgeDB can be run as a local web service that you can call from R. Alternatively, you can use BatchMapper, which is a BridgeDB-based standalone tool. I am not sure whether that would solve your duplication problems.

Everything you need to install BridgeDB should be here. If it is not, please file a bug report or mail the developers list.

ADD COMMENT • link updated 4.6 years ago by Ram 43k • written 12.7 years ago by Chris Evelo 10k

0

I do think that goseq does the enrichment analysis on its own. "This package provides methods for performing Gene Ontology analysis of RNA-seq data, taking length bias into account" [Quote] It doesn't do differential expression analysis, but if I understand it correctly, the package was created for enrichment calculations.
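As a rough illustration of that point, here is a minimal sketch of a goseq run that supplies its own gene lengths and gene-to-category mapping, so no Entrez conversion or built-in Drosophila annotation is needed. Everything in it is simulated: the gene IDs, the lengths, the DE calls and the category labels ("set_01" etc.) are made-up stand-ins for real data and real GO annotations.

```r
library(goseq)

set.seed(1)
n_genes  <- 2000
gene_ids <- sprintf("FBgn%07d", seq_len(n_genes))   # made-up FlyBase-style IDs

# Simulated gene lengths (bp); in practice these come from your annotation
gene_length <- setNames(round(rlnorm(n_genes, meanlog = 7.5, sdlog = 0.8)), gene_ids)

# Simulated 0/1 DE calls, with longer genes more likely to be called DE
p_de <- plogis(-2 + 0.8 * as.numeric(scale(log(gene_length))))
de   <- setNames(rbinom(n_genes, 1, p_de), gene_ids)

# Made-up gene-to-category table; a real one would hold GO terms per gene
gene2cat <- unique(data.frame(
  gene     = rep(gene_ids, each = 2),
  category = sample(sprintf("set_%02d", 1:20), 2 * n_genes, replace = TRUE),
  stringsAsFactors = FALSE
))

# Fit the length-bias weighting function, then test each category
pwf <- nullp(de, bias.data = gene_length, plot.fit = FALSE)
res <- goseq(pwf, gene2cat = gene2cat, method = "Wallenius")

head(res)
```

The differential expression calls still have to come from another tool such as edgeR; goseq only needs the resulting 0/1 vector named by gene ID.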

To run BridgeDB I need the lib files, but I can't find them on the BridgeDB web site. Do you have a clue where they are, or whether I still need them?

ADD REPLY • link 12.7 years ago by Assa Yeroslaviz ★ 1.8k

0

I tried to update my answer so it covers your comments. The part about the vignette was already in the other question.

ADD REPLY • link 12.7 years ago by Chris Evelo 10k

Michael · Answer 2 · 2011-08-02

You could try the GO term enrichment analysis with the GeneAnswers package, and use the Bioconductor annotation packages for the ID mapping. The GeneAnswers documentation has improved since the package came out. And once you get the hang of the annotation packages, they seem fairly straightforward. I don't know how current they are, but I find them useful.

Here is a code snippet illustrating both:

library("GeneAnswers")
library("org.Dm.eg.db")
library("GO.db")

# get named vector of entrez ids
fb.entrez <- unlist(as.list(org.Dm.egFLYBASE2EG))

# for a data frame x with flybase ids (column 1) and data values (column 2)
# match the flybase names against the vector of entrez ids
iv <- match(x[,1], names(fb.entrez))

# add a column for entrez ids
x <- cbind(x,rep(NA, nrow(x)))

# fill it in by mapping the entrez ids onto the matching flybase ids
x[,3] <- fb.entrez[iv]

# now you can do some GO analysis
# for an index vector "myTopHits" of your top data
topset <- x[myTopHits,3]
# remove entries that had no matching entrez id (NA)
topset <- topset[!is.na(topset)]

# Get BP enrichment
foo <- geneAnswersBuilder(topset, 'org.Dm.eg.db', categoryType='GO.BP', testType='hyperG')
go.bp <- foo@enrichmentInfo
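
The mapping step in the snippet above can also be written with the mapIds() accessor from AnnotationDbi, which works on the same org.Dm.eg.db package. This is only a sketch: the two FlyBase IDs are example inputs, and multiVals = "first" is one assumed way of handling genes that map to more than one Entrez ID.

```r
library(AnnotationDbi)
library(org.Dm.eg.db)

# Example FlyBase gene IDs; replace with your own vector, e.g. x[, 1]
fbgn <- c("FBgn0000490", "FBgn0003996")

# Named character vector: names are FlyBase IDs, values are Entrez IDs
# (NA where no mapping exists); "first" keeps one Entrez ID per FlyBase ID
entrez <- mapIds(org.Dm.eg.db,
                 keys      = fbgn,
                 column    = "ENTREZID",
                 keytype   = "FLYBASE",
                 multiVals = "first")

# Drop unmapped genes before passing the IDs to an enrichment function
entrez <- entrez[!is.na(entrez)]
```

Using multiVals = "list" instead returns every match per gene, which makes one-to-many mappings visible rather than silently picking one.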

score 2 · Answer 3 · 2011-08-02

2

12.7 years ago

Neilfws 49k

As usual, the answer for conversion between gene IDs is to use BioMart. Briefly:

Choose database Ensembl Genes 63
Choose dataset Drosophila melanogaster genes (BDGP5.25)
Click "Filters" (left menu) and expand GENE
Check ID list limit and choose Flybase Gene ID(s)
Either upload your list of Flybase IDs, 1 per line, or paste in the box
Click "Attributes" (left menu) and expand EXTERNAL
Check EntrezGene ID under External References
Optionally, expand GENE and un-check Ensembl Gene ID / Ensembl Transcript ID
Click Results (top menu, left)

Then follow the menu prompts to download the results.

You can also do this in R using the biomaRt package; search this site for answers showing its usage.
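
For reference, a minimal sketch of the same query through biomaRt might look like the following. The helper name flybase_to_entrez is made up for illustration, and the attribute and filter names ("ensembl_gene_id", "entrezgene_id") are assumptions based on recent Ensembl releases; they differ between releases (older ones used "entrezgene"), so check listAttributes(mart) and listFilters(mart) first.

```r
# Hypothetical helper: look up Entrez Gene IDs for a vector of FlyBase gene IDs.
# Assumes the Drosophila Ensembl gene ID is the FBgn identifier.
flybase_to_entrez <- function(fbgn_ids, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "entrezgene_id"),
    filters    = "ensembl_gene_id",
    values     = fbgn_ids,
    mart       = mart
  )
}

# Usage (needs the biomaRt package and network access):
# mart <- biomaRt::useMart("ensembl", dataset = "dmelanogaster_gene_ensembl")
# ids  <- flybase_to_entrez(c("FBgn0000490", "FBgn0003996"), mart)
```

The result is a data frame with one row per FlyBase/Entrez pair, so a gene with several Entrez matches appears on several rows.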

ADD COMMENT • link 12.7 years ago by Neilfws 49k

0

Thanks for the reply. Yes, I am familiar with BioMart, but with the newest version of the biomaRt R package I have found duplicated IDs that are no longer in use on the Entrez site (NCBI). I get a lot of duplicates, which then need to be filtered out. It would be nice to have another way to do such an analysis without having to convert the data back and forth.
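
One way to handle such duplicates after the conversion is sketched below in base R. The mapping table is entirely made up, and dropping every ambiguous gene is just one assumed policy; keeping the first match or checking the IDs against NCBI are alternatives.

```r
# Made-up mapping table as returned by an ID conversion (one row per pair)
map <- data.frame(
  flybase = c("FBgn0000001", "FBgn0000001", "FBgn0000002",
              "FBgn0000003", "FBgn0000004"),
  entrez  = c("101", "102", "103", NA, "104"),
  stringsAsFactors = FALSE
)

# Drop unmapped genes and exact duplicate rows
map <- unique(map[!is.na(map$entrez), ])

# FlyBase IDs that still map to more than one Entrez ID
n_per_fb  <- table(map$flybase)
ambiguous <- names(n_per_fb)[n_per_fb > 1]

# Keep only the unambiguous one-to-one mappings
one_to_one <- map[!map$flybase %in% ambiguous, ]
```

Inspecting the ambiguous vector before discarding anything shows how many genes are affected.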

ADD REPLY • link 12.7 years ago by Assa Yeroslaviz ★ 1.8k

0

I agree, it's annoying when tools require specific IDs. However, at least BioMart makes obtaining them relatively easy. I'm finding R biomaRt rather flaky at the moment, so I'm sticking with the website.

ADD REPLY • link 12.7 years ago by Neilfws 49k