GO Enrichment of RNA-Seq Data
3
3
12.7 years ago
Assa Yeroslaviz ★ 1.8k

Hi,

does anyone have experience with the goseq package in R?

I am trying to run a GO enrichment test on a Drosophila RNA-seq data set, but unfortunately I encountered some problems with this package. First, the package contains almost no Drosophila gene IDs. Second, the package accepts only Entrez IDs.

I have the FlyBase IDs (FBgnXXXXXXX) and would like to convert them as easily as possible to Entrez IDs.

Do you have any ideas as to how to do it?

Are there any better options/R packages to run a GO enrichment test?

Thanks

A.

gene enrichment rna identifiers • 9.9k views
ADD COMMENT
4
12.7 years ago

Please be aware that while goseq is very useful for expression-based analysis where you want to correct for transcript length bias, it does not do the differential expression analysis itself. Also see this question: Bioconductor Goseq - Overrepresented P-Values

The vignette says:

goseq will work with any method for determining differential expression and as such differential expression analysis is outside the scope of this document, but in order to facilitate ease of use, we will make use of the edgeR package to calculate differentially expressed (DE) genes in all the case studies in this document.

So it is normally edgeR that does the differential expression analysis; goseq then takes the resulting list of DE genes and tests the GO categories for over-representation.
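For reference, here is a minimal sketch of the goseq side of such a workflow, run directly on FlyBase-style IDs by supplying the gene lengths and the gene-to-GO mapping yourself instead of relying on the built-in annotation. Everything in it is simulated: the gene IDs, lengths, DE flags and category assignments are made up, and it assumes goseq is installed and that 500 toy genes are enough for the length-bias fit to behave.

```r
# Toy data: everything below is simulated for illustration only.
set.seed(1)
n_genes  <- 500
gene_ids <- sprintf("FBgn%07d", seq_len(n_genes))  # made-up FlyBase-style IDs

# Hypothetical transcript lengths (bp) and a 0/1 DE indicator, named by gene ID.
# In a real analysis the DE flags would come from edgeR or a similar tool.
gene_len <- setNames(round(runif(n_genes, 300, 8000)), gene_ids)
de_flags <- setNames(rbinom(n_genes, 1, 0.1), gene_ids)

# Hypothetical gene-to-category mapping: a named list, gene ID -> GO IDs.
go_pool  <- c("GO:0006412", "GO:0007049")
gene2cat <- lapply(gene_ids, function(g) sample(go_pool, sample(1:2, 1)))
names(gene2cat) <- gene_ids

if (requireNamespace("goseq", quietly = TRUE)) {
  library(goseq)
  # Fit the probability weighting function from the supplied lengths.
  pwf <- nullp(de_flags, bias.data = gene_len, plot.fit = FALSE)
  # Length-corrected over-representation test using the custom mapping.
  res <- goseq(pwf, gene2cat = gene2cat)
  print(head(res))
}
```

Because `bias.data` and `gene2cat` are passed explicitly, no genome or ID type has to be named, so no conversion to Entrez IDs is involved in this route.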

For the mapping you could also use BridgeDB with the Drosophila database available from the PathVisio download page, or with any of the supported mapping services. BridgeDB can be used as a local web service that you can call from R. Alternatively, you can use BatchMapper, which is a BridgeDB-based standalone tool. I am not sure whether that would solve your duplication problems.

Everything you need to install BridgeDB should be here. If it is not, please file a bug report or mail the developers list.

ADD COMMENT
0

I do think that goseq does the enrichment analysis on its own: "This package provides methods for performing Gene Ontology analysis of RNA-seq data, taking length bias into account" [Quote]. It doesn't do differential expression analysis, but if I understand correctly, the package was created for enrichment calculations.

ADD REPLY
0

To run BridgeDB I need the lib files, but I can't find them on the BridgeDB web site. Do you have a clue where they are, or whether I still need them?

ADD REPLY
0

I tried to update my answer so it covers your comments. The part about the vignette was already in the other question.

ADD REPLY
4
12.7 years ago
seidel 11k

You could try the GO term enrichment analysis with the GeneAnswers package, and use the bioconductor annotation packages for the ID mapping. The GeneAnswers documentation has improved since the package came out. And once you get the hang of the annotation packages, they seem fairly straightforward. I don't know how current they are, but I find them useful.

Here is a code snippet illustrating both:

library("GeneAnswers")
library("org.Dm.eg.db")
library("GO.db")

# get named vector of entrez ids
fb.entrez <- unlist(as.list(org.Dm.egFLYBASE2EG))

# for a data frame x with flybase ids (column 1) and data values (column 2)
# match the flybase names against the vector of entrez ids
iv <- match(x[,1], names(fb.entrez))

# add a column for entrez ids
x <- cbind(x,rep(NA, nrow(x)))

# fill it in by mapping the entrez ids onto the matching flybase ids
x[,3] <- fb.entrez[iv]

# now you can do some GO analysis
# for an index vector "myTopHits" of your top data
topset <- x[myTopHits,3]
# remove entries that had no matching entrez id (NA)
topset <- topset[!is.na(topset)]

# Get BP enrichment
foo <- geneAnswersBuilder(topset, 'org.Dm.eg.db', categoryType='GO.BP', testType='hyperG')
go.bp <- foo@enrichmentInfo
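
A shorter route to the same FlyBase-to-Entrez lookup may be `mapIds()` from AnnotationDbi. This is only a sketch: it assumes org.Dm.eg.db is installed and exposes a `FLYBASE` key type, and the three FBgn IDs are example inputs to be replaced with your own.

```r
# Example input; substitute your own FlyBase gene IDs.
fb_ids <- c("FBgn0000003", "FBgn0000008", "FBgn0000014")

if (requireNamespace("org.Dm.eg.db", quietly = TRUE)) {
  library(org.Dm.eg.db)  # also attaches AnnotationDbi
  # One Entrez ID per FlyBase ID; multiVals = "first" keeps the first hit
  # when a FlyBase ID maps to several Entrez IDs.
  entrez <- mapIds(org.Dm.eg.db, keys = fb_ids,
                   keytype = "FLYBASE", column = "ENTREZID",
                   multiVals = "first")
  print(entrez)
}
```

The result is a vector named by the input IDs, so it can be assigned straight into a new column of the data frame. Taking the first hit silently hides one-to-many mappings, so `multiVals = "list"` may be the safer choice if you want to inspect those.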
ADD COMMENT
2
12.7 years ago
Neilfws 49k

As usual, the answer for conversion between gene IDs is to use BioMart. Briefly:

  1. Choose database Ensembl Genes 63
  2. Choose dataset Drosophila melanogaster genes (BDGP5.25)
  3. Click "Filters" (left menu) and expand GENE
  4. Check ID list limit and choose Flybase Gene ID(s)
  5. Either upload your list of Flybase IDs, 1 per line, or paste in the box
  6. Click "Attributes" (left menu) and expand EXTERNAL
  7. Check EntrezGene ID under External References
  8. Optionally, expand GENE and un-check Ensembl Gene ID / Ensembl Transcript ID
  9. Click Results (top menu, left)

Then follow the menu prompts to download the results.

You can also do this in R using the biomaRt package; search this site for answers showing its usage.
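
A rough sketch of what the biomaRt version might look like follows. It needs the biomaRt package and network access to Ensembl, and the dataset, filter and attribute names used here (`dmelanogaster_gene_ensembl`, `ensembl_gene_id`, `entrezgene_id`) are assumptions based on current Ensembl releases, where the fly gene stable IDs are the FBgn IDs; check them with `listDatasets()`, `listFilters()` and `listAttributes()` before relying on them. The function name is made up for illustration.

```r
# Sketch only: needs the biomaRt package and network access to Ensembl.
flybase_to_entrez <- function(fb_ids) {
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "dmelanogaster_gene_ensembl")
  # Assumed filter/attribute names; confirm with biomaRt::listAttributes(mart).
  biomaRt::getBM(attributes = c("ensembl_gene_id", "entrezgene_id"),
                 filters    = "ensembl_gene_id",
                 values     = fb_ids,
                 mart       = mart)
}

# Example call (not run here because it queries a remote server):
# mapping <- flybase_to_entrez(c("FBgn0000008", "FBgn0000014"))
```

The returned data frame has one row per FlyBase/Entrez pair, so a FlyBase ID with several Entrez matches shows up on several rows.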

ADD COMMENT
0

Thanks for the reply. Yes, I am familiar with BioMart, but with the newest version of biomaRt (the R package) I get a lot of duplicates, including IDs that are no longer in use at Entrez (NCBI), which then need to be filtered out. It would be nice to have another way to do such an analysis without having to convert the data back and forth.
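
To make the duplicate problem concrete, here is a minimal base R sketch of one way such a mapping table could be cleaned up. The table is entirely made up (the IDs are placeholders, not real genes), and dropping every ambiguous FlyBase ID is just one possible policy.

```r
# Hypothetical mapping table as it might come back from a conversion tool.
mapping <- data.frame(
  flybase = c("FBgn0000001", "FBgn0000001", "FBgn0000002",
              "FBgn0000003", "FBgn0000003"),
  entrez  = c("101", "102", "103", NA, "104"),
  stringsAsFactors = FALSE
)

# Drop rows without an Entrez ID, then drop exact duplicate rows.
mapping <- unique(mapping[!is.na(mapping$entrez), ])

# Count Entrez IDs per FlyBase ID and separate the ambiguous ones.
n_hits     <- table(mapping$flybase)
ambiguous  <- names(n_hits)[n_hits > 1]
one_to_one <- mapping[!mapping$flybase %in% ambiguous, ]

print(ambiguous)
print(one_to_one)
```

Keeping the ambiguous IDs in a separate vector makes it easy to check them by hand against NCBI rather than losing them silently.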

ADD REPLY
0

I agree, it's annoying when tools require specific IDs. However, at least BioMart makes obtaining them relatively easy. I'm finding R biomaRt rather flaky at the moment, so I'm sticking with the website.

ADD REPLY
