Question: Go Enrichment Of Rna-Seq Data
3
8.2 years ago by
Assa Yeroslaviz1.2k
Munich
Assa Yeroslaviz1.2k wrote:

Hi,

does anyone have experience with the goseq package in R?

I am trying to run a GO enrichment test on a Drosophila RNA-seq data set, but unfortunately I have run into some problems with this package. First, the package contains almost no Drosophila gene IDs. Second, the package accepts only Entrez IDs.

I have the FlyBase IDs (FBgnXXXXXXX) and would like to convert them to Entrez IDs as easily as possible.

Do you have any ideas on how to do this?

Are there any better options or R packages for running a GO enrichment test?

Thanks

A.

gene enrichment rna identifiers • 7.8k views
modified 8.2 years ago by seidel6.8k • written 8.2 years ago by Assa Yeroslaviz1.2k
4
8.2 years ago by
Chris Evelo10.0k
Maastricht, The Netherlands
Chris Evelo10.0k wrote:

Please be aware that while goseq is very useful for expression-based analysis where you want to normalize for transcript length, it does not do the over-representation analysis itself. Also see this question: Bioconductor Goseq - Overrepresented P-Values

The vignette says:

goseq will work with any method for determining differential expression and as such differential expression analysis is outside the scope of this document, but in order to facilitate ease of use, we will make use of the edgeR package to calculate differentially expressed (DE) genes in all the case studies in this document.

So it normally is edgeR that does the actual enrichment analysis.

For the mapping you could also use BridgeDB with the Drosophila database available from the PathVisio download page, or with any of the supported mapping services. BridgeDB can be run as a local web service that you can call from R. Alternatively, you can use BatchMapper, which is a BridgeDB-based standalone tool. I am not sure whether that would solve your duplication problems.

Everything you need to install BridgeDB should be here. If it is not, please file a bug report or mail the developers list.

modified 4 weeks ago by RamRS24k • written 8.2 years ago by Chris Evelo10.0k

I do think that goseq does the enrichment analysis on its own. "This package provides methods for performing Gene Ontology analysis of RNA-seq data, taking length bias into account" [Quote]. It doesn't do the differential expression analysis, but if I understand it correctly, the package was created for enrichment calculations.

written 8.2 years ago by Assa Yeroslaviz1.2k

To run BridgeDB I need the lib files, but I can't find them on the BridgeDB web site. Do you have a clue where they are, or whether I still need them?

written 8.2 years ago by Assa Yeroslaviz1.2k

I tried to update my answer so that it covers your comments. The part about the vignette was already in the other question.

written 8.2 years ago by Chris Evelo10.0k
4
8.2 years ago by
seidel6.8k
United States
seidel6.8k wrote:

You could try GO term enrichment analysis with the GeneAnswers package, and use the Bioconductor annotation packages for the ID mapping. The GeneAnswers documentation has improved since the package came out. And once you get the hang of the annotation packages, they seem fairly straightforward. I don't know how current they are, but I find them useful.

Here is a code snippet illustrating both:

library("GeneAnswers")
library("org.Dm.eg.db")
library("GO.db")

# get named vector of entrez ids
fb.entrez <- unlist(as.list(org.Dm.egFLYBASE2EG))

# for a data frame x with flybase ids (column 1) and data values (column 2)
# match the flybase names against the vector of entrez ids
iv <- match(x[,1], names(fb.entrez))

# add a column for entrez ids
x <- cbind(x,rep(NA, nrow(x)))

# fill it in by mapping the Entrez ids onto the matching flybase ids
x[,3] <- fb.entrez[iv]

# now you can do some GO analysis
# for an index vector "myTopHits" of your top data
topset <- x[myTopHits,3]
# remove entries that had no matching Entrez id (NA)
topset <- topset[!is.na(topset)]

# Get BP enrichment
foo <- geneAnswersBuilder(topset, 'org.Dm.eg.db', categoryType='GO.BP', testType='hyperG')
go.bp <- foo@enrichmentInfo
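
For anyone wondering what a hypergeometric over-representation test actually computes, here is a minimal base-R sketch for a single GO term. I am assuming that testType='hyperG' refers to this kind of test; I have not checked the GeneAnswers internals. The helper name hyper_enrich_p is hypothetical and all counts are invented toy numbers.

```r
# Hypothetical helper: one-sided hypergeometric over-representation p-value.
# All counts used below are invented toy numbers, not real data.
hyper_enrich_p <- function(hits_in_term, term_size, universe_size, n_hits) {
  # P(X >= hits_in_term) when drawing n_hits genes without replacement from
  # a universe in which term_size genes are annotated to the GO term
  phyper(hits_in_term - 1, term_size, universe_size - term_size, n_hits,
         lower.tail = FALSE)
}

# 150 top genes out of 13000 tested; 200 genes carry the term, 12 of them are hits
p <- hyper_enrich_p(hits_in_term = 12, term_size = 200,
                    universe_size = 13000, n_hits = 150)

# The same question asked as a one-sided Fisher's exact test on a 2x2 table
tab <- matrix(c(12, 150 - 12,
                200 - 12, 13000 - 200 - (150 - 12)), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

In a real analysis you would compute one such p-value per GO term and then correct for multiple testing, for example with p.adjust(). Note that this plain test ignores transcript length, which is the bias goseq was written to account for.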
modified 8.2 years ago by Michael Dondrup46k • written 8.2 years ago by seidel6.8k
2
8.2 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

As usual, the answer for conversion between gene IDs is to use BioMart. Briefly:

  1. Choose database Ensembl Genes 63
  2. Choose dataset Drosophila melanogaster genes (BDGP5.25)
  3. Click "Filters" (left menu) and expand GENE
  4. Check ID list limit and choose Flybase Gene ID(s)
  5. Either upload your list of Flybase IDs, 1 per line, or paste in the box
  6. Click "Attributes" (left menu) and expand EXTERNAL
  7. Check EntrezGene ID under External References
  8. Optionally, expand GENE and un-check Ensembl Gene ID / Ensembl Transcript ID
  9. Click Results (top menu, left)

Then follow the menu prompts to download the results.

You can also do this in R using the biomaRt package; search this site for answers showing its usage.

modified 8.2 years ago • written 8.2 years ago by Neilfws48k

Thanks for the reply. Yes, I am familiar with BioMart, but with the newest version of the biomaRt R package I get duplicated entries, including IDs that are no longer in use at Entrez (NCBI). There are a lot of these duplications, which then need to be filtered out. It would be nice to have another option for this kind of analysis that does not require converting the data back and forth.
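
The kind of clean-up described here could look roughly like the following base-R sketch. It assumes the mapping has already been loaded into a two-column data frame; the column names flybase and entrez and all IDs are invented for illustration and are not what BioMart itself returns.

```r
# Toy mapping table showing the typical problems: repeated rows,
# missing Entrez IDs, and one FlyBase ID that maps to two Entrez IDs.
# All IDs and column names here are invented for illustration.
map <- data.frame(
  flybase = c("FBgn0000001", "FBgn0000001", "FBgn0000002",
              "FBgn0000003", "FBgn0000004", "FBgn0000004"),
  entrez  = c("101", "101", "102", NA, "104", "999"),
  stringsAsFactors = FALSE
)

# Drop rows without an Entrez ID, then drop exact duplicate rows
map <- unique(map[!is.na(map$entrez), ])

# Flag FlyBase IDs that still map to more than one Entrez ID
n_targets <- table(map$flybase)
ambiguous <- names(n_targets)[n_targets > 1]

# Keep only the unambiguous one-to-one mappings
one_to_one <- map[!map$flybase %in% ambiguous, ]
rownames(one_to_one) <- NULL
```

Dropping ambiguous IDs outright is only one possible policy; you may prefer to inspect them by hand, since one of the two Entrez IDs is often a retired record. This sketch also does not check the reverse direction, where one Entrez ID maps to several FlyBase IDs.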

written 8.2 years ago by Assa Yeroslaviz1.2k

I agree, it's annoying when tools require specific IDs. However, at least BioMart makes obtaining them relatively easy. I'm finding R biomaRt rather flaky at the moment, so I'm sticking with the website.

written 8.2 years ago by Neilfws48k