Question: GO term analysis with Ensembl gene IDs
9 months ago by
yuxinghai0
yuxinghai0 wrote:

I got a list of Ensembl gene IDs from a differential expression analysis with DESeq2. I want to perform GO enrichment analysis, but almost half of the IDs aren't recognized by DAVID. Some people said I could use BioMart in Ensembl to get the corresponding GO terms for each gene, but what should I do next?

go rna-seq ensembl gene • 829 views
modified 9 months ago by Jean-Karim Heriche13k • written 9 months ago by yuxinghai0

Give GeneSCF a try. It supports Ensembl IDs.

modified 9 months ago • written 9 months ago by genomax37k

Sorry to say this, but GeneSCF does not support Ensembl IDs directly. You can, however, convert them into gene symbols or Entrez IDs and use those in GeneSCF.

modified 9 months ago • written 9 months ago by EagleEye4.8k

It's a pity that it doesn't work with EnsEMBL. In my work I find EnsEMBL a much better resource than NCBI.

written 9 months ago by Jean-Karim Heriche13k

It was a problem when I tried to implement Ensembl support in GeneSCF, because for some gene symbols the Ensembl ID (ENSG) varies depending on the Ensembl version.

For example, for KCNQ1OT1 I see different ENSG IDs in old Ensembl (ENSG00000258492.1, GRCh37.66, gencode v11) and new Ensembl (ENSG00000269821.1, GRCh37.74-75, gencode v19). The only thing that stayed constant was the gene symbol or the Entrez ID for this gene.

At least if I have something constant (fixed) like gene symbols (I can easily deal with multiple aliases) or Entrez IDs, I can use it confidently (otherwise, the results might mislead).

modified 9 months ago • written 9 months ago by EagleEye4.8k

Don't use the .x version number of EnsEMBL IDs; they should be more stable this way. Gene symbols are also not stable (although I must say they change less often than they did a few years ago). Also, the whole problem is to define what a gene is and to work with this definition in a consistent way. It seems that for you a gene is defined by whatever shares the same symbol. This is reasonable, as it is more or less the definition used by biologists, but as you've already experienced, it can create computational problems. It is also not always the best definition to use, especially when the underlying genome matters. The problem with Entrez is that it is unclear what a gene is. From this paper:

A GeneID is usually assigned to what is annotated as a gene on a RefSeq record. ... A GeneID may also be assigned when no RefSeq exists.

And from the RefSeq book section on curation:

A sequence record unambiguously associated with a Gene record may be propagated into a RefSeq record.

This looks very circular and ad hoc to me.

A RefSeq record is suppressed if it is found to represent a transcribed repeat element, ... or not to represent a "gene".

Notice the quotes around the word gene, which I take to indicate that there's no formal definition of the term.

Anyway, the conclusion is that there are different definitions of what a gene is and that one should pick a reference and stick to it for the duration of a project or risk inconsistent results.
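
As a small illustration of the point about dropping the .x version suffix, here is a minimal Python sketch. The helper name `strip_ensembl_version` is made up for this example, and the IDs are the ones quoted earlier in the thread.

```python
import re

# Matches a trailing ".<digits>" version suffix on an Ensembl stable ID.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_ensembl_version(stable_id: str) -> str:
    """Return the stable ID without its version suffix (hypothetical helper)."""
    return _VERSION_SUFFIX.sub("", stable_id.strip())


# IDs quoted earlier in the thread, with and without a version suffix.
ids = ["ENSG00000258492.1", "ENSG00000269821.1", "ENSG00000269821"]

# Deduplicate after stripping so versioned and unversioned forms collapse.
unversioned = sorted({strip_ensembl_version(i) for i in ids})
print(unversioned)
```

This only removes the suffix. It does not reconcile cases where the stable ID itself differs between releases, as in the KCNQ1OT1 example above, which is one more reason to fix a single reference release for the whole project.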

modified 9 months ago • written 9 months ago by Jean-Karim Heriche13k
9 months ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche13k wrote:

You could use an R package like topGO or one of the Babelomics enrichment tools.
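
For intuition about what an over-representation analysis computes from a gene-to-GO mapping (for instance one exported from BioMart), here is a rough Python sketch of a one-sided hypergeometric test. It is not the API of topGO or Babelomics; the function names, gene labels, and GO labels are all placeholders invented for the example.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_universe genes, n_annotated of which carry the GO term."""
    max_k = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


def enrich(gene2go, selected):
    """Return {GO term: p-value} for every term in the annotation."""
    universe = set(gene2go)
    selected = selected & universe  # ignore genes without an annotation entry
    terms = set().union(*gene2go.values())
    results = {}
    for term in terms:
        annotated = {g for g, ts in gene2go.items() if term in ts}
        overlap = annotated & selected
        results[term] = hypergeom_enrichment_p(
            len(universe), len(annotated), len(selected), len(overlap)
        )
    return results


# Placeholder annotation (not real data): gene ID -> set of GO terms.
gene2go = {
    "GENE_A": {"GO:TERM1"},
    "GENE_B": {"GO:TERM1", "GO:TERM2"},
    "GENE_C": {"GO:TERM2"},
    "GENE_D": set(),
    "GENE_E": {"GO:TERM1"},
    "GENE_F": {"GO:TERM2"},
}
de_genes = {"GENE_A", "GENE_B", "GENE_E"}  # placeholder DE gene list

results = enrich(gene2go, de_genes)
for term in sorted(results):
    print(term, round(results[term], 3))
```

Dedicated packages add things this sketch leaves out, such as multiple-testing correction and handling of the GO hierarchy, so treat it as an explanation of the principle rather than a replacement.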

written 9 months ago by Jean-Karim Heriche13k
9 months ago by
EagleEye4.8k
Sweden
EagleEye4.8k wrote:

Suggestion:

1) Using BioMart, convert your Ensembl (ENSG) IDs into gene symbols or Entrez GeneIDs (check steps here).

2) Use GeneSCF to do enrichment analysis.
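
Step 1 could look roughly like the following Python sketch once a two-column table has been exported from BioMart. The column headers ("Gene stable ID", "Gene name"), the inline table, and the `ENSG_PLACEHOLDER_*` IDs are assumptions made for illustration; check the headers of your own export.

```python
import csv
import io

# Stand-in for a BioMart TSV export. Headers are an assumption, and the
# placeholder rows are invented; only the KCNQ1OT1 pair comes from the thread.
biomart_tsv = (
    "Gene stable ID\tGene name\n"
    "ENSG00000269821\tKCNQ1OT1\n"
    "ENSG_PLACEHOLDER_1\t\n"
)


def load_id_to_symbol(handle):
    """Build {Ensembl gene ID: gene symbol}, skipping rows with no symbol."""
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        symbol = (row["Gene name"] or "").strip()
        if symbol:
            mapping[row["Gene stable ID"]] = symbol
    return mapping


id_to_symbol = load_id_to_symbol(io.StringIO(biomart_tsv))

# Differentially expressed gene IDs, possibly carrying a version suffix.
de_ids = ["ENSG00000269821.1", "ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2"]

mapped, unmapped = [], []
for gene_id in de_ids:
    key = gene_id.split(".")[0]  # drop the version suffix before lookup
    if key in id_to_symbol:
        mapped.append(id_to_symbol[key])
    else:
        unmapped.append(gene_id)

print("symbols:", mapped)
print("no symbol found:", unmapped)
```

The resulting symbol list is what would go into step 2. Keeping the unmapped IDs in a separate list makes it easy to see how many genes are lost in the conversion.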

modified 9 months ago • written 9 months ago by EagleEye4.8k

But many Ensembl gene IDs don't have corresponding Entrez IDs.

written 9 months ago by yuxinghai0

All Ensembl IDs will have corresponding gene symbols. You can use that information.

written 9 months ago by EagleEye4.8k
9 months ago by
b.nota3.6k
Netherlands
b.nota3.6k wrote:

With goseq in R you can use Ensembl IDs.

Or clusterProfiler, which has a good tutorial:

http://bioconductor.org/packages/release/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html

written 9 months ago by b.nota3.6k