Question: edgeR Gene Ontology Analysis always returns the same list of pathways
1
gravatar for moxu
2.7 years ago by
moxu440
moxu440 wrote:

edgeR returns a list of differentially expressed genes OK, at least they do not always return genes in the same order.

However, when I run edgeR to get the top Gene Ontology pathways like the following:

rst<-topTags(qlfTest, n=nrow(d));
drst <- data.frame("GENE_ID"=rownames(rst), rst);
write.table(drst, file="mytreatment.DEG.txt", sep="\t", row.names=F, quote=F);

## the above "drst" returns a list of differentially expressed genes with pvalues, FDR, etc., and they look alright.

# get the differentially expressed genes list, throw away genes with FDR >= 0.25
degs <- as.vector(subset(drst, FDR < 0.25)[[1]])

# do GO analysis
go <- goana(degs, species="Hs");
tgo <- topGO(go);

It always return the following list of pathways:

GO_ID   Term    Ont N   DE  P.DE
GO:0000002  mitochondrial genome maintenance    BP  20  0   1
GO:0000003  reproduction    BP  925 0   1
GO:0000009  alpha-1,6-mannosyltransferase activity  MF  2   0   1
GO:0000010  trans-hexaprenyltranstransferase activity   MF  2   0   1
GO:0000012  single strand break repair  BP  8   0   1
GO:0000014  single-stranded DNA endodeoxyribonuclease activity  MF  6   0   1
GO:0000015  phosphopyruvate hydratase complex   CC  4   0   1
GO:0000016  lactase activity    MF  1   0   1
GO:0000018  regulation of DNA recombination BP  51  0   1
GO:0000019  regulation of mitotic recombination BP  4   0   1
GO:0000022  mitotic spindle elongation  BP  2   0   1
GO:0000023  maltose metabolic process   BP  1   0   1
GO:0000026  alpha-1,2-mannosyltransferase activity  MF  3   0   1
GO:0000027  ribosomal large subunit assembly    BP  2   0   1
GO:0000028  ribosomal small subunit assembly    BP  8   0   1
GO:0000030  mannosyltransferase activity    MF  17  0   1
GO:0000033  alpha-1,3-mannosyltransferase activity  MF  3   0   1
GO:0000035  acyl binding    MF  1   0   1
GO:0000036  ACP phosphopantetheine attachment site binding involved in fatty acid biosynthetic process  MF  1   0   1
GO:0000038  very long-chain fatty acid metabolic process    BP  25  0   1

No matter what experiment datasets (can be totally unrelated experiments). I guess something is not done right.

Thanks!

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by moxu440

Can you post head(degs) ?

ADD REPLYlink written 2.7 years ago by GZ1995350
> head(degs)
[1] "PPP1R10"  "C15orf52" "JPH2"     "PRMT6"    "SKA3"     "CREB1"

Guess they need to be converted to Entrez IDs from these (probably HGNC) gene symbols?

ADD REPLYlink written 2.7 years ago by moxu440

Yes. You need Entrez IDs as input to goana.

ADD REPLYlink written 2.7 years ago by GZ1995350

That might be the answer. However, I have not been able to do the conversion yet -- cannot find a tool to do so.

I tried

 library(org.Hs.eg.db)
 library(annotate)

in R, but it looks like it can only convert Entrez IDs to HGNC gene symbols but not the opposite.

Thanks!

ADD REPLYlink written 2.7 years ago by moxu440
library(Homo.sapiens)

de <- select(Homo.sapiens, keys = degs, keytype = "SYMBOL", columns="ENTREZID")
## remove duplicate
mat <- match(de$SYMBOL, degs)
de <- de[mat,]

By the way, I recommend reading this tutor as it explains how to perform pathway/gene set analysis with edgeR

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by GZ1995350

Thanks, it worked for human!

Unable to do the same for mouse though -- cannot find the "Homo.sapiens" counterpart after installing org.Mm.eg.db.

ADD REPLYlink written 2.7 years ago by moxu440
biocLite("Mus.musculus")
library(Mus.musculus)

Actually it is contained in org.Mm.egGO object.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by GZ1995350
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1643 users visited in the last hour