Different results with edgeR when adding gene symbols
1
0
Entering edit mode
7.0 years ago
urmi208 ▴ 30

Hello,

I am trying to get the gene names along with the Ensembl gene ids in the final edgeR output. The problem is: I am seeing different results when I add the gene symbols and when I don't. Here is example code without gene symbols

library(edgeR)
exp2 <-read.table('exp2',header=TRUE)
y <- readDGE(exp2,labels=exp2$sample,group=exp2$treatment)
keep <- rowSums(cpm(y)>1) >= 2
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
bcv=0.4
et_D0_D1 <- exactTest(y, pair=c("D0","D1"),dispersion=bcv^2)
write.table(topTags(et_D0_D1,n=Inf),file="test1.txt",col.names=TRUE,quote=FALSE)

Result is: ENSGACG00000003380 1.00049178363978 4.49563701685595 0.234971858102264 1

Here is the one with gene symbols added using biomart:

library(edgeR)
library(biomaRt)
exp2 <-read.table('exp2',header=TRUE)
y <- readDGE(exp2,labels=exp2$sample,group=exp2$treatment)
geneid <-rownames(y)
ga82 <-  useEnsembl(biomart="ensembl",dataset="gaculeatus_gene_ensembl",version=82)
genes <- getBM(attributes=c('ensembl_gene_id','external_gene_name'),filters='ensembl_gene_id',values=geneid,mart=ga82)
y$genes <- genes
keep <- rowSums(cpm(y)>1) >= 2
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
bcv=0.4
et_D0_D1 <- exactTest(y, pair=c("D0","D1"),dispersion=bcv^2)
write.table(topTags(et_D0_D1,n=Inf),file="test2.txt",col.names=TRUE,quote=FALSE)

Here the result is: ENSGACG00000003380 NA -0.135739707808875 2.95905945166089 0.874200189025877 1

SessionInfo() output is given below:

R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.24.1 edgeR_3.10.4   limma_3.24.15

loaded via a namespace (and not attached):
 [1] IRanges_2.2.9        DBI_0.3.1            parallel_3.2.1
 [4] RCurl_1.95-4.7       Biobase_2.28.0       AnnotationDbi_1.30.1
 [7] RSQLite_1.0.0        S4Vectors_0.6.6      BiocGenerics_0.14.0
[10] GenomeInfoDb_1.4.3   stats4_3.2.1         bitops_1.0-6
[13] XML_3.98-1.3

I am bit puzzled. Why would addition of gene names in y$genes above change the results. Any help you could provide will be very helpful. Many thanks in advance.

RNA-Seq edgeR biomart • 4.0k views
ADD COMMENT
0
Entering edit mode

how many rows do you have before and after adding the gene names?

ADD REPLY
0
Entering edit mode

Thanks for your reply. I have 22460 rows both before and after adding the gene names

ADD REPLY
0
Entering edit mode

please also post top 5-10 results for before and after cases.

ADD REPLY
0
Entering edit mode

Before adding gene names:

ensembl_gene_id logFC logCPM PValue FDR
ENSGACG00000007157 3.24915717386252 1.07342942182991 0.000618728695167078 1
ENSGACG00000013912 2.3468671394398 0.373720501233804 0.0113007333301586 1
ENSGACG00000001321 2.21861945153189 0.972005925991169 0.0131377136373496 1
ENSGACG00000020320 -2.01587102395319 2.1704178289833 0.0207851586047353 1

After adding genenames:

ensembl_gene_id external_gene_name logFC logCPM PValue FDR
ENSGACG00000007401 si:dkey-34d22.1 3.24915717386252 1.07342942182991 0.000618728695167078 1
ENSGACG00000014287  2.3468671394398 0.373720501233804 0.0113007333301586 1
ENSGACG00000001351 ttf2 2.21861945153189 0.972005925991169 0.0131377136373496 1
ENSGACG00000020359 pafah1b2 -2.01587102395319 2.1704178289833 0.0207851586047353 1
ADD REPLY
1
Entering edit mode

Your gene-annotation is somehow not correct. See that the numbers (FC, Pvalue etc) are same (before and after), but the ensembl_gene_id is different!

ADD REPLY
0
Entering edit mode

Aha! There is merging issue. Any ideas how can I fix this?

ADD REPLY
0
Entering edit mode

?merge()

PS: This just to keep away the min-char bot

ADD REPLY
1
Entering edit mode

Thanks Santosh. Very helpful

ADD REPLY
0
Entering edit mode

Please close this thread by accepting @MMa answer (green check mark to the left) if it solved your problem. Else, you can also write your own answer and accept it.

ADD REPLY
3
Entering edit mode
7.0 years ago
John Ma ▴ 310

Remember, the output of getBM is not always in the order of values, thus the error.

What I'd do is:

geneid <-rownames(y)
ga82 <-  useEnsembl(biomart="ensembl",dataset="gaculeatus_gene_ensembl",version=82)
genes <- getBM(attributes=c('ensembl_gene_id','external_gene_name'),filters='ensembl_gene_id',values=geneid,mart=ga82)
rownames (genes) <- genes$ensembl_gene_id
y$genes <- genes [rownames(y), ]
ADD COMMENT
0
Entering edit mode

Thank you Mma. It makes sense. I will give this a go.

ADD REPLY

Login before adding your answer.

Traffic: 2781 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6