Question: Different results with edgeR when adding gene symbols
0
gravatar for urmi208
2.0 years ago by
urmi20830
urmi20830 wrote:

Hello,

I am trying to get the gene names along with the Ensembl gene ids in the final edgeR output. The problem is: I am seeing different results when I add the gene symbols and when I don't. Here is example code without gene symbols

library(edgeR)
exp2 <-read.table('exp2',header=TRUE)
y <- readDGE(exp2,labels=exp2$sample,group=exp2$treatment)
keep <- rowSums(cpm(y)>1) >= 2
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
bcv=0.4
et_D0_D1 <- exactTest(y, pair=c("D0","D1"),dispersion=bcv^2)
write.table(topTags(et_D0_D1,n=Inf),file="test1.txt",col.names=TRUE,quote=FALSE)

Result is: ENSGACG00000003380 1.00049178363978 4.49563701685595 0.234971858102264 1

Here is the one with gene symbols added using biomart:

library(edgeR)
library(biomaRt)
exp2 <-read.table('exp2',header=TRUE)
y <- readDGE(exp2,labels=exp2$sample,group=exp2$treatment)
geneid <-rownames(y)
ga82 <-  useEnsembl(biomart="ensembl",dataset="gaculeatus_gene_ensembl",version=82)
genes <- getBM(attributes=c('ensembl_gene_id','external_gene_name'),filters='ensembl_gene_id',values=geneid,mart=ga82)
y$genes <- genes
keep <- rowSums(cpm(y)>1) >= 2
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
bcv=0.4
et_D0_D1 <- exactTest(y, pair=c("D0","D1"),dispersion=bcv^2)
write.table(topTags(et_D0_D1,n=Inf),file="test2.txt",col.names=TRUE,quote=FALSE)

Here the result is: ENSGACG00000003380 NA -0.135739707808875 2.95905945166089 0.874200189025877 1

SessionInfo() output is given below:

R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.24.1 edgeR_3.10.4   limma_3.24.15

loaded via a namespace (and not attached):
 [1] IRanges_2.2.9        DBI_0.3.1            parallel_3.2.1
 [4] RCurl_1.95-4.7       Biobase_2.28.0       AnnotationDbi_1.30.1
 [7] RSQLite_1.0.0        S4Vectors_0.6.6      BiocGenerics_0.14.0
[10] GenomeInfoDb_1.4.3   stats4_3.2.1         bitops_1.0-6
[13] XML_3.98-1.3

I am bit puzzled. Why would addition of gene names in y$genes above change the results. Any help you could provide will be very helpful. Many thanks in advance.

edger rna-seq biomart • 1.4k views
ADD COMMENTlink modified 2.0 years ago by MMa260 • written 2.0 years ago by urmi20830

how many rows do you have before and after adding the gene names?

ADD REPLYlink written 2.0 years ago by Giovanni M Dall'Olio26k

Thanks for your reply. I have 22460 rows both before and after adding the gene names

ADD REPLYlink written 2.0 years ago by urmi20830

please also post top 5-10 results for before and after cases.

ADD REPLYlink written 2.0 years ago by Santosh Anand4.7k

Before adding gene names:

ensembl_gene_id logFC logCPM PValue FDR
ENSGACG00000007157 3.24915717386252 1.07342942182991 0.000618728695167078 1
ENSGACG00000013912 2.3468671394398 0.373720501233804 0.0113007333301586 1
ENSGACG00000001321 2.21861945153189 0.972005925991169 0.0131377136373496 1
ENSGACG00000020320 -2.01587102395319 2.1704178289833 0.0207851586047353 1

After adding genenames:

ensembl_gene_id external_gene_name logFC logCPM PValue FDR
ENSGACG00000007401 si:dkey-34d22.1 3.24915717386252 1.07342942182991 0.000618728695167078 1
ENSGACG00000014287  2.3468671394398 0.373720501233804 0.0113007333301586 1
ENSGACG00000001351 ttf2 2.21861945153189 0.972005925991169 0.0131377136373496 1
ENSGACG00000020359 pafah1b2 -2.01587102395319 2.1704178289833 0.0207851586047353 1
ADD REPLYlink written 2.0 years ago by urmi20830
1

Your gene-annotation is somehow not correct. See that the numbers (FC, Pvalue etc) are same (before and after), but the ensembl_gene_id is different!

ADD REPLYlink written 2.0 years ago by Santosh Anand4.7k

Aha! There is merging issue. Any ideas how can I fix this?

ADD REPLYlink written 2.0 years ago by urmi20830

?merge()

PS: This just to keep away the min-char bot

ADD REPLYlink written 2.0 years ago by Santosh Anand4.7k
1

Thanks Santosh. Very helpful

ADD REPLYlink written 2.0 years ago by urmi20830

Please close this thread by accepting @MMa answer (green check mark to the left) if it solved your problem. Else, you can also write your own answer and accept it.

ADD REPLYlink written 2.0 years ago by Santosh Anand4.7k
3
gravatar for MMa
2.0 years ago by
MMa260
MMa260 wrote:

Remember, the output of getBM is not always in the order of values, thus the error.

What I'd do is:

geneid <-rownames(y)
ga82 <-  useEnsembl(biomart="ensembl",dataset="gaculeatus_gene_ensembl",version=82)
genes <- getBM(attributes=c('ensembl_gene_id','external_gene_name'),filters='ensembl_gene_id',values=geneid,mart=ga82)
rownames (genes) <- genes$ensembl_gene_id
y$genes <- genes [rownames(y), ]
ADD COMMENTlink written 2.0 years ago by MMa260

Thank you Mma. It makes sense. I will give this a go.

ADD REPLYlink written 2.0 years ago by urmi20830
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1588 users visited in the last hour