Question: How Can The Same Gene Be Both Significantly Up- And Down-Regulated According To The Gxa?
6.3 years ago, Neilfws (Sydney, Australia) wrote:

I've been playing around with the EBI Gene Expression Atlas (GXA). It has an API. So, for example, I can retrieve data about the human gene SRI in JSON format using this URI:

http://www.ebi.ac.uk:80/gxa/api/vx?geneIs=ENSG00000075142&format=json

I wrote some R code to fetch/parse the JSON into a data frame:

library(RCurl)
library(rjson)
library(plyr)

# Flatten the GXA JSON into one row per factor / factor value / experiment
j2df <- function(l) {
  e <- lapply(l$results[[1]]$expressions, function(x) {
    # pull one named field from every experiment under this factor value
    field <- function(name) sapply(x$experiments, function(y) y[[name]])
    list(ef     = x$ef,
         efv    = x$efv,
         accn   = field("experimentAccession"),
         updn   = field("updn"),
         pvalue = field("pvalue"))
  })
  # bind the per-factor-value lists into a single data frame
  ldply(e, as.data.frame)
}

# fetch the JSON
j <- fromJSON(getURL("http://www.ebi.ac.uk:80/gxa/api/vx?geneIs=ENSG00000075142&format=json"))
# convert to data frame
sri <- j2df(j)

When I examine the first few rows, I see:

head(sri)
         ef   efv      accn updn pvalue
1 cell_line   1A2 E-MTAB-37 DOWN  0.000
2 cell_line 22Rv1 E-MTAB-37   UP  0.003
3 cell_line 22Rv1 E-MTAB-37 DOWN  0.019
4 cell_line  5637 E-MTAB-37   UP  0.000
5 cell_line  647V E-MTAB-37   UP  0.009
6 cell_line  769P E-MTAB-37   UP  0.000

According to rows 2 and 3, the same gene (SRI) in the same experiment (E-MTAB-37) is both up-regulated (p = 0.003) and down-regulated (p = 0.019) in cell line 22Rv1, as compared with mean expression from all cell lines. At least, that is my understanding of UP and DOWN as defined in the GXA documentation.

Am I missing something obvious? Or are the data returned by the GXA API simply nonsense?

6.3 years ago, Neilfws (Sydney, Australia) wrote:

Let me answer my own question.

We can view the gene and experiment at this link. If we then select cell line 22Rv1 under conditions and refresh, we see that there are two probes (or "design elements") for the gene on this array. The measurement for 208920_at is UP and that for 208921_s_at is DOWN, which accounts for the two conflicting rows.
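As a rough sketch (my own, not something the GXA provides), conditions with conflicting calls can be flagged directly in the data frame. The block below rebuilds the six rows shown by head(sri) so that it runs on its own; the names key, n_calls and conflicts are just illustrative. Note that the data frame produced by j2df carries no probe IDs, so this only tells you which conditions to look up, not which design elements disagree.

```r
# Rows copied from the head(sri) output in the question
sri <- data.frame(
  ef     = rep("cell_line", 6),
  efv    = c("1A2", "22Rv1", "22Rv1", "5637", "647V", "769P"),
  accn   = rep("E-MTAB-37", 6),
  updn   = c("DOWN", "UP", "DOWN", "UP", "UP", "UP"),
  pvalue = c(0.000, 0.003, 0.019, 0.000, 0.009, 0.000),
  stringsAsFactors = FALSE
)

# One key per factor / factor value / experiment combination
key <- paste(sri$ef, sri$efv, sri$accn, sep = "|")

# Number of distinct UP/DOWN calls seen for each key
n_calls <- tapply(sri$updn, key, function(x) length(unique(x)))

# Keep the rows whose key has more than one distinct call
conflicts <- sri[key %in% names(n_calls)[n_calls > 1], ]
print(conflicts)
```

On the full data frame the same three lines (key, n_calls, conflicts) should apply unchanged, since they only rely on the columns that j2df creates.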


Using a custom CDF like the ones from brainarray (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/genomiccuratedCDF.asp), where all probes targeting the same gene are combined, should prevent this problem. Of course, the different probes could also target different transcripts of the same gene, which would give a biological explanation for what you found.

written 6.3 years ago by Chris Evelo

Randomly clicking around and selecting various cell lines, one can find other similar examples (D341Med, Detroit562, H4, HPAFII) where the designations don't match, yet both calls are highly significant (D341Med has p-values of E-7 and E-10 indicating opposing behaviors). In many other cases one of the p-values is ridiculously low (1E-10) whereas the other is undefined.

In a way, this demonstrates the utility (or lack thereof) of p-values.

written 6.3 years ago by Istvan Albert