Question: Microarray analysis in R (Limma): how to select genes exclusively expressed in one condition.
0
gravatar for reubenmcgregor88
4 months ago by
reubenmcgregor8840 wrote:

Background: I am mining some data using the Limma package to look at differentially expressed genes.

I have imported the expression data such that values of 0 are NA.

y <- read_excel("exprs.xlsx", na="0")

My phenotypic data looks like this:

> pData
         tissue donor
    AV3     AV     3
    AV4     AV     4
    AV8     AV     8
    MV3     MV     3
    MV4     MV     4
    MV8     MV     8
    PV3     PV     3
    PV4     PV     4
    PV8     PV     8
    TV3     TV     3
    TV4     TV     4
    TV8     TV     8

So 3 donors (3,4,8), 4 tissue types (AV,MV,PV,TV).

group <- factor(pData$tissue) 

design <- model.matrix(~0+group)

contrast.matrix <- makeContrasts(
  mv_vs_tv = MV-TV,levels=design)

fit <- lmFit(y, design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2, trend=T)

This will give me differential expressed genes between MV and TV.

I would now like to pull out the genes that are expressed exclusively in MV and not TV, AV or PV. By this I mean there is 0 or "NA" in all groups except MV, in any donor.

For example if donor 3 shows expression of "geneX" in MV but donor 4 and 8 do not, and all donors have NA expression in AV, PV or TV, I would like this to be reported. As well as if 2 or 3 donors show geneX expression in MV but not any other tissues.

Can anyone suggest some code that would do this, within or outside f Limma? I tried a few code chunks but am struggling to get it to work.

Many thanks.

limma microarray R • 312 views
ADD COMMENTlink modified 4 months ago by h.mon26k • written 4 months ago by reubenmcgregor8840
3
gravatar for Kevin Blighe
4 months ago by
Kevin Blighe46k
Kevin Blighe46k wrote:

Due to the way that microarray profiling works, which is based on fluorescent signal intensities, it is very difficult to say that something is not expressed at all. Probes may fall into the background noise but virtually all will still return some signal, whether it be related to the target mRNA or not. This is different from RNA-seq where a 0 actually makes intuitive sense.

You may consider a better approach by converting the log2-normalised expression values to the Z scale and then taking, as an example, >4 as expressed and <-4 as not expressed. This simple procedure is employed frequently. For one, cBioPortal use Z-scores for identifying genes expressed in tumours over normals, as an example.

ADD COMMENTlink modified 4 months ago • written 4 months ago by Kevin Blighe46k

I understand, I guess this is more of a coding question. I am not sure what is wrong with the data, but it seems to be working fine in this case. This is the only format I can get the data in for the moment. I will look into getting the data further upstream so I can QC it.

ADD REPLYlink written 4 months ago by reubenmcgregor8840
2
gravatar for Friederike
4 months ago by
Friederike4.6k
United States
Friederike4.6k wrote:

I absolutely second Kevin's advice, but if you need to figure out how to subset a data.frame based on NA values, these examples may give you some leads:

> tt <- data.frame(a = c(NA, 1,2,3), b = c(NA, NA, 1,2), c = c(NA, NA, NA, 1))
> tt
   a  b  c
1 NA NA NA
2  1 NA NA
3  2  1 NA
4  3  2  1
> subset(tt, is.na(b))
   a  b  c
1 NA NA NA
2  1 NA NA
> subset(tt, is.na(b) & is.na(a))
   a  b  c
1 NA NA NA
ADD COMMENTlink written 4 months ago by Friederike4.6k

Thank you Friederike, could you please expand on this.

Using your example I would like to select (for instance), row 2 based on a)it has NA in columns b and c, b) it has a non-NA in column a.

Using the "subset" above I could specific which rows to pull out based on NA but not which rows to pull out based on the fact stye have non-NA.

I have found a rather convoluted way of getting it but ay advice on how to simplify much appreciated:

exprs_MV_NA_2 <- 
  exprs%>%
  filteris.na(c(AV3))) %>%
  filteris.na(c(AV4))) %>%
  filteris.na(c(AV8))) %>%
  filteris.na(c(PV3))) %>%
  filteris.na(c(PV4))) %>%
  filteris.na(c(PV8))) %>%
  filteris.na(c(TV3))) %>%
  filteris.na(c(TV4))) %>%
  filteris.na(c(TV8)))

The above step gives me all rows where there is "NA" in all but the AV group

exprs_MV_NA_2 <- exprs_MV_NA_2[rowSumsis.na(exprs_MV_NA_2)) != ncol(exprs_MV_NA_2), ]

Then this step removes rows with all NA and seems to leave behind what I wanted...

ADD REPLYlink modified 4 months ago • written 4 months ago by reubenmcgregor8840
1

the expression for "not NA" is this: !is.na(x), i.e.: subset(tt, !is.na(a) & is.na(b) & is.na(c) )

ADD REPLYlink written 4 months ago by Friederike4.6k
2
gravatar for h.mon
4 months ago by
h.mon26k
Brazil
h.mon26k wrote:

Why are you considering zero as NAs in your intensity values? NAs have some special properties in R and most analyses drop them before performing calculations. See Gordon Smyth answer at the old Bioconductor thread Question: limma with missing values for a comment about how limma deals with NAs, and a recommendation of not introducing them.

I third the recommendation that it makes no sense to think about "not expressed" genes for microarrays. But I disagree with Kevin Blighe about RNAseq because:

1) even RNAseq has background noise, as most of the genome is transcribed at low levels (see the ENCODE Project paper An integrated encyclopedia of DNA elements in the human genome, also read Exactly 8.2% or 80% of the Human Genome is Functional to get acquainted with the controversy that arose from ENCODE).

2) thus, sequencing depth will play a large role in what you consider "non-expressed" in RNAseq. With shallow sequenced samples you will get more of "non-expressed" genes than with deeper sequenced samples.

ADD COMMENTlink written 4 months ago by h.mon26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1500 users visited in the last hour