Question: Processing GEO data and calculating differential expression using limma
0
gravatar for APJ
4 months ago by
APJ30
Sweden
APJ30 wrote:

Hello,

I am wondering about the data analysis strategy implemented to calculate the differential expression analysis between wildtype and knockout samples for 2 cell lines. It is a Illumina bead array chip data obtained from [GEO]. I have total of 4 samples for each cell line (2 controls, 2 treated). End output I would like to obtain is foldchange and pvalues; for comparisons done between controls and treated samples from 2 cell lines. Output: B_vs_A_log2FC B_vs_A_pvalue D_vs_C_log2FC D_vs_C_pvalue

I have my code (below), that could perform my task.

But I am wondering about 3 questions:

1) Is it correct to apply neqc function to normalize the expression values of .idat files

2) My design matrix without weights?

3) As both cell lines contains 2 replicates, is this limma calculation makes sense? and the reliability of differential expression values (pvalues and foldchanges)

Any valuable inputs or suggestions?

datadir <- 'GSExxx_RAWdata/' 
    idatfiles <- dir(path=datadir, pattern=".idat")
    bgxfile   <- dir(path=datadir, pattern=".bgx")
    x         <- read.idat(paste(datadir,idatfiles,sep=""), paste(datadir,bgxfile,sep = ""))
    x$other$Detection <- detectionPValues(x)
    **y <- neqc(x)**

## eliminating low detection probes
isexpr <- rowSums(y$other$Detection < 0.05) >= 3
y <- y[isexpr,]
data <- y$E
## add gene symbols
rownames(data) <- y$genes$Symbol
colnames(data) <- c('MD_NC_rep1','MD_NC_rep2',
                    'MD_siPK1_rep1','MD_siPK1_rep2',
                    'HT_NC_rep1','HT_NC_rep2',
                    'HT_siPK1_rep1','HT_siPK1_rep2')
head(data)

i=1; j=i+3; k=j+1; l=k+3
data_MD <- data[,i:j]
data_HT <- data[,k:l]
**design <- model.matrix(~0+factor(c(0,0,1,1)))**
colnames(design) <- c("Wild_type", "Mutant")

# > design
# Wild_type Mutant
# 1         1      0
# 2         1      0
# 3         0      1
# 4         0      1
# attr(,"assign")
# [1] 1 1
# attr(,"contrasts")
# attr(,"contrasts")$`factor(c(0, 0, 1, 1))`
# [1] "contr.treatment"


**cont_matrix <- makeContrasts(MutvsWT = Mutant-Wild_type, levels=design)**
# > cont_matrix
# Contrasts
# Levels      MutvsWT
# Wild_type      -1
# Mutant          1

op.matrix <- data
for (i in 1:2){
  if (i==1){data=data_MD; Nm="MD_siPK1_vs_MD_NC"; tg=targets[c(1:4),]}
  if (i==2){data=data_HT; Nm="HT_siPK1_vs_HT_NC"; tg=targets[c(5:8),]}  

  fit <- lmFit(data, design)  
  fit_contrast <- contrasts.fit(fit, cont_matrix)
  fit_contrast <- eBayes(fit_contrast)
  # Generate a list of top 100 differentially expressed genes
  output <- topTreat(fit_contrast, coef=1, number=Inf, adjust.method="BH")
  p1 <- output[,"P.Value"]
  FC <- output[,"logFC"]
  q1 <- p.adjust(p1, method="fdr");
  p1 <- signif(p1, digits=2);
  q1 <- signif(q1, digits=2);
  FC <- round(FC, digits=2) ;
  P.matrix <- cbind.data.frame(op.matrix, FC, p1, q1);
  colnames(op.matrix)[(ncol(op.matrix)-2):ncol(op.matrix)] <- c(
    paste(Nm, "-log2FC", sep=""),
    paste(Nm, "-p", sep=""),
    paste(Nm, "-FDR", sep=""));
}

op.matrix <- cbind.data.frame('Gene'=rownames(op.matrix),op.matrix)
ADD COMMENTlink modified 4 months ago • written 4 months ago by APJ30
1
gravatar for Kevin Blighe
4 months ago by
Kevin Blighe53k
Kevin Blighe53k wrote:

1) Is it correct to apply neqc function to normalize the expression values of .idat files

Yes, neqc() is fine for this data - it performs background correction via normexp, and then quantile normalises.

2) My design matrix without weights?

Only use weights in the formula if you have justification; otherwise, do not use weights for no good reason.

3) As both cell lines contains 2 replicates, is this limma calculation makes sense? and the reliability of differential expression values (pvalues and foldchanges)

As you know, 2 replicates is not great. Look at it this way: you would really struggle to re-publish this data by just re-processing this single dataset.

ADD COMMENTlink written 4 months ago by Kevin Blighe53k

@Kevin: Thank you for your reply.

By applying neqc normalization, is it the true differential expression is also normalized.

The boxplot , after neqc normalization looks like this https://www.dropbox.com/s/zl57ide3v1nilv8/GSE84962_data_boxplot_neqc.pdf?dl=0

May I know is it good to apply the neqc for 2 different cell lines seperately?

As you know, 2 replicates is not great. Look at it this way: you would really struggle to re-publish this data by just re-processing this single dataset.

Yes you are correct in terms of reproducibility. My top list of differential expression genes is quite different from the genes reported in the published article. So is the reason for this post, if I am doing something wrong with the DE calculations.

ADD REPLYlink modified 4 months ago • written 4 months ago by APJ30

Hey.

By applying neqc normalization, is it the true differential expression is also normalized.

The differential expression analysis will be performed on the normalised data. Basically, with the pipeline that you are using, the raw data that is returned by read.idat() will be an EListRaw object. When you run neqc(), it will become an EList object, in which the expression values will be normalised.

When you use limma on this data (the output of neqc()), it will automatically use the normalised expression values.

May I know is it good to apply the neqc for 2 different cell lines seperately?

No, you should normalise everything together, i.e., at the same time. This allows for cross-sample comparisons to take place by dealing with (adjusting for) cross-sample biases.

Yes you are correct in terms of reproducibility. My top list of differential expression genes is quite different from the genes reported in the published article. So is the reason for this post, if I am doing something wrong with the DE calculations.

I see. There are probably many reasons for this, including new updated analysis methods and different filter criteria.

ADD REPLYlink written 4 months ago by Kevin Blighe53k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1697 users visited in the last hour