Question: edgeR - glm DE analysis for paired samples - errors
0
gravatar for Preethy Nair
5.1 years ago by
University of Helsinki, Finland
Preethy Nair0 wrote:

Hi All,

I had been trying to do DE analysis of my RNAseq experiment using edgeR and am having some isssues. The details of the Experiment and the R code I tried below: 

(a) Paired experimental design with 45 pairs
(b) Treatment: "Before" and "After"
(c) Phenotype: 1 & 2
Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking into account the paired design

The R code tried:

library(edgeR)
counts<-read.delim(file="counts.dat",header=T)
pair=factor(pdata$pair)
Treat=factor( pdata$treat)
Phenotype=factor(pdata$pheno) 
group<-paste(Treat,Phenotype,sep=".")
design=model.matrix(~pair+Treat:Phenotype, data=counts)
counts.DGEList<-DGEList(counts, group=group)
y<-calcNormFactors(counts.DGEList)
y<-estimateCommonDisp(y, design)
y<-estimateGLMTrendedDisp(y, design)


Error message I get:

Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 TreatBefore:Phenotype1 TreatBefore:Phenotype2


Any idea to solve this out?

 

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fi_FI.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=fi_FI.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.4.2   limma_3.18.13

 

Thanks a lot,
Preethy

ADD COMMENTlink modified 5.1 years ago by Charles Warden6.8k • written 5.1 years ago by Preethy Nair0

What's does design look like? The most common cause of this is having at least one unpaired sample. BTW, your usage of group is strange. What are you trying to achieve there?

ADD REPLYlink written 5.1 years ago by Devon Ryan90k

For those wondering, this is also cross-posted on the bioconductor email list.

ADD REPLYlink written 5.1 years ago by Devon Ryan90k

You could also just post pdata. Also, you likely want model.matrix(..., data=pdata)

ADD REPLYlink modified 5.1 years ago • written 5.1 years ago by Devon Ryan90k

Thanks for the replies. All the samples are paired and I have them all correct - I mean none of them are empty
I tried different designs. But ending up with the same error message.
What I want to do: Getting DE genes between (Phenotype1.Before-Phenotype1.After) & (Phenotype2.Before-Phenotype2.After)

pdata here:

pair=rep(c(1:45), each=2)
Treat=rep(c("Before", "After"),45)
Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1,  2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1,  2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2), each=2)
pdata<-data.frame(pair,Treat,Phenotype)

ADD REPLYlink modified 5.1 years ago • written 5.1 years ago by Preethy Nair0

A contrast like the one mentioned in the edgeR manual section 3.2.3 & 3.2.4 for paired samples.

ADD REPLYlink modified 5.1 years ago • written 5.1 years ago by Preethy Nair0

I see James MacDonald replied while I was away from the computer, so his suggestion of effectively tweaking the design should work nicely. I should note that you'd get the same error message regardless of package (or even if you did everything manually and didn't bother with a package). Given a model matrix mm, you will always receive an error like this if min(dim(mm)) is greater than qr(mm)$rank, such as is the case above.

ADD REPLYlink written 5.1 years ago by Devon Ryan90k

Thanks for the reply. I noticed the problem with the design matrix. I tried it like Jim mentioned but came across with error again due to the reason Charles mentioned below. I removed empty columns from the design matrix and it worked.

Best,

 

ADD REPLYlink written 5.1 years ago by Preethy Nair0
1
gravatar for Charles Warden
5.1 years ago by
Charles Warden6.8k
Duarte, CA
Charles Warden6.8k wrote:

The problem is that all possible variables don't span one another:

For example, pairing ID "1" is present for TreatBefore:Phenotype1 and TreatAfter:Phenotype1, but not TreatBefore:Phenotype2 or TreatBefore:Phenotype2.

Also, I share Devon's concern about your design: my guess is that you don't want to study variation in a variable with 4 levels (TreatBefore:Phenotype1, TreatAfter:Phenotype1, TreatBefore:Phenotype2, TreatBefore:Phenotype2.), controlling for pairing.  I think your analysis needs multiple steps.  For example, if you take log2(RPKM + 0.1) values, you can subtract the after expression from the before expression and then look for differences in the normalized expression between phenotypes.   This is probably what I would do.  If you really want to stick with using edgeR, you could first define genes that vary between "Before" and "After" (correcting for pairing) and then define genes that change with phenotype (without controlling for pairing) and look for genes that overlap both lists.

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by Charles Warden6.8k

 Yes, you are right - "variables didn't span one another". Thanks a lot for the ideas. 

Best,

Preethy

ADD REPLYlink written 5.1 years ago by Preethy Nair0

what do you mean by 'then look for differences in the normalized expression between phenotypes'?

What method would you suggest?

 

ADD REPLYlink written 3.5 years ago by informatics bot570
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1616 users visited in the last hour