Question: Problem is to design model matrix for edgeR
1
gravatar for bioinforupesh2009.au
4.4 years ago by
Spain
bioinforupesh2009.au100 wrote:

Dear all,

I am trying edgeR to detect DE gene for my time course data. I have one experimental condition  with 4 time point, But while using design matrix its give me a error.
My code is: (in short:)

> label

 [1] WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT
Levels: WT
> Time
 [1] T1 T1 T1 T1 T1 T1 T2 T2 T2 T2 T3 T3 T3 T3 T4 T4 T4
Levels: T1 T2 T3 T4

# Before we fit GMLs, we need to define our design matrix
design <- model.matrix (~Time + label)

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

any advice ??? Thanks in advance guys.

ADD COMMENTlink modified 4.4 years ago by RamRS20k • written 4.4 years ago by bioinforupesh2009.au100

As Devon say below label serves no purpose, when trying to generate the design.matrix you need more than one level, other wise R cannot create the model representation. 

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by smilefreak400
4
gravatar for Devon Ryan
4.4 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

Since label serves no purpose, try design <- model.matrix(~Time) and let us know if that doesn't fix things.

ADD COMMENTlink written 4.4 years ago by Devon Ryan88k

Thank you very much Ryan....you are fabulous. its work :)))
 

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by bioinforupesh2009.au100

Dear Ryan:
I am wondering where i have Time 1 as i have 4 time in my series ??
> design
        (Intercept) TimeT2 TimeT3 TimeT4
WT.T1.1           1      0      0      0
WT.T1.2           1      0      0      0
WT.T1.3           1      0      0      0
WT.T1.4           1      0      0      0
WT.T1.5           1      0      0      0
WT.T1.6           1      0      0      0
WT.T2.1           1      1      0      0
WT.T2.2           1      1      0      0
WT.T2.3           1      1      0      0
WT.T2.4           1      1      0      0
WT.T3.1           1      0      1      0
WT.T3.2           1      0      1      0
WT.T3.3           1      0      1      0
WT.T3.4           1      0      1      0
WT.T4.1           1      0      0      1
WT.T4.2           1      0      0      1
WT.T4.3           1      0      0      1
attr(,"assign")
[1] 0 1 1 1
attr(,"contrasts")
attr(,"contrasts")$Time
[1] "contr.treatment"

ADD REPLYlink written 4.4 years ago by bioinforupesh2009.au100
1

You specified to keep the intercept, so it's not an explicit coefficient (everything is being compared to it). If you want to use contrasts instead, then use model.matrix(~0+Time).

ADD REPLYlink written 4.4 years ago by Devon Ryan88k

Dear Devon,
sorry for late comments for the same question: Advised model.matrix(~0+Time) may i use coef =2:4 ?;
lrt <- glmLRT(fit,coef=2:4) ?? is it ok ??? if yes, then i got about 700 genes (miRNAs) with hugest logFC out of ~1200. is it right ??? i have doubt ?? something is wrong !!!

#Estimate Dispersion
design <- model.matrix (~0 +Time)
rownames(design) <- colnames(y)
design
y <- estimateGLMCommonDisp(y, design, verbose =T)
y <- estimateGLMTrendedDisp(y,design)
y <- estimateGLMTagwiseDisp(y,design)
fit <- glmFit(y,design)
lrt <- glmLRT(fit,coef=2:4)
topTags(lrt)


Coefficient:  TimeT2 TimeT3 TimeT4
                 logFC.TimeT2 logFC.TimeT3 logFC.TimeT4   logCPM       LR          PValue      FDR
mmu-miR-5619-5p     -20.45565    -20.22821    -19.54368 1.159783 43579141     0              0
mmu-miR-28c         -20.45565    -19.69322    -19.92847 1.161204 43576050        0              0
mmu-miR-6395        -20.45565    -19.93606    -20.08351 1.163118 43334784       0              0
.....

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by bioinforupesh2009.au100

If you're going to use model.matrix(~0+Time) then you really need to use contrasts. Otherwise, you end up testing whether you have signal from an miRNA at T2, T3 and/or T4...which is probably not what you wanted to test. If you want to use glmLRT(fit,coef=2:4), then use model.matrix(~Time) instead. This would then ask the question, "Is there an effect of time on a given miRNA." This is likely what you actually want to know.

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Devon Ryan88k

Thank Devon for you generous advice. I tried the same and got 0 significant miRNA of FDR < 0.05 while having about 51 significant raw pvalue, could you please tell me how it happened.
# noemalization
y <- DGEList(counts=wt,group=label)
#Estimate Dispersion
design <- model.matrix (~Time)
rownames(design) <- colnames(y)
design
y <- estimateGLMCommonDisp(y, design, verbose =T)
y <- estimateGLMTrendedDisp(y,design)
y <- estimateGLMTagwiseDisp(y,design)
fit <- glmFit(y,design)
lrt <- glmLRT(fit,coef=2:4)
topTags(lrt)
> sum(lrt$table$PValue < 0.05)
[1] 51
> sum(lrt$table$FDR < 0.05)
[1] 0

ADD REPLYlink written 4.4 years ago by bioinforupesh2009.au100

If there were only 51 genes with raw p-values <0.05, then it's entirely expected that you'd have no significant findings. It seems that time has no effect in your dataset.

ADD REPLYlink written 4.4 years ago by Devon Ryan88k

Do you think, gene filtering step can affect to find the significant genes ??? May i try DESeq2 packages as he has gene filtering steps ??? Do you think its worth to try ??? what edgeR does not have it.

ADD REPLYlink written 4.4 years ago by bioinforupesh2009.au100

Sure, give it a try. In your case I doubt it'll make a difference, but it won't hurt to try.
 

ADD REPLYlink written 4.4 years ago by Devon Ryan88k

Dear Dev,
could you help me once again ??
I tried but got error while using DESeqDataSeq
library(DESeq2)
x= read.table("counts_filt.txt", header=T)
rownames(x) <- x$miRNA
x <- x [, -1]
design=read.table("design_deseq.txt", header=T)
>design
strain time.m replicates        id
WT.T1.1     wt      1         r1        wk
WT.T1.2     wt      1         r1        wk
WT.T1.3     wt      1         r1        wk
WT.T2.1     wt     12         r2      year
WT.T2.2     wt     12         r2      year
WT.T2.3     wt     12         r2      year
WT.T2.4     wt     12         r2      year
WT.T3.1     wt     21         r3   two_years
.......
.................
WT.T4.6     wt     36         r4       old
ddsTc <- DESeqDataSet(x, design= ~design$strain + design$time.m + design$strain:design$time.m)

Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘assays’ for signature ‘"data.frame"’

ADD REPLYlink written 4.4 years ago by bioinforupesh2009.au100

Please read help(DESeqDataSet), noting in particular the DESeqDataSetFromMatrix() function.

ADD REPLYlink written 4.4 years ago by Devon Ryan88k

##countData; i called edgeR's DGEList bcz got error in counts function of DESEQ2

countData <- DGEList(counts=x,genes=x[,1],)

dds <- DESeqDataSetFromMatrix(countData= countData, colData =  colData, design = ~ Time + label + Time:label)

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more level


However, i used only Time

dds <- DESeqDataSetFromMatrix(countData= countData, colData =  colData, design = ~ Time)

ddsLRT= DESeq (dds, test="LRT", reduced =  ~Time)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
NOTE: fitType='parametric', but the dispersion trend was not well captured by the
  function: y = a/x + b, and a local regression fit was automatically substituted.
  specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates
fitting model and testing
Error in nbinomLRT(object, full = full, reduced = reduced, betaPrior = betaPrior,  :
  less than one degree of freedom, perhaps full and reduced models are not in the correct order


?? why it happened ?

ADD REPLYlink written 4.4 years ago by bioinforupesh2009.au100
1

It appears you're trying to get through an analysis without reading any of the documentation for the software. When you encounter an error there is a bit of troubleshooting you should do on your part. For one thing read the help for the function that gave an error, here that means type ?DESeq into the R console and read the meaning of the full and reduced formula. 

The help files tell you what kind of object each argument is asking for. By typing class(x) you can see if 'x' is the appropriate object to be giving for each argument.

ADD REPLYlink written 4.4 years ago by Michael Love1.8k

I seriously doubt that using a DGElist as the input matrix will work properly. Also, you need to show what colData is.

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Devon Ryan88k

wait i will update it soon, i got it. Thanks for advices.

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by bioinforupesh2009.au100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1243 users visited in the last hour