**20**wrote:

Hello All

I have 2 mammalian cell lines: A and B. A is subjected to 3 treatment conditions X,Y,Z. B is subjected to treatment conditions X and Y. So my RNA-Seq data has counts from AX, AY, AZ, BX and BY. I want to perform anova analysis to screen for differentially regulated genes.

I am trying to analyze my data using edgeR by following the users guide. Since I am completely new to this, it is not very intuitive to me. I want to make a model matrix to perform the anova-like comparison test to screen for genes that are different across all 5 conditions.

My code is below: Commands are in **BOLD.**

#count data

AX1 AX2 AY1 AY2 AZ1 AZ2 BX1 BX2 BY1 BY2

rna22510 0 0 0 0 0 0 0 0 0 0

rna22511 0 0 0 0 0 0 0 0 0 0

rna22512 0 0 0 0 0 0 0 0 0 0

rna22513 0 0 0 0 0 0 0 0 0 0

rna22514 0 0 0 0 0 0 0 0 0 0

rna22515 0 0 0 0 0 0 0 0 0 0

## #I normalize the data and calculate different dispersions. But I feel my main issue is trying to define the matrix to suit my needs.

**> design<-model.matrix(~0+group, data=cds$samples)
> design**

#cds is a DGEList object.

#output

groupH_05 groupH_10 groupP_00 groupP_05 groupP_10

P_10 1 0 0 0 0 1

P_10 2 0 0 0 0 1

P_05 1 0 0 0 1 0

P_05 2 0 0 0 1 0

P_00 1 0 0 1 0 0

P_00 2 0 0 1 0 0

H_10 1 0 1 0 0 0

H_10 2 0 1 0 0 0

H_05 1 1 0 0 0 0

H_05 2 1 0 0 0 0

**> cds<-estimateGLMCommonDisp(cds,design)**

**> cds<-estimateGLMTrendedDisp(cds,design)
Loading required package: splines
> cds<-estimateGLMTagwiseDisp(cds,design)
> fit2<-glmFit(cds,design)
> lrt.anova_like<-glmLRT(fit2,coef=2:5)**

**> lrt.anova_like**

An object of class "DGELRT"

$coefficients

groupH_05 groupH_10 groupP_00 groupP_05 groupP_10

gene1 -12.722907 -12.231696 -11.460983 -11.881307 -11.863651

rna1 -10.723691 -11.950632 -11.671753 -11.126594 -11.035141

rna2 -9.496628 -9.295025 -9.801454 -9.722280 -9.247496

rna3 -8.249089 -8.176574 -8.175494 -8.220004 -8.093095

rna4 -10.855702 -10.826411 -10.816899 -11.020586 -10.697726

12745 more rows ...

**> top<-topTags(lrt.anova_like)**

**> top**

Coefficient: groupH_10 groupP_00 groupP_05 groupP_10

logFC.groupH_10 logFC.groupP_00 logFC.groupP_05 logFC.groupP_10

rna111 -26.24556 -19.80748 -21.22361 -21.59235

rna366 -26.24556 -16.69825 -17.77939 -17.69264

rna498 -26.24556 -18.59508 -19.06105 -19.82423

rna735 -26.24556 -17.55049 -26.24556 -26.24556

rna1587 -26.24556 -17.97298 -26.24556 -26.24556

rna2000 -26.24556 -14.89324 -26.24556 -26.24556

gene2467 -26.24556 -23.90012 -19.09330 -26.24556

rna2338 -26.24556 -20.38091 -21.21060 -20.19601

rna2962 -26.24556 -21.82490 -21.59319 -22.53018

rna3147 -26.24556 -19.25034 -18.93527 -21.38125

logCPM LR PValue FDR

rna111 -0.76528954 9870.499 0 0

rna366 2.06883064 36241.154 0 0

rna498 0.34196368 16723.335 0 0

rna735 0.31486174 3362.217 0 0

rna1587 -0.06429664 3358.159 0 0

rna2000 2.74046265 2598.921 0 0

gene2467 -0.69390328 13430.350 0 0

rna2338 -0.79058544 2893.519 0 0

rna2962 0.06280098 3613.531 0 0

rna3147 0.05144112 10288.125 0 0

## In the above output, the P-value of the toptags is zero and even the FDR is zero which makes me think that my output is wrong.

I would appreciate any help that you could provide me.

Thanks

**92k**• written 5.5 years ago by ygowtha •

**20**

Your paramaterization is mostly useful for looking at contrasts, which you don't seem to be doing. The p-value will be ~0 for anything that's expressed, which is unlikely to be the question you want to answer.

92kThank you for your reply Devon. The reason I was worried about p-value is because I have noticed really small p-values but not a zero. Why do you say that I am not doing contrasts, I thought

glmLRT(fit2, coef=2:5)was supposed to do that.Could you also suggest a method/changes that I can make to my code to obtain anova like analysis. Basically I want to look for genes that are differentially expressed without specifying the groups beforehand (like mentioned in the users guide). And it looks like it based on making contrasts too.

I am very new to programming and bioinformatics, so any help would be appreciated.

20I missed the

`glmLRT(fit2, coef=2:5)`

line, but that's probably not telling you anything of biological interest. Why don't you just say the biological question that you wish to address?Edit: Actually, I think you need an intercept column for`glmLRT(fit2, coef=2:5)`

to make sense. Otherwise, you probably do end up testing expression.Edit2: In fact, this is mentioned in the edgeR UsersGuide, section 3.2.6. Regarding a 1-way ANOVA-like test, "... this is done by specifying multiple coefficients to glmLRT, when the design matrix includes an intercept term."92kSure. So basically I have two chinese hamster ovary cells lines. I am trying to study the transcriptome changes during serum-free adaptation. Typically 10% serum is added to media to mammalian cell cultures and they have to be adapted to serum-free media. So I identified three different serum concentrations where I want to study RNA of the cells - 10%, 5% and 0%. Only one cell line (A) was able to adapt completely, hence the other cell line(B) has only two different serum concentrations (10%, 5%).

What I was thinking was to first do an ANOVA analysis to see genes that are differentially regulated since doing a pairwaise comparison initially would generate a lot of false positives. After the anova analysis I would like to do pairwise comparison between meaningful conditions.

20