Question: edgeR design matrix and comparisons for paired samples
silas008 wrote (5 months ago):

Hi guys,

I am a bit confused about the statistics for paired samples in edgeR.

I have 4 different treatments, A, B, C and D, each one with 4 samples. 2 of those samples are "before" treatment and the other 2 are "after" treatment.

If I am reading the edgeR manual correctly, the design matrix should be built like this:

> groups <- factor(targets$Group)
> treatment <- factor(targets$Treatment, levels=c("before","after"))
> design <- model.matrix(~groups+treatment)

But my data is a simple table with the genes in the first column and the samples in the other columns. How can I construct the model matrix for this table format?

I think I can simply convert the table to a matrix and assign the factors to the samples:

> rownames(my_table) <- my_table[, 1]        # gene IDs are in the first column
> my_table <- data.matrix(my_table[, -1])    # keep only the count columns
> groups <- factor(rep(c("A","B","C","D"), each=4))    # one level per group, 4 samples each
> treatment <- factor(rep(c("before","before","after","after"), times=4), levels=c("before","after"))
> design <- model.matrix(~groups+treatment)
> y <- DGEList(counts=my_table, group=groups)

But I don't know if this is correct.

Can anyone help with this? I'd really appreciate it.

Thanks

h.mon wrote (5 months ago):

The "correct" way depends on what A, B, C, D, before and after are, and on what you want to test. What you did is not wrong, but in your case I think a better approach is to create a single factor combining group and treatment:

Group <- factor( paste( groups, treatment, sep = "." ) )   # e.g. "A.before", "A.after"
design <- model.matrix( ~ 0 + Group )                      # one column per combination
y <- DGEList( counts = my_table, group = Group )
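
To make the combined-factor idea concrete, here is a minimal sketch in base R of what that design looks like and how one comparison could be written against it. The sample order (A1 to D4, first two of each set "before", last two "after") and the name `contrast_A` are assumptions for illustration, not something stated in the question.

```r
# Assumed layout: 16 samples ordered A1..A4, B1..B4, C1..C4, D1..D4,
# with the first two of each set taken "before" and the last two "after".
groups    <- factor(rep(c("A", "B", "C", "D"), each = 4))
treatment <- factor(rep(c("before", "before", "after", "after"), times = 4),
                    levels = c("before", "after"))

# One level per group/treatment combination, e.g. "A.before", "A.after"
Group <- factor(paste(groups, treatment, sep = "."))

# No-intercept design: one column per combination
design <- model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)

# Contrast for "after vs before" within A: +1 on A.after, -1 on A.before
contrast_A <- setNames(numeric(ncol(design)), colnames(design))
contrast_A[c("A.after", "A.before")] <- c(1, -1)
contrast_A
```

In edgeR, a vector like this can be supplied as the `contrast` argument of `glmQLFTest` (or `glmLRT`) once the model has been fitted with this design. The same pattern gives the other within-group comparisons, or differences between them.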

You wrote: "I have 4 different treatments, A, B, C and D, each one with 4 samples. 2 of those samples are "before" treatment and the other 2 are "after" treatment."

If A, B, C and D are treatments, why do you name the factor that describes them "groups"? And if before and after are sampling times, why do you call that factor "treatment" instead of "time"? Naming variables after what they actually describe will make your code easier to understand.
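
As a small illustration of the naming point, the sketch below sets up the sample annotation with `treatment` for A to D and `time` for before/after. The `targets` data frame, its sample names and the sample order are hypothetical, chosen only to match the description in the question.

```r
# Hypothetical sample annotation using the more descriptive names
targets <- data.frame(
  sample    = paste0(rep(c("A", "B", "C", "D"), each = 4), 1:4),
  treatment = factor(rep(c("A", "B", "C", "D"), each = 4)),
  time      = factor(rep(c("before", "before", "after", "after"), times = 4),
                     levels = c("before", "after"))
)

# Additive model from the question, rewritten with the new names:
# one shared after-vs-before effect across all four treatments
design_additive <- model.matrix(~ treatment + time, data = targets)
colnames(design_additive)
```

With these names the coefficients read directly as what they estimate: `treatmentB`, `treatmentC` and `treatmentD` are differences from treatment A, and `timeafter` is the after-vs-before effect.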
