Hi there,
Updated on 2021-02-23 to clarify my question.
I have a question about finding genes showing any sign of interaction between 2 factors: time and genotype.
So here are these 2 factors:
> table( genotype)
genotype
P1ko Wt
18 18
> table( time)
time
0h 26h 48h d3 d6 d9
6 6 6 6 6 6
According to "edgeRUsersGuide.pdf"'s "3.3.4 Interaction at any time", one can use a factorial model to conduct an overall test for interaction. So here is the design matrix:
> design <- model.matrix( ~ genotype * time)
> colnames( design) %<>% { sub( '^(genotype|time)', '', ., perl = 1)}
> colnames( design) %<>% { sub( '(?<=:)time', '', ., perl = 1)}
> rownames( design) <- colnames( X) # X: the expression matrix
> design
(Intercept) Wt 26h 48h d3 d6 d9 Wt:26h Wt:48h Wt:d3 Wt:d6 Wt:d9
P1ko1-0h 1 0 0 0 0 0 0 0 0 0 0 0
P1ko1-26h 1 0 1 0 0 0 0 0 0 0 0 0
...
And there are 5 columns for coefficients of interaction:
> grep( ':', colnames( design))
[1] 8 9 10 11 12
> grep( ':', colnames( design), value = 1)
[1] "Wt:26h" "Wt:48h" "Wt:d3" "Wt:d6" "Wt:d9"
Following the instruction, I did this:
> ...
> glmQLFTest( fit, coef = grep( ':', colnames( design)))
> ...
This gives me 645 significant (FDR < 0.05) genes.
But if I loop over all kinds of combinations of these 5 coefficients
> local({
> coef <- grep( ':', colnames( design), value = 1)
> lapply( seq_along( coef), function( n1) {
> combn( coef, n1)
> })
> })
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] "Wt:26h" "Wt:48h" "Wt:d3" "Wt:d6" "Wt:d9"
[[2]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3"
[2,] "Wt:48h" "Wt:d3" "Wt:d6" "Wt:d9" "Wt:d3" "Wt:d6" "Wt:d9" "Wt:d6"
[,9] [,10]
[1,] "Wt:d3" "Wt:d6"
[2,] "Wt:d9" "Wt:d9"
[[3]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h" "Wt:48h"
[2,] "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3" "Wt:d3" "Wt:d6" "Wt:d3" "Wt:d3"
[3,] "Wt:d3" "Wt:d6" "Wt:d9" "Wt:d6" "Wt:d9" "Wt:d9" "Wt:d6" "Wt:d9"
[,9] [,10]
[1,] "Wt:48h" "Wt:d3"
[2,] "Wt:d6" "Wt:d6"
[3,] "Wt:d9" "Wt:d9"
[[4]]
[,1] [,2] [,3] [,4] [,5]
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h"
[2,] "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3" "Wt:d3"
[3,] "Wt:d3" "Wt:d3" "Wt:d6" "Wt:d6" "Wt:d6"
[4,] "Wt:d6" "Wt:d9" "Wt:d9" "Wt:d9" "Wt:d9"
[[5]]
[,1]
[1,] "Wt:26h"
[2,] "Wt:48h"
[3,] "Wt:d3"
[4,] "Wt:d6"
[5,] "Wt:d9"
I got 1088 significant genes:
> deg <- local({
> coef <- grep( ':', colnames( design), perl = 1)
> q.co <- .05
> deg <- lapply( seq_along( coef), function( n1) {
> apply( combn( coef, n1), 2, function( coef) {
> de1 <- glmQLFTest( fit, coef) %>% topTags( Inf, 'fdr') %>% as.data.frame %>% as.data.table( 'gene.id_ensembl')
> de1[ FDR < q.co, gene.id_ensembl]
> })
> })
> unique( unlist( deg))
> })
> length( deg)
[1] 1088
So I am confused which way is better doing my job.
Started on 2021-02-20.
If I want to find genes showing any sign of interaction between time and genotype, could I simply do this: glmQLFTest(fit, coef="interaction column indices"), as said in "edgeRUsersGuide.pdf"'s "3.3.4 Interaction at any time"?
I have two genotypes and six time points. So there are five interaction columns in the design matrix. But when I looped over all combinations of the interaction column indices, i.e. five 1-index combinations plus ten 2-indices combinations plus ... and finally plus one 5-indices combination, and collect genes with FDR < 0.05 in each loop, I found that the overall unique genes with FDR < 0.05 from the looping are more abundant than that of the one 5-indices combination.
So I am confused which way is better doing my job.
Any suggestion?
Thanks a lot.
This question can also be seen at https://support.bioconductor.org/p/9134935.
I cannot follow, and given that no one commented at Bioc they probably cannot as well. Please add some kind of illustration what the problem is. What is
five 1-index combinations plus ten 2-indices combinations plus
? Can you add the model matrix and explain (together with some code, please highlight with10101
button) what the issue is?Hi AT,
thanks for indication of need of clarification of my question. I have modified the content.
Those indices are coming from design matrix (column names). You can do following:
Create design matrix . Replace Treat with genotype eg.
model.matrix(~genotype * time, data=<phenotype_data_dataframe>)
Print the column names from design matrix.
Hi, I have modified the question. Please have a look. Thanks.