Updated on 2021-02-23 to clarify my question.

Question

how to find genes whose expressions are likely affected by interaction at any time in edgeR?

0

Entering edit mode

4.4 years ago

calvin99 • 0

Hi there,

Updated on 2021-02-23 to clarify my question.

I have a question about finding genes showing any sign of interaction between 2 factors: time and genotype.

So here are these 2 factors:

> table( genotype)
genotype
P1ko   Wt 
  18   18
> table( time)
time
 0h 26h 48h  d3  d6  d9 
  6   6   6   6   6   6

According to "edgeRUsersGuide.pdf"'s "3.3.4 Interaction at any time", one can use a factorial model to conduct an overall test for interaction. So here is the design matrix:

> design <- model.matrix( ~ genotype * time)
> colnames( design) %<>% { sub( '^(genotype|time)', '', ., perl = 1)}
> colnames( design) %<>% { sub( '(?<=:)time', '', ., perl = 1)}
> rownames( design) <- colnames( X) # X: the expression matrix
> design
          (Intercept) Wt 26h 48h d3 d6 d9 Wt:26h Wt:48h Wt:d3 Wt:d6 Wt:d9
P1ko1-0h            1  0   0   0  0  0  0      0      0     0     0     0
P1ko1-26h           1  0   1   0  0  0  0      0      0     0     0     0
...

And there are 5 columns for coefficients of interaction:

> grep( ':', colnames( design))
[1]  8  9 10 11 12
> grep( ':', colnames( design), value = 1)
[1] "Wt:26h" "Wt:48h" "Wt:d3"  "Wt:d6"  "Wt:d9"

Following the instruction, I did this:

> ...
> glmQLFTest( fit, coef = grep( ':', colnames( design)))
> ...

This gives me 645 significant (FDR < 0.05) genes.

But if I loop over all kinds of combinations of these 5 coefficients

> local({
>   coef <- grep( ':', colnames( design), value = 1)
>   lapply( seq_along( coef), function( n1) {
>     combn( coef, n1)
>   })
> })
[[1]]
     [,1]     [,2]     [,3]    [,4]    [,5]   
[1,] "Wt:26h" "Wt:48h" "Wt:d3" "Wt:d6" "Wt:d9"

[[2]]
     [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]   
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3"
[2,] "Wt:48h" "Wt:d3"  "Wt:d6"  "Wt:d9"  "Wt:d3"  "Wt:d6"  "Wt:d9"  "Wt:d6"
     [,9]    [,10]  
[1,] "Wt:d3" "Wt:d6"
[2,] "Wt:d9" "Wt:d9"

[[3]]
     [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]    
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h" "Wt:48h"
[2,] "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3"  "Wt:d3"  "Wt:d6"  "Wt:d3"  "Wt:d3" 
[3,] "Wt:d3"  "Wt:d6"  "Wt:d9"  "Wt:d6"  "Wt:d9"  "Wt:d9"  "Wt:d6"  "Wt:d9" 
     [,9]     [,10]  
[1,] "Wt:48h" "Wt:d3"
[2,] "Wt:d6"  "Wt:d6"
[3,] "Wt:d9"  "Wt:d9"

[[4]]
     [,1]     [,2]     [,3]     [,4]     [,5]    
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h"
[2,] "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3"  "Wt:d3" 
[3,] "Wt:d3"  "Wt:d3"  "Wt:d6"  "Wt:d6"  "Wt:d6" 
[4,] "Wt:d6"  "Wt:d9"  "Wt:d9"  "Wt:d9"  "Wt:d9" 

[[5]]
     [,1]    
[1,] "Wt:26h"
[2,] "Wt:48h"
[3,] "Wt:d3" 
[4,] "Wt:d6" 
[5,] "Wt:d9"

I got 1088 significant genes:

> deg <- local({
>   coef <- grep( ':', colnames( design), perl = 1)
>   q.co <- .05
>   deg <- lapply( seq_along( coef), function( n1) {
>     apply( combn( coef, n1), 2, function( coef) {
>       de1 <- glmQLFTest( fit, coef) %>% topTags( Inf, 'fdr') %>% as.data.frame %>% as.data.table( 'gene.id_ensembl')
>       de1[ FDR < q.co, gene.id_ensembl]
>     })
>   })
>   unique( unlist( deg))
> })
> length( deg)
[1] 1088

So I am confused which way is better doing my job.

Started on 2021-02-20.

If I want to find genes showing any sign of interaction between time and genotype, could I simply do this: glmQLFTest(fit, coef="interaction column indices"), as said in "edgeRUsersGuide.pdf"'s "3.3.4 Interaction at any time"?

I have two genotypes and six time points. So there are five interaction columns in the design matrix. But when I looped over all combinations of the interaction column indices, i.e. five 1-index combinations plus ten 2-indices combinations plus ... and finally plus one 5-indices combination, and collect genes with FDR < 0.05 in each loop, I found that the overall unique genes with FDR < 0.05 from the looping are more abundant than that of the one 5-indices combination.

So I am confused which way is better doing my job.

Any suggestion?

Thanks a lot.

This question can also be seen at https://support.bioconductor.org/p/9134935.

edgeR • 1.2k views

ADD COMMENT • link 4.4 years ago by calvin99 • 0

0

Entering edit mode

I cannot follow, and given that no one commented at Bioc they probably cannot as well. Please add some kind of illustration what the problem is. What is five 1-index combinations plus ten 2-indices combinations plus? Can you add the model matrix and explain (together with some code, please highlight with 10101 button) what the issue is?

ADD REPLY • link 4.4 years ago by ATpoint 88k

0

Entering edit mode

Hi AT,

thanks for indication of need of clarification of my question. I have modified the content.

ADD REPLY • link 4.4 years ago by calvin99 • 0

0

Entering edit mode

Those indices are coming from design matrix (column names). You can do following:

Relevel the genotypes (make wild type or appropriate genotype as your reference for comparison)
Create design matrix . Replace Treat with genotype eg. model.matrix(~genotype * time, data=<phenotype_data_dataframe>)
Print the column names from design matrix.
Use the index/indices of interest from design matrix as 3.3.3