Question: how to find genes whose expressions are likely affected by interaction at any time in edgeR?
0
gravatar for calvin99
15 days ago by
calvin990
shanghai, china
calvin990 wrote:

Hi there,

Updated on 2021-02-23 to clarify my question.

I have a question about finding genes showing any sign of interaction between 2 factors: time and genotype.

So here are these 2 factors:

> table( genotype)
genotype
P1ko   Wt 
  18   18
> table( time)
time
 0h 26h 48h  d3  d6  d9 
  6   6   6   6   6   6

According to "edgeRUsersGuide.pdf"'s "3.3.4 Interaction at any time", one can use a factorial model to conduct an overall test for interaction. So here is the design matrix:

> design <- model.matrix( ~ genotype * time)
> colnames( design) %<>% { sub( '^(genotype|time)', '', ., perl = 1)}
> colnames( design) %<>% { sub( '(?<=:)time', '', ., perl = 1)}
> rownames( design) <- colnames( X) # X: the expression matrix
> design
          (Intercept) Wt 26h 48h d3 d6 d9 Wt:26h Wt:48h Wt:d3 Wt:d6 Wt:d9
P1ko1-0h            1  0   0   0  0  0  0      0      0     0     0     0
P1ko1-26h           1  0   1   0  0  0  0      0      0     0     0     0
...

And there are 5 columns for coefficients of interaction:

> grep( ':', colnames( design))
[1]  8  9 10 11 12
> grep( ':', colnames( design), value = 1)
[1] "Wt:26h" "Wt:48h" "Wt:d3"  "Wt:d6"  "Wt:d9"

Following the instruction, I did this:

> ...
> glmQLFTest( fit, coef = grep( ':', colnames( design)))
> ...

This gives me 645 significant (FDR < 0.05) genes.

But if I loop over all kinds of combinations of these 5 coefficients

> local({
>   coef <- grep( ':', colnames( design), value = 1)
>   lapply( seq_along( coef), function( n1) {
>     combn( coef, n1)
>   })
> })
[[1]]
     [,1]     [,2]     [,3]    [,4]    [,5]   
[1,] "Wt:26h" "Wt:48h" "Wt:d3" "Wt:d6" "Wt:d9"

[[2]]
     [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]   
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3"
[2,] "Wt:48h" "Wt:d3"  "Wt:d6"  "Wt:d9"  "Wt:d3"  "Wt:d6"  "Wt:d9"  "Wt:d6"
     [,9]    [,10]  
[1,] "Wt:d3" "Wt:d6"
[2,] "Wt:d9" "Wt:d9"

[[3]]
     [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]    
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h" "Wt:48h"
[2,] "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3"  "Wt:d3"  "Wt:d6"  "Wt:d3"  "Wt:d3" 
[3,] "Wt:d3"  "Wt:d6"  "Wt:d9"  "Wt:d6"  "Wt:d9"  "Wt:d9"  "Wt:d6"  "Wt:d9" 
     [,9]     [,10]  
[1,] "Wt:48h" "Wt:d3"
[2,] "Wt:d6"  "Wt:d6"
[3,] "Wt:d9"  "Wt:d9"

[[4]]
     [,1]     [,2]     [,3]     [,4]     [,5]    
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h"
[2,] "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3"  "Wt:d3" 
[3,] "Wt:d3"  "Wt:d3"  "Wt:d6"  "Wt:d6"  "Wt:d6" 
[4,] "Wt:d6"  "Wt:d9"  "Wt:d9"  "Wt:d9"  "Wt:d9" 

[[5]]
     [,1]    
[1,] "Wt:26h"
[2,] "Wt:48h"
[3,] "Wt:d3" 
[4,] "Wt:d6" 
[5,] "Wt:d9"

I got 1088 significant genes:

> deg <- local({
>   coef <- grep( ':', colnames( design), perl = 1)
>   q.co <- .05
>   deg <- lapply( seq_along( coef), function( n1) {
>     apply( combn( coef, n1), 2, function( coef) {
>       de1 <- glmQLFTest( fit, coef) %>% topTags( Inf, 'fdr') %>% as.data.frame %>% as.data.table( 'gene.id_ensembl')
>       de1[ FDR < q.co, gene.id_ensembl]
>     })
>   })
>   unique( unlist( deg))
> })
> length( deg)
[1] 1088

So I am confused which way is better doing my job.


Started on 2021-02-20.

If I want to find genes showing any sign of interaction between time and genotype, could I simply do this: glmQLFTest(fit, coef="interaction column indices"), as said in "edgeRUsersGuide.pdf"'s "3.3.4 Interaction at any time"?

I have two genotypes and six time points. So there are five interaction columns in the design matrix. But when I looped over all combinations of the interaction column indices, i.e. five 1-index combinations plus ten 2-indices combinations plus ... and finally plus one 5-indices combination, and collect genes with FDR < 0.05 in each loop, I found that the overall unique genes with FDR < 0.05 from the looping are more abundant than that of the one 5-indices combination.

So I am confused which way is better doing my job.

Any suggestion?

Thanks a lot.

This question can also be seen at https://support.bioconductor.org/p/9134935.

edger • 159 views
ADD COMMENTlink modified 12 days ago • written 15 days ago by calvin990

I cannot follow, and given that no one commented at Bioc they probably cannot as well. Please add some kind of illustration what the problem is. What is five 1-index combinations plus ten 2-indices combinations plus? Can you add the model matrix and explain (together with some code, please highlight with 10101 button) what the issue is?

ADD REPLYlink written 13 days ago by ATpoint46k

Hi AT,

thanks for indication of need of clarification of my question. I have modified the content.

ADD REPLYlink written 12 days ago by calvin990

Those indices are coming from design matrix (column names). You can do following:

  1. Relevel the genotypes (make wild type or appropriate genotype as your reference for comparison)
  2. Create design matrix . Replace Treat with genotype eg. model.matrix(~genotype * time, data=<phenotype_data_dataframe>)

  3. Print the column names from design matrix.

  4. Use the index/indices of interest from design matrix as 3.3.3
ADD REPLYlink modified 13 days ago • written 13 days ago by cpad011215k

Hi, I have modified the question. Please have a look. Thanks.

ADD REPLYlink modified 12 days ago • written 12 days ago by calvin990
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1326 users visited in the last hour
_