how to find genes whose expressions are likely affected by interaction at any time in edgeR?
0
0
Entering edit mode
3.2 years ago
calvin99 • 0

Hi there,

Updated on 2021-02-23 to clarify my question.

I have a question about finding genes showing any sign of interaction between 2 factors: time and genotype.

So here are these 2 factors:

> table( genotype)
genotype
P1ko   Wt 
  18   18
> table( time)
time
 0h 26h 48h  d3  d6  d9 
  6   6   6   6   6   6

According to "edgeRUsersGuide.pdf"'s "3.3.4 Interaction at any time", one can use a factorial model to conduct an overall test for interaction. So here is the design matrix:

> design <- model.matrix( ~ genotype * time)
> colnames( design) %<>% { sub( '^(genotype|time)', '', ., perl = 1)}
> colnames( design) %<>% { sub( '(?<=:)time', '', ., perl = 1)}
> rownames( design) <- colnames( X) # X: the expression matrix
> design
          (Intercept) Wt 26h 48h d3 d6 d9 Wt:26h Wt:48h Wt:d3 Wt:d6 Wt:d9
P1ko1-0h            1  0   0   0  0  0  0      0      0     0     0     0
P1ko1-26h           1  0   1   0  0  0  0      0      0     0     0     0
...

And there are 5 columns for coefficients of interaction:

> grep( ':', colnames( design))
[1]  8  9 10 11 12
> grep( ':', colnames( design), value = 1)
[1] "Wt:26h" "Wt:48h" "Wt:d3"  "Wt:d6"  "Wt:d9"

Following the instruction, I did this:

> ...
> glmQLFTest( fit, coef = grep( ':', colnames( design)))
> ...

This gives me 645 significant (FDR < 0.05) genes.

But if I loop over all kinds of combinations of these 5 coefficients

> local({
>   coef <- grep( ':', colnames( design), value = 1)
>   lapply( seq_along( coef), function( n1) {
>     combn( coef, n1)
>   })
> })
[[1]]
     [,1]     [,2]     [,3]    [,4]    [,5]   
[1,] "Wt:26h" "Wt:48h" "Wt:d3" "Wt:d6" "Wt:d9"

[[2]]
     [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]   
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3"
[2,] "Wt:48h" "Wt:d3"  "Wt:d6"  "Wt:d9"  "Wt:d3"  "Wt:d6"  "Wt:d9"  "Wt:d6"
     [,9]    [,10]  
[1,] "Wt:d3" "Wt:d6"
[2,] "Wt:d9" "Wt:d9"

[[3]]
     [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]    
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h" "Wt:48h"
[2,] "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3"  "Wt:d3"  "Wt:d6"  "Wt:d3"  "Wt:d3" 
[3,] "Wt:d3"  "Wt:d6"  "Wt:d9"  "Wt:d6"  "Wt:d9"  "Wt:d9"  "Wt:d6"  "Wt:d9" 
     [,9]     [,10]  
[1,] "Wt:48h" "Wt:d3"
[2,] "Wt:d6"  "Wt:d6"
[3,] "Wt:d9"  "Wt:d9"

[[4]]
     [,1]     [,2]     [,3]     [,4]     [,5]    
[1,] "Wt:26h" "Wt:26h" "Wt:26h" "Wt:26h" "Wt:48h"
[2,] "Wt:48h" "Wt:48h" "Wt:48h" "Wt:d3"  "Wt:d3" 
[3,] "Wt:d3"  "Wt:d3"  "Wt:d6"  "Wt:d6"  "Wt:d6" 
[4,] "Wt:d6"  "Wt:d9"  "Wt:d9"  "Wt:d9"  "Wt:d9" 

[[5]]
     [,1]    
[1,] "Wt:26h"
[2,] "Wt:48h"
[3,] "Wt:d3" 
[4,] "Wt:d6" 
[5,] "Wt:d9"

I got 1088 significant genes:

> deg <- local({
>   coef <- grep( ':', colnames( design), perl = 1)
>   q.co <- .05
>   deg <- lapply( seq_along( coef), function( n1) {
>     apply( combn( coef, n1), 2, function( coef) {
>       de1 <- glmQLFTest( fit, coef) %>% topTags( Inf, 'fdr') %>% as.data.frame %>% as.data.table( 'gene.id_ensembl')
>       de1[ FDR < q.co, gene.id_ensembl]
>     })
>   })
>   unique( unlist( deg))
> })
> length( deg)
[1] 1088

So I am confused which way is better doing my job.


Started on 2021-02-20.

If I want to find genes showing any sign of interaction between time and genotype, could I simply do this: glmQLFTest(fit, coef="interaction column indices"), as said in "edgeRUsersGuide.pdf"'s "3.3.4 Interaction at any time"?

I have two genotypes and six time points. So there are five interaction columns in the design matrix. But when I looped over all combinations of the interaction column indices, i.e. five 1-index combinations plus ten 2-indices combinations plus ... and finally plus one 5-indices combination, and collect genes with FDR < 0.05 in each loop, I found that the overall unique genes with FDR < 0.05 from the looping are more abundant than that of the one 5-indices combination.

So I am confused which way is better doing my job.

Any suggestion?

Thanks a lot.

This question can also be seen at https://support.bioconductor.org/p/9134935.

edgeR • 923 views
ADD COMMENT
0
Entering edit mode

I cannot follow, and given that no one commented at Bioc they probably cannot as well. Please add some kind of illustration what the problem is. What is five 1-index combinations plus ten 2-indices combinations plus? Can you add the model matrix and explain (together with some code, please highlight with 10101 button) what the issue is?

ADD REPLY
0
Entering edit mode

Hi AT,

thanks for indication of need of clarification of my question. I have modified the content.

ADD REPLY
0
Entering edit mode

Those indices are coming from design matrix (column names). You can do following:

  1. Relevel the genotypes (make wild type or appropriate genotype as your reference for comparison)
  2. Create design matrix . Replace Treat with genotype eg. model.matrix(~genotype * time, data=<phenotype_data_dataframe>)

  3. Print the column names from design matrix.

  4. Use the index/indices of interest from design matrix as 3.3.3
ADD REPLY
0
Entering edit mode

Hi, I have modified the question. Please have a look. Thanks.

ADD REPLY

Login before adding your answer.

Traffic: 3861 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6