Question

R Model Matrix has extra columns

5.8 years ago

tpham2654 • 0

I am trying to do differential expression analysis on some RNA microarray data. I was setting up my model matrix for limma from a csv file which has info on the samples, specifically how they were to be grouped (cre/flox status). Some example data is below:

geo_name,cre_lox,cell_type,treatment1,replicate_num
sample1,flox,c1,no,1
sample2,flox,c1,no,2
sample3,flox,c1,no,3
sample4,cre,c1,no,1
sample5,cre,c1,no,2
sample6,cre,c1,no,3
sample7,wt,c2,no,1
sample8,wt,c2,yes,1

The subset of the data I want (where cell_type=c1) has only "cre" and "flox" in the "cre_lox" column.
I selected for it using:

q1a_selected_col_data = col_data[(col_data$cell_type == 'c1'),]

However, when I call model.matrix(~q1a_selected_col_data$cre_lox), the result is a matrix like this:

(Intercept) q1a_selected_col_data$cre_loxflox q1a_selected_col_data$cre_loxwt
1         1                                 1                               0
2         1                                 1                               0
3         1                                 1                               0
4         1                                 0                               0
5         1                                 0                               0
6         1                                 0                               0

How did it "know" to add a column for "wt" status even though the data I passed to it does not have "wt" in it? Is there a way I can prevent things like this without having to modify the csv or remove columns from the model matrix by hand?

R RNA-Seq Limma • 2.3k views

ADD COMMENT • link 5.8 years ago by tpham2654 • 0

score 1 · Answer 1 · 2018-07-13

5.8 years ago

russhh 5.7k

Your cre_lox column is stored as a factor. model.matrix creates a column for every non-reference level of a factor variable, even if no sample has that level. Subsetting the rows does not remove the "wt" level from the factor, so it still gets a column (all zeros).

To avoid problems like this, you could call droplevels on your data frame after subsetting it; this drops any factor levels that no longer occur in the data.
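
A minimal sketch of the droplevels approach, using the example rows from the question. The data frame is rebuilt by hand here rather than read from the csv, and stringsAsFactors = TRUE is an assumption that mimics how read.csv behaved by default before R 4.0. Passing data = to model.matrix is optional; it is used here only because it gives shorter column names than the $ form.

```r
# Sample sheet rebuilt from the example rows in the question.
col_data <- data.frame(
  geo_name      = paste0("sample", 1:8),
  cre_lox       = c("flox", "flox", "flox", "cre", "cre", "cre", "wt", "wt"),
  cell_type     = c(rep("c1", 6), "c2", "c2"),
  treatment1    = c(rep("no", 7), "yes"),
  replicate_num = c(1, 2, 3, 1, 2, 3, 1, 1),
  stringsAsFactors = TRUE  # assumption: columns were read in as factors
)

# Subsetting rows keeps every level of the factor, including the unused "wt".
q1a_selected_col_data <- col_data[col_data$cell_type == "c1", ]
levels_before <- levels(q1a_selected_col_data$cre_lox)

# droplevels() removes levels that no longer occur in any row.
q1a_selected_col_data <- droplevels(q1a_selected_col_data)
levels_after <- levels(q1a_selected_col_data$cre_lox)

# The design matrix now has only an intercept and one cre_lox column.
design <- model.matrix(~ cre_lox, data = q1a_selected_col_data)
print(levels_before)
print(levels_after)
print(colnames(design))
```

If only one column is affected, factor(q1a_selected_col_data$cre_lox) should have the same effect on that column alone.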

ADD COMMENT • link 5.8 years ago by russhh 5.7k

Thanks. I found that the design matrix is the inverse of what I want. Basically the 1 and 0 in the q1a_selected_col_data$cre_loxflox column should be switched since cre is the experimental group.

After using droplevels, I tried model.matrix(~q1a_selected_col_data$cre_lox-1) to invert it, and now I get this:

 q1a_selected_col_data$cre_loxflox q1a_selected_col_data$cre_loxcre
1                                1                                0
2                                1                                0
3                                1                                0
4                                0                                1
5                                0                                1
6                                0                                1

I want the last column. Is there a way I can select the label model.matrix should mark as "1"?

ADD REPLY • link 5.8 years ago by tpham2654 • 0

relevel?
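
A rough sketch of what relevel would look like here, assuming the factor holds only the six c1 samples from the question. By default R orders levels alphabetically, so "cre" is the reference; making "flox" the reference means the remaining column marks the cre samples with 1.

```r
# The six c1 samples from the question, in sample order.
cre_lox <- factor(c("flox", "flox", "flox", "cre", "cre", "cre"))

# Default level order is alphabetical, so "cre" is the reference level.
default_levels <- levels(cre_lox)

# Make "flox" the reference; the non-reference column is then "cre".
cre_lox <- relevel(cre_lox, ref = "flox")
design <- model.matrix(~ cre_lox)

print(default_levels)
print(design)
```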

ADD REPLY • link 5.8 years ago by cpad0112 21k

It doesn't really make any difference (although it might make things simpler to reason about), since you specify the experimental comparisons in your contrasts matrix, not your design matrix.
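
To illustrate the point in base R: with a group-means design (no intercept), the direction of the comparison is set by the contrast vector, and it gives the same cre-minus-flox estimate as a treatment-coded design with "flox" as the reference. The expression values below are made up for a single hypothetical gene, and lm.fit stands in for the per-gene fit; in limma the equivalent steps would typically go through lmFit, makeContrasts and contrasts.fit.

```r
# Made-up expression values for one gene (illustration only).
group <- factor(c("flox", "flox", "flox", "cre", "cre", "cre"))
y <- c(5.0, 5.2, 4.8, 7.1, 6.9, 7.0)

# Group-means design: one column per level, no intercept.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)  # "cre", "flox"

# Least-squares fit; each coefficient is that group's mean.
coefs <- lm.fit(design, y)$coefficients

# The contrast states the comparison explicitly: cre minus flox.
contrast <- c(cre = 1, flox = -1)
cre_vs_flox <- sum(contrast[names(coefs)] * coefs)

# Same comparison from a treatment-coded design with "flox" as reference.
group_ref_flox <- relevel(group, ref = "flox")
coefs_ref <- lm.fit(model.matrix(~ group_ref_flox), y)$coefficients

print(cre_vs_flox)
print(coefs_ref)
```

Swapping the signs in the contrast vector flips the direction of the comparison without touching the design matrix.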

ADD REPLY • link 5.8 years ago by russhh 5.7k