Model matrix for limma voom
6.6 years ago
ioannis ▴ 50

Hello community,

I have a large data set called "DWsub". It looks like this:

> head(DWsub, n=3)
                 CDx  CWx D53x D63x D66x D68x D70x D72x W53x W63x W66x W68x W70x W72x
NC_013663.1-295   10    6  125  699   73   48   69    3  307   29   13   33    5   34
NC_013663.1-552 7797 4786 7853 3934 5869 3469 5327 2702 2055 1922 1138 2691 4221 2699
NC_013663.1-553    5    2   16   21   22   21    9    7    4    3    6    9   15    8

I am using voom from the limma package for the analysis.

DWvoom <- voom(DWsub, plot=TRUE)

Now I need to create a multivariate model matrix. To separate the C, D and W samples I am using this:

> samples <- factor(sub("^(.).*", "\\1", colnames(DWvoom)))
> samples
 [1] C C D D D D D D W W W W W W
Levels: C D W

Now I need to add a "lane" factor, because the samples were grouped and sequenced in different sequencing lanes, except for the controls. The controls (CDx and CWx) were sequenced in all three lanes, so they should be included in all lane levels.

Can anyone help with this factor? The factor "lane" should have 3 levels.

D53x, W53x, D63x, W63x, CDx, CWx = Lane 1

D66x, W66x, D68x, W68x, CDx, CWx = Lane 2

D70x, W70x, D72x, W72x, CDx, CWx = Lane 3

I have tried renaming the columns of "DWsub" to (Lane1D53x, Lane1D63x, Lane1W53x, Lane1W63x), (Lane2D66x, Lane2D68x, Lane2W66x, Lane2W68x), (Lane3D70x, Lane3D72x, Lane3W70x, Lane3W72x) and playing with the characters, but my problem is the controls: I cannot include them in all levels of the factor "lanes".

In the end I want to get this:

des <- model.matrix(~ 0 + samples + lanes)

Any ideas? Thanks for your time!

limma model.matrix • 2.3k views
6.6 years ago

While this isn't a direct answer to your question, it should help you in future. Datasets like this often have several variables that need to be considered, and the easiest way to keep the design correct and flexible is to build a phenotype table. A phenotype table has one row per sample, in the same order as the columns of your count matrix, and one column per variable of interest, so in your case lane and sample class. You can make one in something like Excel or procedurally in R, whichever you prefer; it makes life a lot easier as these experiments get bigger.
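As a rough illustration, a phenotype table for this dataset could be built in base R along the following lines. This is only a sketch: the names `pheno`, `lane_by_id` and the lane labels `L1` to `L3` are made up for the example, and the lane assignments are read off the lists in the question. An R factor gives each sample exactly one level, so the pooled control columns cannot belong to all three lanes at once; here they are simply left as `NA` and dropped before building the design, which is an assumption of this sketch and not a recommendation for how the controls should be handled.

```r
# Column names as shown by head(DWsub) in the question
sample_names <- c("CDx", "CWx", "D53x", "D63x", "D66x", "D68x", "D70x", "D72x",
                  "W53x", "W63x", "W66x", "W68x", "W70x", "W72x")

# Sample class from the first letter (C, D or W), same regex as in the question
group <- factor(sub("^(.).*", "\\1", sample_names))

# Hypothetical lookup: numeric part of the sample name -> lane label
lane_by_id <- c("53" = "L1", "63" = "L1",
                "66" = "L2", "68" = "L2",
                "70" = "L3", "72" = "L3")

# Strip everything except digits; the controls have no digits, so they
# match no entry in the lookup and end up with lane = NA
sample_id <- gsub("[^0-9]", "", sample_names)
lane <- factor(unname(lane_by_id[sample_id]), levels = c("L1", "L2", "L3"))

# Phenotype table: one row per sample, in the same order as the count columns
pheno <- data.frame(group = group, lane = lane, row.names = sample_names)

# Assumption of this sketch: keep only samples with a single known lane
pheno_sub <- droplevels(pheno[!is.na(pheno$lane), ])

# Additive design with one column per group plus lane effects
des <- model.matrix(~ 0 + group + lane, data = pheno_sub)
```

With this layout the design matrix is built from the table instead of from column names, so adding another variable later only means adding another column to `pheno`. The counts passed to `voom()` would need to be subset to the same samples and order, for example `DWsub[, rownames(des)]`, and `des` can be supplied through voom's `design` argument.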
