Question: Compare gene expression in randomized experiment
1
gravatar for AR1
5.1 years ago by
AR110
Sweden
AR110 wrote:

Hi

I have 60 individuals randomized to treatment with a statin or diet. Statins are drugs that inhibit the liver enzyme HMG-CoA-reductase, with inhibits the synthesis of cholesterol (a major driver of atherosclerosis) leading to lower risk of stroke and myocardial infarction.

Thus 60 individuals are randomized to either a statin or diet. 10 genes were quantified at baseline and after 1 year treatment. For each gene we have a baseline value, 1 year value and the fold (calculated as the 1 year value divided by the baseline value).

The data set looks like (presenting two genes with baseline, 1 year and fold):

 genes <- read.table(header=TRUE, sep=";", text = 
    "treatment;IL10_BL;IL10_1Y;IL10_fold;IL6_BL;IL6_1Y;IL6_fold;
    diet;1.1;1.5;1.4;1.4;1.4;1.1;
    statin;2.5;3.3;1.3;2.7;3.1;1.1;
    statin;3.2;4.0;1.3;1.5;1.6;1.1;
    diet;3.8;4.4;1.2;3.0;2.9;0.9;
    statin;1.1;3.1;2.8;1.0;1.0;1.0;
    diet;3.0;6.0;2.0;2.0;1.0;0.5;")

Nothing has been done to the data (e.g log-transform).

I have the following questions:

• How would you calculate if there are differences in gene expression after 1 year in the diet vs statin group?

• Does change in gene expression predict other variables (e.g blood cholesterol); this would need multiple regression analyses, in which I wonder what gene value to use (the baseline, the 1 year of the fold). Should I in that case log-transform the fold/baseline/1year before using it as a predictor?

Any help is much appreciated.

I have read the limma userguide package but my data is not set up in the way limma package uses in the vignette.

Thanks in advanced.

ADD COMMENTlink modified 5.0 years ago • written 5.1 years ago by AR110

I have still not found out how to solve the issue... unfortunately.

Anyone who can provide som assistance?

Thanks in advance

ADD REPLYlink written 5.0 years ago by AR110

Where are you stuck now? BTW, you should really consider just collaborating with a local bioinformatician/statistician, since even once you're able to get everything to work I expect you won't really know how to QC everything.

ADD REPLYlink written 5.0 years ago by Devon Ryan91k

I have made an effort to read the user guide accompanying limma package, before posting here again.

My data set is set up the same way, as the "eset" object in my reply below. Therefore:
• Each row represents a gene.
• The first row, however, is the treatment assignment.
• Each column represents a patient. All rows, except from the first (which is the treatment) are values for gene expression.

I have tried a range of model matrices and subsequently analyze if there are differential gene expressions between the treatment groups. None of them have worked. I do not argue that my efforts are qualitative, since I'm not used to analyze gene data. I have no biostatician in my vicinity to ask.

I'd like to ask for some coding support here? I have the above data set and wish to examine if gene expression differs between the groups.

I would appreciate it immensely.
 

ADD REPLYlink written 5.0 years ago by AR110

(1) Delete the first row. Put you column:group assignments in a different file (put them in the same order, of course). (2) What model matrices have you tried and what exactly do you mean by "none of them have worked"?

ADD REPLYlink written 5.0 years ago by Devon Ryan91k

(1) done that now.
(2) A somewhat comparable example in the vignette where African males and females are compared uses the following matrix:
design <- model.matrix(~gender)
I did not figure out how to apply this to my data.

I also tried:
design=cbind(statin=c(1,1,0,0), diet=c(0,0,1,1))
and several other similar designs.


The problem is that
(a) i don't know what matrix to design, and 
(b) how to use the expression data set, the list created in (1) and the design matrix. Because that gives me three objects to work with, while the vignette always includes a design matrix and data.

Regards

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by AR110

The list of sample associations is used by model.matrix(). In short, you make a dataframe whose rows are samples and columns are parameters (diet, individual, etc.). Unless you know what you're doing, which would require having taken linear algebra at some point in your education, it's probably not a good idea to make random model matrices by hand (i.e., use model.matrix()). Your design is just ~diet or ~statin, depending on what you want to call things. If you look at section 9.7 of the users guide, you'll see an example of an experiment similar to (but more complicated than) yours. In this example, the dataframe or vector created in step (1) would be targets. You'd use that to create the design matrix.

ADD REPLYlink written 5.0 years ago by Devon Ryan91k
1
gravatar for Devon Ryan
5.1 years ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:

Just because your data isn't set up in a classical boiler-plate format for limma doesn't mean you can't use it for that. The "fold-change" values can be completely ignored regardless of the method you choose to use and then reformatting things for limma should be straight-forward enough (you just need a matrix with rows as genes and columns as samples, in which case the individual timepoints would count as different samples). I would be surprised if there's not an example of such a design in the limma user's guide (it'll probably involve duplicateCorrelation()).

For your second question, you can end up doing things the same way. If you wanted to use limma for that then you'd just perform multiple iterations of testing. If not, you can always use the MAANOVA package in R, which I think supports mixed-effect models as well (assuming you don't want to just use a fixed-effects factor for patients, which is certainly algorithmically simpler).

ADD COMMENTlink written 5.1 years ago by Devon Ryan91k

Hi Devon,

Thank you for taking time to explain this. The Limma package, although extensive, does not contain info from which I can derive a solution to my problem. I've read the similar thread:

Limma Package For Differential Expression Between Control And Stress

Without any success and this might explain why:

Following Your advice:

Importing the data:

genes <- read.table(header=TRUE, sep=";", text = 
                      "treatment;IL10_BL;IL10_1Y;IL10_fold;IL6_BL;IL6_1Y;IL6_fold;
    diet;1.1;1.5;1.4;1.4;1.4;1.1;
    statin;2.5;3.3;1.3;2.7;3.1;1.1;
    statin;3.2;4.0;1.3;1.5;1.6;1.1;
    diet;3.8;4.4;1.2;3.0;2.9;0.9;
    statin;1.1;3.1;2.8;1.0;1.0;1.0;
    diet;3.0;6.0;2.0;2.0;1.0;0.5;")

#Dont know where the column X came from; deleting it
genes[,"X"] = NULL

#Creating the eset dataset, which is to be analysed

eset <- data.frame(t(genes))

#Following advice from link above

design=cbind(statin=c(1,1,0,0), diet=c(0,0,1,1))
fit <- lmFit(eset, design)

#Yields the error:
Error in rowMeans(y$exprs, na.rm = TRUE) : 'x' must be numeric

 

If possible, I would appreciate some more guidance.

Much appreciated.

ADD REPLYlink written 5.1 years ago by AR110

Note the first column of genes is character array. Remove that, convert to numeric and you'll get around that error. Of course the design matrix makes no sense for your dataset (see the model.matrix() function).

ADD REPLYlink written 5.1 years ago by Devon Ryan91k

Thanks for the tip. Perhaps I should have declared I'm a novice =).

ADD REPLYlink modified 5.1 years ago • written 5.1 years ago by AR110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 793 users visited in the last hour