Compare gene expression in randomized experiment
1
1
Entering edit mode
6.7 years ago
AR1 ▴ 10

Hi

I have 60 individuals randomized to treatment with a statin or diet. Statins are drugs that inhibit the liver enzyme HMG-CoA-reductase, with inhibits the synthesis of cholesterol (a major driver of atherosclerosis) leading to lower risk of stroke and myocardial infarction.

Thus 60 individuals are randomized to either a statin or diet. 10 genes were quantified at baseline and after 1 year treatment. For each gene we have a baseline value, 1 year value and the fold (calculated as the 1 year value divided by the baseline value).

The data set looks like (presenting two genes with baseline, 1 year and fold):

 genes <- read.table(header=TRUE, sep=";", text =
"treatment;IL10_BL;IL10_1Y;IL10_fold;IL6_BL;IL6_1Y;IL6_fold;
diet;1.1;1.5;1.4;1.4;1.4;1.1;
statin;2.5;3.3;1.3;2.7;3.1;1.1;
statin;3.2;4.0;1.3;1.5;1.6;1.1;
diet;3.8;4.4;1.2;3.0;2.9;0.9;
statin;1.1;3.1;2.8;1.0;1.0;1.0;
diet;3.0;6.0;2.0;2.0;1.0;0.5;")

Nothing has been done to the data (e.g log-transform).

I have the following questions:

• How would you calculate if there are differences in gene expression after 1 year in the diet vs statin group?

• Does change in gene expression predict other variables (e.g blood cholesterol); this would need multiple regression analyses, in which I wonder what gene value to use (the baseline, the 1 year of the fold). Should I in that case log-transform the fold/baseline/1year before using it as a predictor?

Any help is much appreciated.

I have read the limma userguide package but my data is not set up in the way limma package uses in the vignette.

gene expression randomized experiment • 2.8k views
0
Entering edit mode

I have still not found out how to solve the issue... unfortunately.

Anyone who can provide som assistance?

0
Entering edit mode

Where are you stuck now? BTW, you should really consider just collaborating with a local bioinformatician/statistician, since even once you're able to get everything to work I expect you won't really know how to QC everything.

0
Entering edit mode

I have made an effort to read the user guide accompanying limma package, before posting here again.

My data set is set up the same way, as the "eset" object in my reply below. Therefore:
• Each row represents a gene.
• The first row, however, is the treatment assignment.
• Each column represents a patient. All rows, except from the first (which is the treatment) are values for gene expression.

I have tried a range of model matrices and subsequently analyze if there are differential gene expressions between the treatment groups. None of them have worked. I do not argue that my efforts are qualitative, since I'm not used to analyze gene data. I have no biostatician in my vicinity to ask.

I'd like to ask for some coding support here? I have the above data set and wish to examine if gene expression differs between the groups.

I would appreciate it immensely.

0
Entering edit mode

(1) Delete the first row. Put you column:group assignments in a different file (put them in the same order, of course). (2) What model matrices have you tried and what exactly do you mean by "none of them have worked"?

0
Entering edit mode

(1) done that now.
(2) A somewhat comparable example in the vignette where African males and females are compared uses the following matrix:
design <- model.matrix(~gender)
I did not figure out how to apply this to my data.

I also tried:
design=cbind(statin=c(1,1,0,0), diet=c(0,0,1,1))
and several other similar designs.

The problem is that
(a) i don't know what matrix to design, and
(b) how to use the expression data set, the list created in (1) and the design matrix. Because that gives me three objects to work with, while the vignette always includes a design matrix and data.

Regards

0
Entering edit mode

The list of sample associations is used by model.matrix(). In short, you make a dataframe whose rows are samples and columns are parameters (diet, individual, etc.). Unless you know what you're doing, which would require having taken linear algebra at some point in your education, it's probably not a good idea to make random model matrices by hand (i.e., use model.matrix()). Your design is just ~diet or ~statin, depending on what you want to call things. If you look at section 9.7 of the users guide, you'll see an example of an experiment similar to (but more complicated than) yours. In this example, the dataframe or vector created in step (1) would be targets. You'd use that to create the design matrix.

1
Entering edit mode
6.7 years ago

Just because your data isn't set up in a classical boiler-plate format for limma doesn't mean you can't use it for that. The "fold-change" values can be completely ignored regardless of the method you choose to use and then reformatting things for limma should be straight-forward enough (you just need a matrix with rows as genes and columns as samples, in which case the individual timepoints would count as different samples). I would be surprised if there's not an example of such a design in the limma user's guide (it'll probably involve duplicateCorrelation()).

For your second question, you can end up doing things the same way. If you wanted to use limma for that then you'd just perform multiple iterations of testing. If not, you can always use the MAANOVA package in R, which I think supports mixed-effect models as well (assuming you don't want to just use a fixed-effects factor for patients, which is certainly algorithmically simpler).

0
Entering edit mode

Hi Devon,

Thank you for taking time to explain this. The Limma package, although extensive, does not contain info from which I can derive a solution to my problem. I've read the similar thread:

Limma Package For Differential Expression Between Control And Stress

Without any success and this might explain why:

Importing the data:

genes <- read.table(header=TRUE, sep=";", text =                        "treatment;IL10_BL;IL10_1Y;IL10_fold;IL6_BL;IL6_1Y;IL6_fold;     diet;1.1;1.5;1.4;1.4;1.4;1.1;     statin;2.5;3.3;1.3;2.7;3.1;1.1;     statin;3.2;4.0;1.3;1.5;1.6;1.1;     diet;3.8;4.4;1.2;3.0;2.9;0.9;     statin;1.1;3.1;2.8;1.0;1.0;1.0;     diet;3.0;6.0;2.0;2.0;1.0;0.5;")

#Dont know where the column X came from; deleting it
genes[,"X"] = NULL

#Creating the eset dataset, which is to be analysed

eset <- data.frame(t(genes))

design=cbind(statin=c(1,1,0,0), diet=c(0,0,1,1)) fit <- lmFit(eset, design)

#Yields the error:
Error in rowMeans(y\$exprs, na.rm = TRUE) : 'x' must be numeric

If possible, I would appreciate some more guidance.

Much appreciated.

0
Entering edit mode

Note the first column of genes is character array. Remove that, convert to numeric and you'll get around that error. Of course the design matrix makes no sense for your dataset (see the model.matrix() function).

0
Entering edit mode

Thanks for the tip. Perhaps I should have declared I'm a novice =).