Microarray experimental design and analysis; repeated measures ANOVA, MaSigPro
3.5 years ago
tleona3 ▴ 10

Hi All,

I have a large RNA microarray dataset that I am planning to analyze, but I am having difficulty deciding which statistical tests to perform or which R packages to use, given the experimental design. The design is as follows:

There are 21 patients who have had tissue biopsies performed in two different locations (C or M) over three time points depending on the schedule they were assigned to. They all had initial biopsies in a specific location, then repeat biopsies were taken in a small area of the same location at two later time points to see how the tissues changed over time.

Schedule 1: 0h, 6h, 24h
Schedule 2: 0h, 24h, 48h
Schedule 3: 0h, 48h, 72h

Patients 1-7 were assigned to schedule 1, Pts. 8-14 assigned to schedule 2, and Pts. 15-21 assigned to schedule 3.
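To see how unbalanced this design is, the schedules can be tabulated as a sample sheet. This is a minimal Python sketch, assuming every patient was sampled at every timepoint on their schedule (no QC dropouts) and leaving tissue aside; the names `SCHEDULES`, `schedule_for` and `sample_sheet` are illustrative only.

```python
from collections import Counter

# Timepoints (hours) per schedule, as described in the post.
SCHEDULES = {1: (0, 6, 24), 2: (0, 24, 48), 3: (0, 48, 72)}


def schedule_for(patient):
    """Patients 1-7 -> schedule 1, 8-14 -> schedule 2, 15-21 -> schedule 3."""
    return (patient - 1) // 7 + 1


# One row per (patient, schedule, hour); assumes no samples are missing.
sample_sheet = [
    (patient, schedule_for(patient), hour)
    for patient in range(1, 22)
    for hour in SCHEDULES[schedule_for(patient)]
]

# Number of patients observed at each timepoint.
patients_per_hour = Counter(hour for _, _, hour in sample_sheet)
for hour in sorted(patients_per_hour):
    print(f"{hour:>2}h: {patients_per_hour[hour]} patients")
```

Only 0h is shared by all 21 patients; 6h and 72h are each covered by a single schedule group (7 patients), so those timepoints cannot be separated from the group of patients they were measured in.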

[Image: example of the experimental design]

My question is what statistical analysis would help me answer the following questions:

1) What patterns of gene expression change are seen in tissue C over time, relative to 0h? I want to find genes that are not highly expressed until after 0h, genes that are high at 0h and decrease over time, and genes that are low at 0h, increase at the middle time points, and fall back down by the last time point.

2) Same as question 1 but for tissue M

3) Which genes' expression patterns over time are shared by both tissues, and which are unique to each tissue?

I was planning to use MaSigPro, but I read in the original article that it assumes observations are independent and does not account for repeated measures. Are there any suggestions for R packages or statistical tests that would help with analyzing this dataset?

Cheers!

R microarray statistics MaSigPro ANOVA • 1.7k views
3.5 years ago

Your design is pretty complex and made more difficult by the fact that your time-points are not the same in each treatment group.

You could fit a linear model to the data and test each gene independently using either limma or lm(), with a formula like GeneExpression ~ treatment + timepoint + treatment:timepoint. Given the small sample size, though, a linear model may overfit and give unreliable statistics.
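One way to read the treatment:timepoint term: when both factors are categorical and every tissue-by-time cell is estimated freely, the interaction coefficient for a timepoint is the change from baseline in one tissue minus the change from baseline in the other. Below is a minimal Python sketch on invented log2 values for a single gene, assuming "treatment" means tissue (C or M). It computes only the point estimate from cell means and ignores the patient pairing, so it is not a substitute for a proper model fit.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene, keyed by (tissue, hour).
# These numbers are invented for illustration only.
expr = {
    ("C", 0): [7.1, 6.9, 7.3],
    ("C", 24): [8.0, 8.2, 7.8],
    ("M", 0): [7.0, 7.2, 6.8],
    ("M", 24): [9.5, 9.1, 9.3],
}


def change_from_baseline(tissue, hour):
    """Mean log2 change at `hour` relative to 0h within one tissue."""
    return mean(expr[(tissue, hour)]) - mean(expr[(tissue, 0)])


def interaction(hour):
    """Difference of the two tissue-specific changes (M minus C)."""
    return change_from_baseline("M", hour) - change_from_baseline("C", hour)


print(f"C change at 24h: {change_from_baseline('C', 24):.2f}")
print(f"M change at 24h: {change_from_baseline('M', 24):.2f}")
print(f"tissue:time interaction at 24h: {interaction(24):.2f}")
```

An interaction near zero would suggest the gene follows a similar time course in both tissues, while a large value would point to a tissue-specific pattern.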

Two-way repeated-measures ANOVA is also an option, with the same formula as above. See here for more info: https://www.r-bloggers.com/two-way-anova-with-repeated-measures/

Finally, you could break it down into many separate analyses, running paired tests such as the Wilcoxon signed-rank test on each gene and also calculating fold-change differences (linear or log).
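As a rough illustration of the per-gene paired approach, here is a minimal Python sketch of an exact two-sided Wilcoxon signed-rank test plus a mean log2 fold change for one gene. The expression values are invented, the function name `signed_rank_exact` is hypothetical, and the implementation assumes no zero or tied differences; in practice an established routine such as R's wilcox.test(paired = TRUE) would be the usual choice.

```python
from itertools import product


def signed_rank_exact(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumes no zero differences and no tied absolute differences.
    Returns (W_plus, p_value), where W_plus is the sum of the ranks
    of the positive differences.
    """
    diffs = [a - b for b, a in zip(before, after)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W+ under the null hypothesis
    # Enumerate all 2^n sign assignments (feasible only for small n).
    extreme = total = 0
    for signs in product((0, 1), repeat=len(diffs)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - expected) >= abs(w_plus - expected):
            extreme += 1
    return w_plus, extreme / total


# Hypothetical log2 expression of one gene in 7 patients (0h vs a later time).
t0 = [7.1, 6.8, 7.4, 7.0, 6.9, 7.3, 7.2]
t_late = [8.0, 7.9, 8.1, 8.4, 7.5, 8.8, 8.5]

w_plus, p = signed_rank_exact(t0, t_late)
# On log2 data, the mean paired difference is the mean log2 fold change.
log2_fc = sum(a - b for b, a in zip(t0, t_late)) / len(t0)
print(w_plus, p, round(log2_fc, 2))
```

Note that with 7 pairs (for example, 0h vs 6h in one schedule group) the smallest attainable two-sided p-value is 2/128, about 0.016, so few or no genes may survive multiple-testing correction across thousands of probes. The fold changes may therefore be more informative than the p-values at this sample size.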

My final point is most important: if you have a Professor of Statistics available for a chat, go to her/him just to corroborate! There may be other time-series or forecasting analyses that could be performed.


Thanks for the helpful response! I agree that the design is quite complex; unfortunately, the patient time points were set up this way due to limitations in the IRB protocol.

It gets a bit worse: some of the sample time points in each group are missing because they failed one of the QC checkpoints and were not run on the microarray. I appreciate the references and will definitely find a statistician to corroborate this with.


Yes, I understand. The missing time points make it more difficult. I saw an internal presentation this year in my lab in Boston where someone had tried to tackle this issue, but the work was still in progress.


You may wish to follow this new thread: Time series comparison with DESEQ2