Question

Complex design with limma R package

0

Entering edit mode

8.4 years ago

Andrea.Wall ▴ 10

Hi everybody,

I'm trying to run a differential expression analysis of RNA-seq data with limma but I'm not sure how to proceed and was hoping to get some suggestions from you.

I have 18 samples from two similar experiments (9 and 9). Each experiment has three treatment groups of which two are the same across the two experiments, and only the control group changed. What I'd like to do is to analyze everything together considering Experiment as a Batch effect and therefore ending up with 4 total treatment groups (two with 6 replicate samples each and two with 3 samples each). Could you suggest how to do this, if it makes sense at all?

Thanks!

Joanito

RNA-Seq • 3.0k views

ADD COMMENT • link updated 20 months ago by Ram 43k • written 8.4 years ago by Andrea.Wall ▴ 10

Ram · Answer 1 · 2015-11-26

Typically you'd account for the variation within the two datasets by using an additive model (~0 + Treatments + Batch), however for that to work, you need to have all the treatment types within both batches.

For example, you have treatments A, B, C, and D, across experiments X and Y. Experiment X has treatments A, B, and C in triplicate, experiment Y has treatments A, B, and D in triplicate. You could normalise everything together and then build a model in limma for this combination, however it'll fail when trying to estimate the variance of D in experiment X, and C in experiment Y. In this example, you'd need to just use samples A and B, normalise them together then use an additive model.

RNA Seq dataset combination becomes even more problematic when you explore machines, library prep, sample prep, etc. What I explained relies on the assumption that the sample preps, machines used, everything was the same, so that the variance you're accounting for in the model is minimal.

Ram · Answer 2 · 2015-11-27

0

Entering edit mode

8.4 years ago

Andrea.Wall ▴ 10

Dear Andrew,

Thanks a lot for your reply. So you suggest to normalize together all samples for which the treatment is the same across the two experiments and then use an additive model (but on all treatments I guess?). Could you expand a bit on this? If you could suggest what codes to use it would be fantastic!

Thanks again!

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Andrea.Wall ▴ 10

0

Entering edit mode

I'd suggest based on your post that you take the two treatment groups that are the same across the two experiments (note, this assumes that they were prepared in the same way, and ran on the same model of sequencer, with the same chemistry). You can only compare within these two groups across experiments. I'd then suggest that you align these samples using your favourite aligner (Tophat, HISAT2, STAR, etc), then use htSeq_count or rSubRead to get gene level counts. Once you've got counts, you should use DESeq2 which has a fantastically detailed guide, to perform differential gene expression using an additive model with 'experiment' as a term. If you want to do transcript level differential expression, I suggest you go with Salmon or Kallisto to get counts, then feed the output into Sleuth, with the same principle as DESeq2.

Alternatively, to avoid this, you could treat the experiments independently, perform differential expression tests, then look at the intersection between gene lists.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by andrew.j.skelton73 6.5k

0

Entering edit mode

Dear Andrew,

Thanks for your detailed reply. I have already aligned all my samples using STAR and prepared count matrices with HTseq. I had performed differential expression analyses with DESeq2 separately for each experiment and looked at the intersection of the results. What I did not like of this approach is that since there are two experiments and 3 treatment groups each, I end up with so many pairwise comparisons that are messy to summarize, and also the two experiments for some reason seemed to have different detection power overall so it's complicated to assess the overlap of the same pairwise comparisons across experiments just based on a cutoff of adjusted P values, and this applies also for downstream analyses such as GO enrichment. This is why I was looking into a way to analyze the two experiments in one go (ending up with less comparisons) and see how the results would look like compared to the previous approach. But if I cannot include my two control groups then it's useless (as they are the ones I'm especially interested in comparing the treated samples to). Everything in the experiments is the same (experimental procedures and sequencing approach) so that should not be a problem, but indeed PCA analyses show there is a batch effect of the two experiments. The only thing that differed was the way control samples were treated. What I'm now thinking (if really there is no way of including control groups in an overall analysis) is to merge the controls and treat them as they were the same (conceptually it wouldn't be too far-fetched as really I want to know what in the treatments is different from any of those controls, but I'm not sure if I'd be violating any important assumption). What would you think about that?

Thanks again for your help!

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Andrea.Wall ▴ 10

0

Entering edit mode

As an exploratory measure, grouping the two different controls should be ok. Beware that doing that means that you may increase the variability of your Control group, meaning that differential expression may perform worse. Ultimately it's up to you to make that judgement, but there's nothing wrong with running it to see what happens.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by andrew.j.skelton73 6.5k

0

Entering edit mode

Thanks I'll give it a try then!

ADD REPLY • link 8.4 years ago by Andrea.Wall ▴ 10