Question

Paired and Multivariate sleuth

4

Entering edit mode

6.7 years ago

Tim D ▴ 70

The sleuth example workflows generally have experimental designs of unpaired data with only 2 experimental conditions.

so <- sleuth_prep(s2c, ~ treatment)
so <- sleuth_fit(so)
so <- sleuth_fit(so, ~1, 'reduced')
so <- sleuth_lrt(so, 'reduced', 'full')

Ensuring that fitted models are aware of paired data seems relatively straightforward, we simply include a biologicalreplicate variable in the model and retain it in the reduced model.

so <- sleuth_prep(s2c, ~ biologicalreplicate + treatment)
so <- sleuth_fit(so)
so <- sleuth_fit(so, ~ biologicalreplicate, 'reduced')
so <- sleuth_lrt(so, 'reduced', 'full')

But what if our treatment column can have three different states (A, B and C), instead of just two? The code stays the same and dummy variables are automatically generated. But the likelihood ratio test now tests which transcripts are affected by any of the two conditions B and C. If we want to examine the effects of a specific condition, we can use the Wald test.

so <- sleuth_prep(s2c, ~ biologicalreplicate + treatment)
so <- sleuth_fit(so)
so <- sleuth_fit(so, ~ biologicalreplicate, 'reduced')
so <- sleuth_wt(so, which_beta = 'treatmentB', 'full')
so <- sleuth_wt(so, which_beta = 'treatmentC', 'full')

If the treatment factor in the s2c object is properly set up, this lets us compare treatment A to B and A to C.

But how do we compare B to C, short of redoing the analysis and reordering the factor to take B as the baseline, instead of A?

In the same vein, is it possible to make these comparisons using the LRT, instead of the Wald test considering the reservations regarding the Wald test expressed by Harold in this Biostars question? The only way I can think of to do this would be to run three different analyses, using restricted datasets that only contain the data relevant to the conditions of interest used in the three different models. We would build a 'full' and 'reduced' model for a dataset consisting of samples treated with A and B, then again for B and C and again for A and C and perform the LRT on each pair of models in turn. But there's no way that can be accurate, surely...

I appreciate any help. If my google-fu has missed workflows where this is explained, I apologize and will gladly settle for a link to those workflows.

R RNA-Seq sleuth kallisto • 4.3k views

ADD COMMENT • link updated 5.9 years ago by samantha_jeschonek ▴ 50 • written 6.7 years ago by Tim D ▴ 70

0

Entering edit mode

Great question, Tim, and something I'm also curious about. Did you ever find a solution?

ADD REPLY • link 5.9 years ago by samantha_jeschonek ▴ 50

2

Entering edit mode

Alas, no.

I tried several things, like explicitly including the intercept in the full model, which in R usually prevents the first factor level from becoming the baseline, but that doesn't seem to work here. In the end, I releveled the factor to get the comparisons that I wanted and just estimated the model several times.

To use the LRT instead of the Wald test, I suppose the single three-condition variable can be made into separate binary variables, which can in tern be omitted in reduced models so that the LRT can test for their relevance in comparison to the complete model. That said, a Wald test and a critical attitude towards the result (or a rank-based approach for enrichment analysis) is probably just as good as a LRT.

ADD REPLY • link 5.9 years ago by Tim D ▴ 70

0

Entering edit mode

I also had the same problem. So, as Tim said re-leveling the factor when using Wald test seems to be the quickest option.

ADD REPLY • link 5.5 years ago by aln ▴ 320