Question: Pairwise comparison in DESeq2
EVR wrote:

Hi,

I have RNA-seq data collected at different time points. For every time point I have 4 samples (Control, Knock-down_1, Knock-down_2, Knock-down_3), and I want to compare each knock-down sample to its control sample. Since DESeq2 compares two sets of samples to identify differentially expressed genes, how should the analysis be carried out:

 a) Include all samples from this particular time point and later use contrasts to find the differentially expressed genes between specific samples
 OR
 b) Include only the two samples for which you want to find the differentially expressed genes and finish the analysis.

Thanks in advance

rna-seq deseq2 • 937 views
modified 2.5 years ago by Carlo Yague • written 2.5 years ago by EVR
Carlo Yague wrote:

The correct answer is "c) Include all samples from all time points", because it will give you the best gene-level dispersion estimate.

There is a great tutorial here that explains how to do time-course analysis with mutants in DESeq2.

modified 2.5 years ago • written 2.5 years ago by Carlo Yague

Thanks for your comment. But I am not comparing the samples of one time point with those of another time point, only samples within a time point, so why include samples from other time points? Won't it influence the values of the other samples? For example, is it worth having the counts of samples from day7 influence the counts of samples from day1?

written 2.5 years ago by EVR

Like Carlo said, "...because it will give you the best gene-level dispersion estimate". Fit the model using all of your data and it will give you a better estimate of the mean-variance trend for any given gene. With this estimate, you can assess differences between your experimental arms at any given timepoint more reliably than if you were analysing just the samples from that timepoint.
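
To make the point about estimating noise concrete, here is a minimal sketch in plain Python. It is not DESeq2 and does not use its negative binomial model: it assumes Gaussian noise, 3 replicates per time point, and 5 time points, all of which are illustrative choices rather than details from this thread. The names `pooled_variance` and `simulate` are hypothetical helpers. Each time point keeps its own mean, and only the noise estimate is shared.

```python
import random
import statistics


def pooled_variance(groups):
    """Pool per-group sample variances, weighting each by its degrees of freedom."""
    numerator = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return numerator / denominator


def simulate(n_timepoints=5, n_reps=3, true_sd=1.0, n_sims=2000, seed=0):
    """Return the spread (SD across simulations) of two variance estimators."""
    rng = random.Random(seed)
    single_estimates, pooled_estimates = [], []
    for _ in range(n_sims):
        # Every time point gets its own mean, but all share the same noise level
        # (an assumption of this sketch).
        groups = [
            [rng.gauss(10.0 * t, true_sd) for _ in range(n_reps)]
            for t in range(n_timepoints)
        ]
        # Estimator 1: variance from the first time point only.
        single_estimates.append(statistics.variance(groups[0]))
        # Estimator 2: variance pooled over all time points.
        pooled_estimates.append(pooled_variance(groups))
    return statistics.pstdev(single_estimates), statistics.pstdev(pooled_estimates)


spread_single, spread_pooled = simulate()
print(f"spread of single-time-point estimate: {spread_single:.3f}")
print(f"spread of pooled estimate:            {spread_pooled:.3f}")
```

The pooled estimate should fluctuate noticeably less from run to run, even though the group means differ widely between time points. Each group's own mean is subtracted before the variances are combined, so the means of the other time points do not leak into the estimate.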

written 2.5 years ago by russhh

Thank you russhh. I understand that using all samples from all time points gives a better gene-level estimate. But there is one thing I still can't understand. For example, won't the actual expression (raw counts) of gene x at 3 hours be affected by the actual expression (raw counts) of the same gene x at day7?

written 2.5 years ago by EVR

The expression will be unaffected if you take all time points, but you will be more accurate when assessing the significance of a difference in expression.

That is, as long as your model takes into account the time and the interaction between the time and the strain. If you don't include the time in the model, then your time points will be treated as replicates and the "expression" would be affected.
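
A rough illustration of this, again in plain Python rather than DESeq2: the sketch below compares the leftover noise when every (time, strain) combination gets its own mean with the leftover noise when time is ignored. In DESeq2 terms, the first case would correspond to a design along the lines of `~ time + strain + time:strain`, where `time` and `strain` are assumed column names. The expression levels, the 3 replicates per cell, and the helper `residual_variance` are all made up for the example.

```python
import random
import statistics


def residual_variance(values, labels):
    """Residual variance after fitting one mean per distinct label."""
    cells = {}
    for value, label in zip(values, labels):
        cells.setdefault(label, []).append(value)
    means = {label: statistics.fmean(vals) for label, vals in cells.items()}
    rss = sum((value - means[label]) ** 2 for value, label in zip(values, labels))
    return rss / (len(values) - len(cells))


# Hypothetical expression levels: a strong time effect, a small knock-down effect.
time_means = {"3h": 5.0, "day1": 10.0, "day7": 20.0}
strain_shift = {"control": 0.0, "knockdown": 1.0}

rng = random.Random(1)
rows = [
    (time, strain, rng.gauss(time_means[time] + strain_shift[strain], 0.5))
    for time in time_means
    for strain in strain_shift
    for _ in range(3)  # assumed: 3 replicates per time/strain combination
]
values = [value for _, _, value in rows]

# Time, strain and their interaction: one mean per (time, strain) cell.
var_full = residual_variance(values, [(time, strain) for time, strain, _ in rows])
# Time ignored: one mean per strain, so time points act as replicates.
var_no_time = residual_variance(values, [strain for _, strain, _ in rows])

print(f"residual variance with time in the model: {var_full:.2f}")
print(f"residual variance ignoring time:          {var_no_time:.2f}")
```

When time is left out, the differences between time points have nowhere to go except the noise term, so the residual variance balloons and a small knock-down effect becomes much harder to detect.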

modified 2.5 years ago • written 2.5 years ago by Carlo Yague

Admittedly the counts at day3 and day7 will be statistically/biologically dependent. But, how to account for that dependence is not the question that you originally posed. Data from any quantitative experiment can be viewed as comprising signal and noise. You'd hope that although there may be dependence between the fitted values for your different samples, the noise should be uncorrelated between those samples. And it's your ability to estimate the amount of noise that is improved when you include all of your different timepoints.

written 2.5 years ago by russhh