DESeq2 / limma: the paired test seems to be incorrect. Is it a known problem?
Alexander ▴ 220 · 6 months ago

The paired experiments produce the following results, which seem to clearly indicate differential expression. However, neither DESeq2 nor limma reports the gene as differentially expressed (both return a p-value of about 0.5).

Is this a known problem? Or what could be the reason?

[image: per-pair measurements for the example gene, two rows (conditions) by four columns (pairs)]

Code:

DESeq2:
cmp <- DESeqDataSetFromMatrix(countData = as.matrix(counts),
                              colData = coldata,
                              design = ~ rep + cond)

limma:
cmp <- model.matrix(~0 + cond + rep)
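
For completeness, here is roughly how these designs would be used downstream. This is a minimal sketch on toy data (the counts, sample names, and the assumption that cond has levels A/B are all made up, not the poster's data); with the ~0 + cond + rep limma parameterization above one would instead form the condition contrast via makeContrasts():

library(DESeq2)
library(limma)

# Toy paired data: 4 pairs, 2 conditions, 100 genes (illustrative only)
set.seed(1)
counts  <- matrix(rnbinom(8 * 100, mu = 100, size = 1), nrow = 100,
                  dimnames = list(paste0("gene", 1:100), paste0("s", 1:8)))
coldata <- data.frame(rep  = factor(rep(1:4, times = 2)),
                      cond = factor(rep(c("A", "B"), each = 4)),
                      row.names = paste0("s", 1:8))

# DESeq2: `rep` absorbs the pair effect, `cond` is the tested factor
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ rep + cond)
dds <- DESeq(dds)
res <- results(dds)   # defaults to the last design variable: cond B vs A

# limma-voom with the equivalent intercept parameterization; the last
# coefficient is the condition effect
design <- model.matrix(~ rep + cond, data = coldata)
v   <- voom(counts, design)
fit <- eBayes(lmFit(v, design))
tt  <- topTable(fit, coef = ncol(design))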

[image: DESeq2 / limma output for the example gene]

PS

The example shown is for one gene; the measurements are of course for all genes, just one is shown.

We see this kind of problem systematically. The above is just one illustrative example, but similar things happen a lot.

By a "paired" experiment we mean something analogous to a paired t-test (https://en.wikipedia.org/wiki/Paired_difference_test). Experiments that stand in the same column (condition 1 and condition 2) were run simultaneously in the same environment, while moving along the row axis means a slight change of environment (another day). That is why we are looking for a "paired" comparison: experiments in the same column are paired.

Comment:

What are 'the paired experiments'? Are you running only a single gene? Are these counts (or whatever this is) normalized? More details are needed.

Comment:

All genes; that is just one example where we have the problem. Paired, as usual: each experiment has two samples, condition 1 and condition 2, and is repeated 4 times.

Comment:

You're still being quite unclear. Please differentiate paired experiments vs whatever other experiment you're running.

Comment:

Pairs are in columns.

Comment:

Please be clearer: what are the paired experiments, and how do they "clearly indicate" differential expression? What are DESeq2 and limma missing, in your opinion? Please understand that you're giving barely relevant information piecemeal, which is a really bad way to seek help.

Comment:

Maybe something is getting lost in translation, but where are you getting the "paired experiment" results from?

Comment:

I believe the OP refers to a "paired" design, as in a "paired t-test", otherwise known as a repeated measures design. See section 9.4.1 of the limma manual.
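
For reference, the paired setup described in that section looks roughly like this (a sketch; the factor names are illustrative, not the manual's):

library(limma)
# Blocking factor for the pair, plus the treatment factor
pair   <- factor(rep(1:4, each = 2))
treat  <- factor(rep(c("ctrl", "trt"), times = 4))
design <- model.matrix(~ pair + treat)
# After lmFit()/eBayes(), the "treattrt" coefficient tests the
# within-pair treatment effect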

Answer (bhumm ▴ 170, 6 months ago):

I am not terribly experienced with DESeq2/limma, so other answers will be helpful as well.

I think the results are valid. I think the issue is the normalization and scaling process of DESeq2. See this tutorial for more information on that process.

Additionally, looking at the screenshot you provided, you have substantial variability across your biological replicates. Assuming DESeq2 takes this into account, I am not terribly surprised you did not see any differential expression in the example you provided.

Answer (6 months ago):

I think your error here is being so confident that the above results definitely indicate a differential effect. Just looking at the numbers you've posted, you have a mean difference of 7 and a standard deviation of 3.75. Now this isn't a particularly valid way to do things, because the fold changes aren't normally distributed, but it's an interesting way to get a preliminary idea.

Of course DESeq2 and limma won't use the raw input numbers for the test. Firstly, the numbers are normalized. Then, the measure of spread of the data is not just the standard deviation (the way it is calculated differs between DESeq2 and limma, but just know that it is more complex than the plain standard deviation). Both have ways of dealing with 0s in the data. Finally, DESeq2 will exclude outliers.
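
For example, in DESeq2 the normalization and dispersion steps can be inspected directly; a minimal sketch, assuming the cmp DESeqDataSet from the question:

cmp <- estimateSizeFactors(cmp)
sizeFactors(cmp)                      # per-sample scaling factors
head(counts(cmp, normalized = TRUE))  # the values the model actually sees

cmp <- estimateDispersions(cmp)
head(dispersions(cmp))                # shrunken, trend-informed dispersions,
                                      # not per-gene standard deviations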

Also, why are the numbers you've got here not integers? How did you estimate them?

Comment:

Thanks for the reply! Okay, assume I am not correct, but what ratio would you expect to be correct, to say yes to differential expression: row 1 being 10 times greater than row 2, 100 times, 10000 times? Row 1 is substantially greater than row 2 for ALL(!) 4 observations. Imagine the null hypothesis is true and there is no difference between row 1 and row 2: the chance of what we see is 1/16 (or 1/8, depending on the test), which is already 0.0625 (0.125). And looking at the ratios gives clearly more evidence that something is wrong with DESeq2/limma.
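
That 1/16 (or 1/8) arithmetic is exactly a sign test, which can be checked directly in base R:

# Probability that all 4 pairs go the same direction under the null
binom.test(4, 4, p = 0.5, alternative = "greater")$p.value  # one-sided: 0.0625
binom.test(4, 4, p = 0.5)$p.value                           # two-sided: 0.125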

Comment (ATPoint):

Seconding what Ian says, there is not terribly much evidence that there is differential expression going on. Depending on how many genes you test and how they influence the results, it can happen that this does not come out significant at that (low) sample size. See below for limma and paired t-tests (the latter on both "raw" and log2-transformed values).

library(limma)

condA <- c(25.6, 532.6, 982.4, 11693.2)
condB <- c(0, 194.9, 114.7, 1200.3)

data_raw <- matrix(c(condA, condB), byrow = TRUE, nrow=1)
data_log2 <- log2(data_raw + 1)

group <- factor(rep(c("A", "B"), each=4), levels = c("B", "A"))
pairs <- factor(rep(c("1", "2", "3", "4"), 2))
m <- model.matrix(~pairs + group)
m
#>   (Intercept) pairs2 pairs3 pairs4 groupA
#> 1           1      0      0      0      1
#> 2           1      1      0      0      1
#> 3           1      0      1      0      1
#> 4           1      0      0      1      1
#> 5           1      0      0      0      0
#> 6           1      1      0      0      0
#> 7           1      0      1      0      0
#> 8           1      0      0      1      0
#> attr(,"assign")
#> [1] 0 1 1 1 2
#> attr(,"contrasts")
#> attr(,"contrasts")$pairs
#> [1] "contr.treatment"
#> 
#> attr(,"contrasts")$group
#> [1] "contr.treatment"

fit <- lmFit(data_log2, design = m)
fit <- eBayes(fit)
topTable(fit, coef = ncol(m))
#>      logFC  AveExpr        t    P.Value  adj.P.Val         B
#> 1 3.137377 7.743339 4.662763 0.01861774 0.01861774 -2.535539

t.test(condA, condB, paired = TRUE)
#> 
#>  Paired t-test
#> 
#> data:  condA and condB
#> t = 1.16, df = 3, p-value = 0.33
#> alternative hypothesis: true mean difference is not equal to 0
#> 95 percent confidence interval:
#>  -5109.878 10971.828
#> sample estimates:
#> mean difference 
#>        2930.975
t.test(log2(condA+1), log2(condB+1), paired = TRUE)
#> 
#>  Paired t-test
#> 
#> data:  log2(condA + 1) and log2(condB + 1)
#> t = 4.6628, df = 3, p-value = 0.01862
#> alternative hypothesis: true mean difference is not equal to 0
#> 95 percent confidence interval:
#>  0.996043 5.278712
#> sample estimates:
#> mean difference 
#>        3.137377
Created on 2024-06-05 with reprex v2.0.2

Comment:

"what ratio would you expect to be correct, to say yes to differential expression"

The key thing is that it is not the size of the ratio (at least not on its own) that is important, but the variance of the ratios (in relation to the mean ratio).

So here, on the simple t.test of log counts, the mean log1p fold change is 3.2, and the standard deviation is 1.3. This means that the mean is approximately 2.3 standard deviations away from 0. This makes it borderline significant.
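
That back-of-the-envelope calculation can be reproduced from the values in @ATPoint's post above:

condA <- c(25.6, 532.6, 982.4, 11693.2)
condB <- c(0, 194.9, 114.7, 1200.3)
d <- log2(condA + 1) - log2(condB + 1)  # per-pair log1p fold changes
mean(d)          # ~3.1
sd(d)            # ~1.3
mean(d) / sd(d)  # mean is ~2.3 standard deviations from 0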

This is an approximate, back-of-the-envelope calculation. You can see in @ATPoint's post that if you do a paired t-test on log1p values, you get a very similar result to limma when you analyse a single gene on its own. That's because limma uses an empirical Bayes shrunken t-test as its statistical test; with only a single gene, it can't do any shrinkage, so it's basically just a t-test. When you run this on a complete transcriptome, of course, this changes, because:

1) You will almost always do normalisation, which will change the input values.

2) With more genes, limma will do empirical Bayes shrinkage, which borrows information across genes and helps with the problem that it is very difficult to estimate standard errors from only a small number of samples. This will change the standard error estimates.

3) With multiple genes, you will have multiple testing correction. If you are testing 20,000 genes, that's an awful lot of multiple testing correction; I wouldn't expect to see adjusted p-values being significant until nominal p-values were orders of magnitude smaller than the ~0.019 in the t-test @ATPoint showed (see the sketch after this list).
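
A toy illustration of point 3 (the p-values are made up; only the ~0.019 comes from the t-test above):

# One nominal p of ~0.019 among 20,000 tests does not survive BH correction
set.seed(42)
p <- c(0.019, runif(19999))      # our gene plus 19,999 null genes
p.adjust(p, method = "BH")[1]    # ~1, far above 0.05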

Comment:

Thanks for your point! But regarding "the variance of the ratios (in relation to the mean ratio)": does that imply that if we have, say, ratios of 10000, 100000, 100000000000000000 (very big, but the variance is also big), that would not count as differential expression? That is weird, if so.

Comment:

This is just the way hypothesis testing works.

When we do a hypothesis test, we select a null hypothesis (e.g. that the mean ratio is 1, or equivalently that the mean of log2 fold changes is 0). We then test this hypothesis by calculating the probability of observing the data we observed if the null hypothesis were true. In a t-test we assume that the means of samples of the data have an approximately normal distribution (strictly speaking a t-distribution, more on that in a moment). That distribution has a mean of 0 (if we are dealing with log2 fold changes). But in order to calculate the likelihood of the data, we also need the standard deviation of the distribution. We estimate this from the data sample itself. Now, it's not quite as simple as using the standard deviation of the data points; instead, we need the standard error, which is the estimated standard deviation of means calculated from samples. The standard error in the case here is 13.8 (on a log2 scale); that is, if you drew n=3 samples repeatedly, the sample means would differ on average by 13.8 log2 units. So now we have our null distribution, which is approximately N(0, 13.8), but not quite: because 3 samples is a really small number to estimate the standard error from, we have to use the equivalent t-distribution rather than the normal distribution (hence the name t-test), which is a bit wider than the normal distribution.

If you run the math, you'll find that the mean of the log2 ratios in your example is 28. Using the distribution we just derived, we can calculate that the probability of drawing three samples with a mean of 28 or larger, if the null hypothesis is true, is about 0.1. So nine times out of ten the mean will be smaller. But traditionally we need one in twenty, not one in ten, to declare a result significant.

You can also look at the same question using confidence intervals. Given that the mean of our sample is 28, could the mean of the population it was drawn from conceivably be 0? The half-width of the confidence interval for a t-distribution with 2 degrees of freedom is 4.3 times the standard error, so the confidence interval here is 28 +/- 4.3*13.8, or [-30, 88]. That is, we can say with 95% confidence that the "real" mean is somewhere between -30 and 88. As that includes 0, we cannot confidently say that it is inconceivable that the real mean is 0.
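
Both numbers are easy to verify (a quick sketch, with the three ratios from your example on a log2 scale):

x  <- log2(c(1e4, 1e5, 1e17))          # the three ratios
m  <- mean(x); se <- sd(x) / sqrt(3)   # ~28 and ~13.8
pt(m / se, df = 2, lower.tail = FALSE) # P(mean >= 28 | null) ~ 0.09
m + c(-1, 1) * qt(0.975, df = 2) * se  # 95% CI: roughly [-31, 88]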

So, in answer to your question, under the standard hypothesis testing framework ratios of 10000, 100000, 100000000000000000 would not be called significant. However, ratios of 2.1, 2.2, 1.9 probably would be.
