Why is getting logFC=20 causing a "problem" for you? You would only get such a large logFC from edgeR if the counts are zero in one group and in the hundreds of thousands in the other. Getting a large logFC seems a fair representation in such cases.

Anyway, you can shrink the log-fold-changes by as much or as little as you want in edgeR. Just use a larger value of `prior.count`

when you run `glmFit`

(or `glmQLFit`

). Or you can use the `predFC`

function, which will give the same result as `glmFit`

.

By default, the model fitting functions in edgeR are set to do light shrinkage of the logFCs by setting `prior.count = 0.125`

. If you increase `prior.count`

to 5, that will do quite heavy shrinkage. Note that shrinkage is not the same as simply multiplying the logFC by a scaling factor. Increasing `prior.count`

has much more effect on the logFC for low count genes than on logFCs for highly expressed genes because the logFC for low count genes are less certain. Changing `prior.count`

only affects the estimated logFCs. All the other output including test statistics and p-values remain the same.

The reason why the argument is called `prior.count`

is because the shrinkage corresponds to a Bayesian model where the size of the counts in a prior dataset encapsulates the strength of the prior belief that the logFCs should be close to zero. Long ago (15 years ago) we toyed around with the idea of estimating an optimal value for the prior.count automatically from the data (it's a chapter in Belinda Phipson's thesis) but eventually decided it would be safer and better to use preset values in the public edgeR package.

To illustrate the use of `prior.count`

, here is a toy example where edgeR does return logFC=20:

```
> counts <- matrix(c(0,1000,135000,2010),2,2)
> colnames(counts) <- c("A","B")
> rownames(counts) <- c("Gene1","Gene2")
> counts
A B
Gene1 0 135000
Gene2 1000 2010
> group <- c("A","B")
> design <- model.matrix(~group)
> fit <- glmFit(counts,design,dispersion=0,lib.size=rep(10e6,2))
> lrt <- glmLRT(fit)
> topTags(lrt)
Coefficient: groupB
logFC logCPM LR PValue FDR
Gene1 20.0426 12.7207 187149.74 0.0000e+00 0.0000e+00
Gene2 1.0071 7.2355 345.57 3.9103e-77 3.9103e-77
```

Changing the default setting to `prior.count=5`

shrinks the first logFC considerably while leaving the second almost unaffected:

```
> fit <- glmFit(counts,design,dispersion=0,prior.count=5,lib.size=rep(10e6,2))
> lrt <- glmLRT(fit)
> topTags(lrt)
Coefficient: groupB
logFC logCPM LR PValue FDR
Gene1 14.7207 12.7207 187149.74 0.0000e+00 0.0000e+00
Gene2 1.0036 7.2355 345.57 3.9103e-77 3.9103e-77
```

Please check out the post here in Bioconductor. It has been very nicely explained what you could do or should not do.

Thank you for the reply

Thank you for your reply.