Question

How to perform DESeq2 based on paired sample test?

0

Entering edit mode

6.8 years ago

Xiaokang ▴ 70

I'm using DESeq2 to do differentially expression analysis. I have 8 liver slices from control group, and 6 liver slices from treatment group. So the purpose is to find out the differentially expressed genes between these two groups. But those slices are not independent: slices from control group the slices from treatment group may come from the same fish.

I'm using DESeq2 package in R, so my colData looks like:

    samples fishGroup expGroup
1   Sample1         1  control
2  Sample12         3  control
3  Sample18         4  control
4  Sample25         5  control
5  Sample31         6  control
6  Sample38         7  control
7  Sample44         8  control
8   Sample6         2  control
9  Sample11         2    treat
10 Sample17         3    treat
11 Sample24         4    treat
12 Sample30         5    treat
13 Sample37         6    treat
14 Sample50         8    treat

If I perform DESeq2 based on independent sample test, then only "expGroup" will be used:

dds_independent <- DESeqDataSetFromMatrix(countData, colData, design = ~expGroup)

But that will miss the samples' pair information. So I guess that the paired sample test is better.

In the section of "Model matrix not full rank" from vignette('DESeq2'), it seems to discuss about this problem, to use a design like this: design = ~ expGroup + fishGroup. I'm no expert on statistics, but it seems that the two factors are treated equally, but in my understanding, expGroup should be first, and fishGroup only gives the pair information.

By the way, design = ~ expGroup + fishGroup + expGroup:fishGroup (which I don't know the exact meaning) throws out an error:

Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified. 
  Levels or combinations of levels without any samples have resulted in column(s) of zeros in the model matrix.

So anyone has any idea of how to design the "design" to tell DESeq2 to do a paired sample test?

RNA-Seq DESeq2 • 6.9k views

ADD COMMENT • link updated 3.5 years ago by Gang Xue • 0 • written 6.8 years ago by Xiaokang ▴ 70

0

Entering edit mode

It is problematic since you don't have matching for all fish - have you tried running it for just the matched samples. You could use edgeR where you can precompute a design matrix and check it before running your diffex analysis. Or you could use voom, with a 'block' over your fishGroup (not sure how that behaves with singletons in the block though).

ADD REPLY • link 6.8 years ago by russhh 5.7k

0

Entering edit mode

Still want to stick to DESeq2. Tried deleting the non-matching samples, but got strange results. Refer to my reply to Devon Ryan

ADD REPLY • link 6.8 years ago by Xiaokang ▴ 70

score 0 · Answer 1 · 2017-06-21

0

Entering edit mode

6.8 years ago

Devon Ryan 104k

Drop samples 1 and 38, they're doing nothing for you and only causing issues. You can then fit with ~fishGroup+expGroup. The order is only convenient for setting what's plotted and returned by default in DESeq2, otherwise it's exactly the same as ~expGroup+fishGroup.

Don't include an interaction in your model. Even if you could fit it (it's impossible) it wouldn't be useful.

ADD COMMENT • link 6.8 years ago by Devon Ryan 104k

0

Entering edit mode

I deleted the two non-matching samples from control group. And ran both ( design = ~expGroup and design = ~expGroup + fishGroup) again, but But the result is quite abnormal: I got 10 differentially expressed (DE) genes (padj < 0.05) from the first design, but 2412 DE genes from the second design. I've looked into the DE genes from the second design, and they're not differentially expressed, since you'll find that the slice sample pairs (from the same fish) give you different conclusion about whether the gene is up regulated or down regulated. Take one gene which is regarded as DE gene from the second design:

fishGroup   expGroup  count
2  control  2
2  treat  10
3  control  87
3  treat  56
4  control  10
4  treat  64
5  control  0
5  treat  2
6  control  59
6  treat  24
8  control  346
8  treat  55

ADD REPLY • link 6.8 years ago by Xiaokang ▴ 70

0

Entering edit mode

Looking at raw counts is a waste of time, look at normalized counts.

ADD REPLY • link 6.8 years ago by Devon Ryan 104k

0

Entering edit mode

Thank you for your remind. That's definitely essential. But after I normalize all the counts by dividing the sample library size (the sum of all counts for one sample), I got this:

    fishGroup   expGroup  count
    2  control  5.1
    2  treat  25.6
    3  control  222.9
    3  treat  143.4
    4  control  25.6
    4  treat  164.0
    5  control  0
    5  treat  5.1
    6  control  151.1
    6  treat  61.5
    8  control 886.3
    8  treat  141

That's still confusing: can't tell this is a differentially regulated gene.

ADD REPLY • link 6.8 years ago by Xiaokang ▴ 70

0

Entering edit mode

You're using DESeq2, so counts(dds, normalized=T) will give you more useful values.

ADD REPLY • link 6.8 years ago by Devon Ryan 104k

0

Entering edit mode

did you relevel your fishGroup factor after dropping levels?

ADD REPLY • link 6.8 years ago by russhh 5.7k

0

Entering edit mode

Sorry, but what do you mean of "relevel"? This is the new colData after deleting the non-matching sample anyway:

   samples fishGroup expGroup
1  Sample12         3  control
2  Sample18         4  control
3  Sample25         5  control
4  Sample31         6  control
5  Sample44         8  control
6   Sample6         2  control
7  Sample11         2    treat
8  Sample17         3    treat
9  Sample24         4    treat
10 Sample30         5    treat
11 Sample37         6    treat
12 Sample50         8    treat

ADD REPLY • link 6.8 years ago by Xiaokang ▴ 70

0

Entering edit mode

You've dropped the rows for fishGroup 1 and 7, but are these groups still present in levels(colData$fishGroup)?

ADD REPLY • link 6.8 years ago by russhh 5.7k

0

Entering edit mode

I've deleted those two samples completely, including the levels 1 and 7 in fishGroup:

> levels(colData$fishGroup)
[1] "2" "3" "4" "5" "6" "8"

ADD REPLY • link 6.8 years ago by Xiaokang ▴ 70

0

Entering edit mode

Hi, Devon, I would like to known whether it is ok to run Deseq2 in the design of ~fishGroup+expGroup if some samples have duplicates in fishGroup, and how does Deseq2 solve this ? for example: fishGroup 8 have one sample in control, while possess two samples in treat group.

samples fishGroup expGroup
1  Sample12         3  control
2  Sample18         4  control
3  Sample25         5  control
4  Sample31         6  control
**5  Sample44         8  control**
6   Sample6         2  control
7  Sample11         2    treat
8  Sample17         3    treat
9  Sample24         4    treat
10 Sample30         5    treat
11 Sample37         6    treat
**12 Sample50         8    treat**
**13 Sample51         8    treat**

ADD REPLY • link 3.5 years ago by Gang Xue • 0

0

Entering edit mode

Gang Xue, did you found the answer to your question? I have the same problem and I am not sure what to do.

ADD REPLY • link 3.4 years ago by kpawel2210 • 0