DESeq2 error using HTSeq input for 10 samples with 3 replicates each
7.8 years ago
stan ▴ 80

Hi all,

I am getting the following error when I run the differential analysis step: dds<-DESeq(ddsHTSeq)

Error in validObject(object) :
  invalid class "DESeqDataSet" object: factors in design formula must have samples for each level.
  this error can arise when subsetting a DESeqDataSet, in which
  all the samples for one or more levels of a factor in the design were removed.
  if this was intentional, use droplevels() to remove these levels, e.g.:

  dds$condition <- droplevels(dds$condition)

I have 30 files each containing counts for each replicate, and this is what I have done so far:

> sampleFiles <- grep("counts", list.files(directory), value = TRUE)
> sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
> sampleTable
        sampleName        fileName     condition
1   B0_counts1.txt  B0_counts1.txt    BP1_0hrs_1
2   B0_counts2.txt  B0_counts2.txt    BP1_0hrs_2
3   B0_counts3.txt  B0_counts3.txt    BP1_0hrs_3
4  B12_counts1.txt B12_counts1.txt   BP1_12hrs_1
5  B12_counts2.txt B12_counts2.txt   BP1_12hrs_2
6  B12_counts3.txt B12_counts3.txt   BP1_12hrs_3
7  B24_counts1.txt B24_counts1.txt   BP1_24hrs_1
8  B24_counts2.txt B24_counts2.txt   BP1_24hrs_2
9  B24_counts3.txt B24_counts3.txt   BP1_24hrs_3
10  B6_counts1.txt  B6_counts1.txt    BP1_6hrs_1
11  B6_counts2.txt  B6_counts2.txt    BP1_6hrs_2
12  B6_counts3.txt  B6_counts3.txt    BP1_6hrs_3
13 B72_counts1.txt B72_counts1.txt   BP1_72hrs_1
14 B72_counts2.txt B72_counts2.txt   BP1_72hrs_2
15 B72_counts3.txt B72_counts3.txt   BP1_72hrs_3
16  V0_counts1.txt  V0_counts1.txt  Valor_0hrs_1
17  V0_counts2.txt  V0_counts2.txt  Valor_0hrs_2
18  V0_counts3.txt  V0_counts3.txt  Valor_0hrs_3
19 V12_counts1.txt V12_counts1.txt Valor_12hrs_1
20 V12_counts2.txt V12_counts2.txt Valor_12hrs_2
21 V12_counts3.txt V12_counts3.txt Valor_12hrs_3
22 V24_counts1.txt V24_counts1.txt Valor_24hrs_1
23 V24_counts2.txt V24_counts2.txt Valor_24hrs_2
24 V24_counts3.txt V24_counts3.txt Valor_24hrs_3
25  V6_counts1.txt  V6_counts1.txt  Valor_6hrs_1
26  V6_counts2.txt  V6_counts2.txt  Valor_6hrs_2
27  V6_counts3.txt  V6_counts3.txt  Valor_6hrs_3
28 V72_counts1.txt V72_counts1.txt Valor_72hrs_1
29 V72_counts2.txt V72_counts2.txt Valor_72hrs_2
30 V72_counts3.txt V72_counts3.txt Valor_72hrs_3

So the error seems to come from the way I am assigning levels to colData, and I am not sure how to do it correctly. I have 2 broad conditions (Valor vs BP1) and 5 time points for each.

> colData(ddsHTSeq)$condition <- factor(
    levels = c("Valor_0hrs","Valor_6hrs",


7.8 years ago
Martombo ★ 3.0k

The levels you're providing don't match the values you gave to "condition": "BP1_0hrs_1" is not the same string as "BP1_0hrs". Give the same value to all three replicates, e.g. "BP1_0hrs", "BP1_0hrs", "BP1_0hrs", or rep("BP1_0hrs", 3).

If you want to keep the replicate number, for example to check for batch effects, you can add a separate column to the data.frame that holds only the replicate number.
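
A minimal sketch of that suggestion in base R, assuming the condition labels always end in "_<replicate number>" as in the table from the question. The vector name condition_raw and the column name replicate are only illustrative, and just two of the ten groups are shown:

```r
# Condition labels as posted in the question (replicate suffix included).
condition_raw <- c("BP1_0hrs_1", "BP1_0hrs_2", "BP1_0hrs_3",
                   "Valor_12hrs_1", "Valor_12hrs_2", "Valor_12hrs_3")

# Strip the trailing "_<digits>" so the three replicates share one level.
condition <- factor(sub("_[0-9]+$", "", condition_raw))

# Keep the replicate number in its own column (everything after the last "_").
replicate <- factor(sub("^.*_", "", condition_raw))

sampleTable <- data.frame(condition = condition, replicate = replicate)

# Each condition level should now have three samples.
table(sampleTable$condition)
```

In the real table the sampleName and fileName columns would stay as they are; only the condition column changes, and the design would then presumably be ~ condition.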

That's it. The specified level "Valor_12hrs" does not exactly match any of the values in the condition column of sampleTable.
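
A small base R illustration of why the mismatch leads to the "must have samples for each level" error. This is only a sketch with three of the labels from the question; the variable names are made up for the example:

```r
# Values as they appear in the condition column (with replicate suffix).
values <- c("Valor_12hrs_1", "Valor_12hrs_2", "Valor_12hrs_3")

# Level as supplied to factor(), without the suffix.
wanted_levels <- c("Valor_12hrs")

# factor() turns every value that is not among the levels into NA,
# so the level exists but no sample belongs to it.
f <- factor(values, levels = wanted_levels)
print(f)
print(table(f))

# A quick check before building the factor: which levels match no value?
unmatched <- setdiff(wanted_levels, unique(values))
print(unmatched)
```

Running a setdiff() check like this on the full sampleTable before calling DESeq() should show every level that ended up without samples.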

Thanks guys, that fixed it!!


