Question: DESeq2 error using HTSeq input for 10 conditions with 3 replicates each
4.4 years ago by
Pretoria, South Africa
stan • 70 wrote:

Hi all,

I get the following error when I run the differential expression step, dds <- DESeq(ddsHTSeq):

Error in validObject(object) :
  invalid class "DESeqDataSet" object: factors in design formula must have samples for each level.
  this error can arise when subsetting a DESeqDataSet, in which
  all the samples for one or more levels of a factor in the design were removed.
  if this was intentional, use droplevels() to remove these levels, e.g.:

  dds$condition <- droplevels(dds$condition)

I have 30 files, one per replicate, each containing counts. This is what I have done so far:

> sampleFiles <- grep("counts", list.files(directory), value = TRUE)
> sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
> sampleTable
        sampleName        fileName     condition
1   B0_counts1.txt  B0_counts1.txt    BP1_0hrs_1
2   B0_counts2.txt  B0_counts2.txt    BP1_0hrs_2
3   B0_counts3.txt  B0_counts3.txt    BP1_0hrs_3
4  B12_counts1.txt B12_counts1.txt   BP1_12hrs_1
5  B12_counts2.txt B12_counts2.txt   BP1_12hrs_2
6  B12_counts3.txt B12_counts3.txt   BP1_12hrs_3
7  B24_counts1.txt B24_counts1.txt   BP1_24hrs_1
8  B24_counts2.txt B24_counts2.txt   BP1_24hrs_2
9  B24_counts3.txt B24_counts3.txt   BP1_24hrs_3
10  B6_counts1.txt  B6_counts1.txt    BP1_6hrs_1
11  B6_counts2.txt  B6_counts2.txt    BP1_6hrs_2
12  B6_counts3.txt  B6_counts3.txt    BP1_6hrs_3
13 B72_counts1.txt B72_counts1.txt   BP1_72hrs_1
14 B72_counts2.txt B72_counts2.txt   BP1_72hrs_2
15 B72_counts3.txt B72_counts3.txt   BP1_72hrs_3
16  V0_counts1.txt  V0_counts1.txt  Valor_0hrs_1
17  V0_counts2.txt  V0_counts2.txt  Valor_0hrs_2
18  V0_counts3.txt  V0_counts3.txt  Valor_0hrs_3
19 V12_counts1.txt V12_counts1.txt Valor_12hrs_1
20 V12_counts2.txt V12_counts2.txt Valor_12hrs_2
21 V12_counts3.txt V12_counts3.txt Valor_12hrs_3
22 V24_counts1.txt V24_counts1.txt Valor_24hrs_1
23 V24_counts2.txt V24_counts2.txt Valor_24hrs_2
24 V24_counts3.txt V24_counts3.txt Valor_24hrs_3
25  V6_counts1.txt  V6_counts1.txt  Valor_6hrs_1
26  V6_counts2.txt  V6_counts2.txt  Valor_6hrs_2
27  V6_counts3.txt  V6_counts3.txt  Valor_6hrs_3
28 V72_counts1.txt V72_counts1.txt Valor_72hrs_1
29 V72_counts2.txt V72_counts2.txt Valor_72hrs_2
30 V72_counts3.txt V72_counts3.txt Valor_72hrs_3

So the error seems to come from the way I am assigning levels to colData, and I am not sure how to do it correctly. I have 2 broad conditions (Valor vs BP1) and 5 time points for each:

> colData(ddsHTSeq)$condition <- factor(colData(ddsHTSeq)$condition, levels = c("Valor_0hrs", "Valor_6hrs", "Valor_12hrs", "Valor_24hrs", "Valor_72hrs", "BP1_0hrs", "BP1_6hrs", "BP1_12hrs", "BP1_24hrs", "BP1_72hrs"))



rna-seq R • 2.4k views
4.4 years ago by
Seville, ES
Martombo • 2.5k wrote:

The levels you're providing don't match the values you gave to "condition": BP1_0hrs_1 is not the same string as BP1_0hrs, so none of your samples fall into any of the levels. Give the same value to all three replicates, e.g. "BP1_0hrs", "BP1_0hrs", "BP1_0hrs", or rep("BP1_0hrs", 3).

If you want to keep the replicate number, for example to check for batch effects, add a separate column to the data.frame that holds only the replicate number.
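A minimal sketch of this in base R, assuming the condition strings look like the ones in the question (a trailing "_<number>" marks the replicate). The regular expressions and the `replicate` column name are illustrative choices, not something taken from the original post, and only two of the ten conditions are shown:

```r
# Condition strings as they appear in the question, replicate number appended
condition_raw <- c("BP1_0hrs_1", "BP1_0hrs_2", "BP1_0hrs_3",
                   "Valor_12hrs_1", "Valor_12hrs_2", "Valor_12hrs_3")

# Strip the trailing "_<replicate>" so all replicates share one condition value
condition <- sub("_[0-9]+$", "", condition_raw)

# Keep the replicate number in its own column (hypothetical column name)
replicate <- sub("^.*_([0-9]+)$", "\\1", condition_raw)

sampleTable <- data.frame(condition = factor(condition),
                          replicate = factor(replicate))

# Each condition level should now have three samples
print(table(sampleTable$condition))
```

With the full table, the same `sub()` call applied to the existing condition column should leave ten condition values with three samples each.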


That's it. The specified level "Valor_12hrs" doesn't exactly match any of the values in the sampleTable condition column.
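The mismatch can be reproduced without DESeq2. The sketch below uses a shortened, made-up subset of the values and levels from the question; it only shows how base R's factor() treats values that are absent from `levels`, which appears consistent with the "must have samples for each level" error:

```r
# Values carry the replicate suffix, levels do not
condition_raw <- c("Valor_12hrs_1", "Valor_12hrs_2", "Valor_12hrs_3")
f <- factor(condition_raw, levels = c("Valor_0hrs", "Valor_12hrs"))

# Values not found in `levels` become NA
print(f)

# Every level therefore ends up with zero samples
print(table(f))
```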

Reply written 4.4 years ago by karl.stamm • 3.5k

Thanks guys, that fixed it!!

Reply written 4.4 years ago by stan • 70