Question

DESeq2 dataset setting with count table.

0

Entering edit mode

10.0 years ago

jhkim1972 • 0

Hi,

I am a novice for R and bioinformatics. I am trying to run DESeq2 with my raw count table (.txt). I have been trying to follow the beginner's guide for the DESeq2 package, but it is still hard to understand because my experimental condition is different from the example. Would you please help me to figure it out how to make a DESeqDataset and run? Thank you in advance for your time.

Here is my experimental design.

Treatment: Control vs Treated
Animal group: 4 genotypes in duplicate (genotype1-1, genotype1-2, genotype2-1, genotype2-2, genotype3-1, genotype3-2, genotype4-1, genotype4-2)
Model: Treatment, Animal group, Animal * Treatment (interaction of animal and treatment).

I want to run DESeq2 with 2-way ANOVA-like frame work (2 treatment X 4 genotypes).

Jin

RNA-Seq R • 18k views

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 10.0 years ago by jhkim1972 • 0

0

Entering edit mode

Which animal groups are treated and untreated?

ADD REPLY • link 10.0 years ago by poisonAlien ★ 3.2k

0

Entering edit mode

Each treatment has 4 genotypes.

In other words, Control (genotype1-1, genotype1-2, genotype2-1, genotype2-2, genotype3-1, genotype3-2, genotype4-1, genotype4-2).

Treated (genotype1-1, genotype1-2, genotype2-1, genotype2-2, genotype3-1, genotype3-2, genotype4-1, genotype4-2).

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.0 years ago by jhkim1972 • 0

1

Entering edit mode

So if I understand correctly, you have total 16 samples. 8 treated and 8 control, each with 4 genotypes in duplicates.

#form coldata
coldata = data.frame(row.names = c('genotype1-1', 'genotype1-2', 'genotype2-1', 'genotype2-2', 'genotype3-1', 'genotype3-2', 'genotype4-1', 'genotype4-2','genotype1-1_treated', 'genotype1-2_treated', 'genotype2-1_treated', 'genotype2-2_treated', 'genotype3-1_treated', 'genotype3-2_treated', 'genotype4-1_treated', 'genotype4-2_treated'),group = rep(c("gt1","gt2","gt3","gt4"),2,each = 2),treatment = rep(c("control","treated"),each = 8))

#make factors for treatment column.

# coldata$treatment = factor(x = coldata$treatment,levels = c('treated','control')) #NOT THIS
coldata$treatment = factor(x = coldata$treatment,levels = c('control','treated')) #BUT THIS

coldata
group treatment
genotype1-1           gt1   control
genotype1-2           gt1   control
genotype2-1           gt2   control
genotype2-2           gt2   control
genotype3-1           gt3   control
genotype3-2           gt3   control
genotype4-1           gt4   control
genotype4-2           gt4   control
genotype1-1_treated   gt1   treated
genotype1-2_treated   gt1   treated
genotype2-1_treated   gt2   treated
genotype2-2_treated   gt2   treated
genotype3-1_treated   gt3   treated
genotype3-2_treated   gt3   treated
genotype4-1_treated   gt4   treated
genotype4-2_treated   gt4   treated

#construct DESeq object from input matrix (countdata)
dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)

dds = DESeq(dds)
res = results(dds)

But again if you post this on bioconductor support forum you will get proper suggestion from developer himself.

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.0 years ago by poisonAlien ★ 3.2k

0

Entering edit mode

Hi poisonAlien,

Thank you very much for your help. I will try this and post on the forum as your suggestion as well.

Jin

ADD REPLY • link 10.0 years ago by jhkim1972 • 0

0

Entering edit mode

I tried it, and got a following Error like below.

> dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)
Error in as.matrix(countData) :
  error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Error: object 'countMatrix' not found
Calls: DESeqDataSetFromMatrix -> as.matrix

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.0 years ago by jhkim1972 • 0

1

Entering edit mode

countMatrix is your counts table. (as you mentioned in your question - raw count table (.txt).) You should read it into R before doing this.

Assuming you have named your table as 'count_table.txt' and first column is your gene names (or IDs), followed by rest of the 16 columns, organised according to sample names in above 'coldata'

countMatrix = read.delim("count_table.txt",sep='\t',row.names = 1)

then try running above code.

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.0 years ago by poisonAlien ★ 3.2k

0

Entering edit mode

There was no error, but give some message below,

> dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)

It appears that the last variable in the design formula, treatment, has a factor level, control, which is not the base level. We recommend you use factor(...,levels=...) or relevel() to set this as the base level before proceeding. for more information, please see the 'Note on factor levels' in vignette('DESeq2').

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.0 years ago by jhkim1972 • 0

1

Entering edit mode

Ahh! The warning because reference level is treated here. See the edit above.

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.0 years ago by poisonAlien ★ 3.2k

0

Entering edit mode

It works! Thank you so much for your help!

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.0 years ago by jhkim1972 • 0

0

Entering edit mode

I have got the result below,

Does this mean that the genotype4_treated is significantly different from others?

log2 fold change (MAP): groupgt4.treatmenttreated
Wald test p-value: groupgt4.treatmenttreated
DataFrame with 39595 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
unigene22398972 189645.38 -0.1773434 0.2744156 -0.6462584 0.518111997 0.999899
unigene22407675 179211.24 0.1429904 0.1045016 1.3683091 0.171215344 0.999899
unigene22402019 72765.83 -0.1883720 0.2830502 -0.6655075 0.505725927 0.999899
unigene22387256 70438.23 0.3585405 0.1246933 2.8753784 0.004035435 0.999899
unigene22410725 29215.47 0.2276120 0.1606212 1.4170736 0.156461409 0.999899
... ... ... ... ... ... ...
unigene22407834 0.07103343 -5.161589e-05 0.01173293 -0.004399233 0.9964899 NA
unigene22405341 0.07103343 -5.161589e-05 0.01173293 -0.004399233 0.9964899 NA
unigene22407238 0.07103343 -5.161589e-05 0.01173293 -0.004399233 0.9964899 NA
unigene22407247 0.07103343 -5.161589e-05 0.01173293 -0.004399233 0.9964899 NA
unigene22397119 0.07103343 -5.161589e-05 0.01173293 -0.004399233 0.9964899 NA

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.0 years ago by jhkim1972 • 0

0

Entering edit mode

Sorry again, I am in trouble with an Error. Below is all scripts.

> library(DESeq2)
> library("DESeq2")
> countMatrix <- read.table ("~/Desktop/KiPBSvsPIC.txt", header=TRUE)
> head (countMatrix)
               ID Genotype1.1 Genotype1.2 Genotype2.1 Genotype2.2 Genotype3.1
1 unigene22398972      223736       97095      182659      161499      288360
2 unigene22407675      197279      218530      241506      194318      189604
3 unigene22402019      102051       50008       68617       60005       98462
4 unigene22387256       82289       93196       88090       69212       67281
5 unigene22410725       26403       51824       35820       36544       21938
6 unigene22400817       45626       20860       28781       24550       43365
  Genotype3.2 Genotype4.1 Genotype4.2 Genotype1.1_treated Genotype1.2_treated
1      280347      160611      239753              162563               51277
2      158650      184416      173256              173624              185385
3      138085       39193       79512               68575               28035
4       55019       62363       71076               54943               58875
5       22089       33376       21841               31780               27587
6       55484       15369       30763               26308               11168
  Genotype2.1_treated Genotype2.2_treated Genotype3.1_treated
1              286391               92844              243987
2              161664              154975              187180
3               93280               30827               88396
4               58179               54870               82084
5               32564               25289               27143
6               40633               13106               34436
  Genotype3.2_treated Genotype4.1_treated Genotype4.2_treated
1              314833              120738              105184
2              149823              173155              163640
3              136919               33087               27698
4               55364               99173               80765
5               18275               33866               26191
6               54273               13731               10517
> #form coldata
> coldata = data.frame(row.names = c('genotype1-1', 'genotype1-2', 'genotype2-1', 'genotype2-2', 'genotype3-1', 'genotype3-2', 'genotype4-1', 'genotype4-2','genotype1-1_treated', 'genotype1-2_treated', 'genotype2-1_treated', 'genotype2-2_treated', 'genotype3-1_treated', 'genotype3-2_treated', 'genotype4-1_treated', 'genotype4-2_treated'),group = rep(c("gt1","gt2","gt3","gt4"),2,each = 2),treatment = rep(c("control","treated"),each = 8))
> #make factors for treatment column.
> coldata$treatment = factor(x = coldata$treatment,levels = c('control','treated'))
> coldata
                    group treatment
genotype1-1           gt1   control
genotype1-2           gt1   control
genotype2-1           gt2   control
genotype2-2           gt2   control
genotype3-1           gt3   control
genotype3-2           gt3   control
genotype4-1           gt4   control
genotype4-2           gt4   control
genotype1-1_treated   gt1   treated
genotype1-2_treated   gt1   treated
genotype2-1_treated   gt2   treated
genotype2-2_treated   gt2   treated
genotype3-1_treated   gt3   treated
genotype3-2_treated   gt3   treated
genotype4-1_treated   gt4   treated
genotype4-2_treated   gt4   treated
> #construct DESeq object from input matrix (countdata)
> dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)
Error in validObject(.Object) :
  invalid class "SummarizedExperiment" object: 'colData' nrow differs from 'assays' ncol
Calls: DESeqDataSetFromMatrix ... .local -> new -> initialize -> initialize -> validObject
In addition: Warning message:
In sort(rownames(colData)) == sort(colnames(countData)) :
  longer object length is not a multiple of shorter object length
Error in validObject(.Object) :
  invalid class "SummarizedExperiment" object: 'colData' nrow differs from 'assays' ncol
Calls: DESeqDataSetFromMatrix ... .local -> new -> initialize -> initialize -> validObject

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 9.9 years ago by jhkim1972 • 0

0

Entering edit mode

I can't say much about your results. But regarding the above error, your first column must start with counts. Here your first column is ID's, which should be rownames. While reading the table, use row.names = 1, which makes first columns as rowanmes. Anyways here is it.

library("DESeq2")

countMatrix = read.table("~/Desktop/KiPBSvsPIC.txt",header = T,row.names = 1)

coldata = data.frame(row.names = colnames(countMatrix),group = rep(c("gt1","gt2","gt3","gt4"),2,each = 2),treatment = rep(c("control","treated"),each = 8))

coldata$treatment = factor(x = coldata$treatment,levels = c('control','treated'))

dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)

dds = DESeq(dds)
res = results(dds)

#order results according adjusted p-value
res = res[order(res$padj),]

#get degs based on fdr < 0.1
degs = res[which(res$padj < 0.1),]

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 9.9 years ago by poisonAlien ★ 3.2k

0

Entering edit mode

Thank you so much for your help again. I appreciate it.

ADD REPLY • link 9.9 years ago by jhkim1972 • 0

0

Entering edit mode

Hi, I am just bit confused.. I run the following script on R version 3.2 and everything was fine:

conds<- c("kd1.1","kd1.2","kd2.1","kd2.2","ctrl1.1","ctrl1.2")
Design<-data.frame(condition=conds,row.names=paste(conds,rep(c(1,2),3),sep="_"))
dds <- DESeqDataSetFromMatrix(countData = assay(se),
                             colData = Design,
                             design = ~ condition)

however, when I updated R into 3.3 I start getting the following error message:

Error in FUN(X[[i]], ...) : 
  assay colnames() must be NULL or equal colData rownames()

I tried to follow many solutions but non of them worked .

can anyone help

thanks much

ADD REPLY • link 8.8 years ago by ta_awwad ▴ 350

score 1 · Accepted Answer · 2017-02-27

Late, but the error is reported in the manual of the new version. It seems a known issue after the update to 3.3: https://www.bioconductor.org/packages/devel/bioc/manuals/DESeq2/man/DESeq2.pdf "Note on the error message "assay colnames() must be NULL or equal colData rownames()": this means that the colnames of countData are different than the rownames of colData. Fix this with: colnames(countData) <- NULL"