Question: DESeq2 dataset setting with count table.
0
gravatar for jhkim1972
2.3 years ago by
jhkim19720
Canada
jhkim19720 wrote:

Hi, 

I am a novice for R and bioinfomatics. I am trying to run DESeq2 with my raw count table (.txt). I have been trying to follow the beginner's guide for the DESeq2 package, but it is still hard to understand because my experimental condition is different from the example. Would you please help me to figure it out how to make a DESeqDataset and run? Thank you in advance for your time.

Here is my experiemtal design.

- Treatment: Control vs Treated 

- Animal group:  4 genotypes in duplicate (genotype1-1, genotype1-2, genotype2-1, genotype2-2, genotype3-1, genotype3-2, genotype4-1, genotype4-2)

- Model: Treatment, Animal group, Aniaml * Treatment (interaction of animal and treatment).

I want to run DESeq2 with 2-way ANOVA-like frame work (2 treatment  X 4 genotypes).

 

Jin

rna-seq R • 5.8k views
ADD COMMENTlink written 2.3 years ago by jhkim19720

Which animal groups are treated and untreated?

ADD REPLYlink written 2.3 years ago by poisonAlien2.5k

Each treatment has 4 genotypes.

In other words, Control (genotype1-1, genotype1-2, genotype2-1, genotype2-2, genotype3-1, genotype3-2, genotype4-1, genotype4-2).

Treated (genotype1-1, genotype1-2, genotype2-1, genotype2-2, genotype3-1, genotype3-2, genotype4-1, genotype4-2).

ADD REPLYlink written 2.3 years ago by jhkim19720
1

So if I understand correctly, you have total 16 samples. 8 treated and 8 control, each with 4 genotypes in duplicates.

#form coldata
coldata = data.frame(row.names = c('genotype1-1', 'genotype1-2', 'genotype2-1', 'genotype2-2', 'genotype3-1', 'genotype3-2', 'genotype4-1', 'genotype4-2','genotype1-1_treated', 'genotype1-2_treated', 'genotype2-1_treated', 'genotype2-2_treated', 'genotype3-1_treated', 'genotype3-2_treated', 'genotype4-1_treated', 'genotype4-2_treated'),group = rep(c("gt1","gt2","gt3","gt4"),2,each = 2),treatment = rep(c("control","treated"),each = 8))

#make factors for treatment column.
coldata$treatment = factor(x = coldata$treatment,levels = c('treated','control'))
coldata$treatment = factor(x = coldata$treatment,levels = c('control','treated'))

coldata
group treatment
genotype1-1           gt1   control
genotype1-2           gt1   control
genotype2-1           gt2   control
genotype2-2           gt2   control
genotype3-1           gt3   control
genotype3-2           gt3   control
genotype4-1           gt4   control
genotype4-2           gt4   control
genotype1-1_treated   gt1   treated
genotype1-2_treated   gt1   treated
genotype2-1_treated   gt2   treated
genotype2-2_treated   gt2   treated
genotype3-1_treated   gt3   treated
genotype3-2_treated   gt3   treated
genotype4-1_treated   gt4   treated
genotype4-2_treated   gt4   treated

#construct DESeq object from input matrix (countdata)
dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)

dds = DESeq(dds)
res = results(dds)

But again if you post this on bioconductor support forum you will get proper suggestion from developer himself.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by poisonAlien2.5k

Hi poisonAlien,

Thank you very much for your help. I will try this and post on the forum as your suggestion as well.

Jin

ADD REPLYlink written 2.3 years ago by jhkim19720

I tried it, and got a following Error like below.

> dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)
Error in as.matrix(countData) :
  error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Error: object 'countMatrix' not found
Calls: DESeqDataSetFromMatrix -> as.matrix

ADD REPLYlink written 2.3 years ago by jhkim19720
1

countMatrix is your counts table. (as you mentioned in your question - raw count table (.txt).) You should read it into R before doing this.

Assuming you have named your table as 'count_table.txt' and first column is your gene names (or IDs), followed by rest of the 16 columns, organised according to sample names in above 'coldata'

countMatrix = read.delim("count_table.txt",sep='\t',row.names = 1)

then try running above code. 

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by poisonAlien2.5k

There was no error, but give some message below,

> dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)
it appears that the last variable in the design formula, 'treatment', has a factor level, 'control', which is not the base level. we recommend you use factor(...,levels=...) or relevel() to set this as the base level before proceeding. for more information, please see the 'Note on factor levels' in vignette('DESeq2').

ADD REPLYlink written 2.3 years ago by jhkim19720
1

Ahh ! I The warning because reference level is treated here. See the edit above.

ADD REPLYlink written 2.3 years ago by poisonAlien2.5k

It works! Thank you so much for your help!
 

ADD REPLYlink written 2.3 years ago by jhkim19720

I have got the result below,

Does this mean that the genotype4_treated is significantly different from others?

 

log2 fold change (MAP): groupgt4.treatmenttreated

Wald test p-value: groupgt4.treatmenttreated

DataFrame with 39595 rows and 6 columns

baseMean log2FoldChange lfcSE stat pvalue padj

<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>

unigene22398972 189645.38 -0.1773434 0.2744156 -0.6462584 0.518111997 0.999899

unigene22407675 179211.24 0.1429904 0.1045016 1.3683091 0.171215344 0.999899

unigene22402019 72765.83 -0.1883720 0.2830502 -0.6655075 0.505725927 0.999899

unigene22387256 70438.23 0.3585405 0.1246933 2.8753784 0.004035435 0.999899

unigene22410725 29215.47 0.2276120 0.1606212 1.4170736 0.156461409 0.999899

... ... ... ... ... ... ...

unigene22407834 0.07103343 -5.161589e-05 0.01173293 -0.004399233 0.9964899 NA

unigene22405341 0.07103343 -5.161589e-05 0.01173293 -0.004399233 0.9964899 NA

unigene22407238 0.07103343 -5.161589e-05 0.01173293 -0.004399233 0.9964899 NA

unigene22407247 0.07103343 -5.161589e-05 0.01173293 -0.004399233 0.9964899 NA

unigene22397119 0.07103343 -5.161589e-05 0.01173293 -0.004399233 0.9964899 NA

ADD REPLYlink written 2.3 years ago by jhkim19720
Sorry again, I am in trouble with an Error. Below is all scripts.

> library(DESeq2)
> library("DESeq2")
> countMatrix <- read.table ("~/Desktop/KiPBSvsPIC.txt", header=TRUE)
> head (countMatrix)
               ID Genotype1.1 Genotype1.2 Genotype2.1 Genotype2.2 Genotype3.1
1 unigene22398972      223736       97095      182659      161499      288360
2 unigene22407675      197279      218530      241506      194318      189604
3 unigene22402019      102051       50008       68617       60005       98462
4 unigene22387256       82289       93196       88090       69212       67281
5 unigene22410725       26403       51824       35820       36544       21938
6 unigene22400817       45626       20860       28781       24550       43365
  Genotype3.2 Genotype4.1 Genotype4.2 Genotype1.1_treated Genotype1.2_treated
1      280347      160611      239753              162563               51277
2      158650      184416      173256              173624              185385
3      138085       39193       79512               68575               28035
4       55019       62363       71076               54943               58875
5       22089       33376       21841               31780               27587
6       55484       15369       30763               26308               11168
  Genotype2.1_treated Genotype2.2_treated Genotype3.1_treated
1              286391               92844              243987
2              161664              154975              187180
3               93280               30827               88396
4               58179               54870               82084
5               32564               25289               27143
6               40633               13106               34436
  Genotype3.2_treated Genotype4.1_treated Genotype4.2_treated
1              314833              120738              105184
2              149823              173155              163640
3              136919               33087               27698
4               55364               99173               80765
5               18275               33866               26191
6               54273               13731               10517
> #form coldata
> coldata = data.frame(row.names = c('genotype1-1', 'genotype1-2', 'genotype2-1', 'genotype2-2', 'genotype3-1', 'genotype3-2', 'genotype4-1', 'genotype4-2','genotype1-1_treated', 'genotype1-2_treated', 'genotype2-1_treated', 'genotype2-2_treated', 'genotype3-1_treated', 'genotype3-2_treated', 'genotype4-1_treated', 'genotype4-2_treated'),group = rep(c("gt1","gt2","gt3","gt4"),2,each = 2),treatment = rep(c("control","treated"),each = 8))
> #make factors for treatment column.
> coldata$treatment = factor(x = coldata$treatment,levels = c('control','treated'))
> coldata
                    group treatment
genotype1-1           gt1   control
genotype1-2           gt1   control
genotype2-1           gt2   control
genotype2-2           gt2   control
genotype3-1           gt3   control
genotype3-2           gt3   control
genotype4-1           gt4   control
genotype4-2           gt4   control
genotype1-1_treated   gt1   treated
genotype1-2_treated   gt1   treated
genotype2-1_treated   gt2   treated
genotype2-2_treated   gt2   treated
genotype3-1_treated   gt3   treated
genotype3-2_treated   gt3   treated
genotype4-1_treated   gt4   treated
genotype4-2_treated   gt4   treated
> #construct DESeq object from input matrix (countdata)
> dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)
Error in validObject(.Object) :
  invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol
Calls: DESeqDataSetFromMatrix ... .local -> new -> initialize -> initialize -> validObject
In addition: Warning message:
In sort(rownames(colData)) == sort(colnames(countData)) :
  longer object length is not a multiple of shorter object length
Error in validObject(.Object) :
  invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol
Calls: DESeqDataSetFromMatrix ... .local -> new -> initialize -> initialize -> validObject
ADD REPLYlink written 2.3 years ago by jhkim19720

I can't say much about your results. But regarding the above error, your first column must start with counts. Here your first column is ID's, which should be rowanames. While reading the table, use row.names = 1, which makes first columns as rowanmes. Anyways here is it. 

library("DESeq2")

countMatrix = read.table("~/Desktop/KiPBSvsPIC.txt",header = T,row.names = 1)

coldata = data.frame(row.names = colnames(countMatrix),group = rep(c("gt1","gt2","gt3","gt4"),2,each = 2),treatment = rep(c("control","treated"),each = 8))

coldata$treatment = factor(x = coldata$treatment,levels = c('control','treated'))

dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)

dds = DESeq(dds)
res = results(dds)

#order results according adjusted p-value
res = res[order(res$padj),]

#get degs based on fdr < 0.1
degs = res[which(res$padj < 0.1),]
ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by poisonAlien2.5k

Thank you so much for your help again. I appreciate it.

ADD REPLYlink written 2.3 years ago by jhkim19720

Hi, I am just bit confused.. I run the following script on R version 3.2 and everything was fine:

conds<- c("kd1.1","kd1.2","kd2.1","kd2.2","ctrl1.1","ctrl1.2")
Design<-data.frame(condition=conds,row.names=paste(conds,rep(c(1,2),3),sep="_"))
dds <- DESeqDataSetFromMatrix(countData = assay(se),
                             colData = Design,
                             design = ~ condition)

however, when I updated R into 3.3 I start getting the following error message:

Error in FUN(X[[i]], ...) : 
  assay colnames() must be NULL or equal colData rownames()

I tried to follow many solutions but non of them worked .

can anyone help

thanks much

ADD REPLYlink modified 14 months ago • written 14 months ago by ta_awwad100

Late, but the error is reported in the manual of the new version. It seems a known issue after the update to 3.3: https://www.bioconductor.org/packages/devel/bioc/manuals/DESeq2/man/DESeq2.pdf "Note on the error message "assay colnames() must be NULL or equal colData rownames()": this means that the colnames of countData are different than the rownames of colData. Fix this with: colnames(countData) <- NULL"

ADD REPLYlink written 8 months ago by xmarti610
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1093 users visited in the last hour