Question: DESeq2 pairwise comparison
Bioinfonext (Korea) wrote:

There are two lines, 216 and 218.

Three developmental stages: 5 weeks (5W), 7W and 9W.

Three tissues: Ca, Co and Pa.

Each combination has 2 biological replicates.

I want to do a differential gene expression analysis with DESeq2, so I tried the code below after reading about DESeq2. My aim is to do pairwise comparisons. How do I make the colData and the design formula?

library("DESeq2")

countMatrix = read.table("read_count.22May.2017.new.txt",header=T,sep='\t',check.names=F)

head(countMatrix)

dim(countMatrix)

[1] 57894    35

Now I am not sure how to construct a DESeqDataSet:

dds <- DESeqDataSetFromMatrix(countData = countMatrix,
                              colData = colData,
                              design = ~ condition)

Question written 18 months ago by Bioinfonext.

dr_bantz wrote:

The 'colData' argument specifies the sample information. For a single-factor design, this should be a one-column data frame containing the condition for each sample, with the sample names as the row names.

colData <- data.frame(condition = conditions)

row.names(colData) <- names

where "conditions" is a vector containing the condition for each sample and "names" is a vector of the sample names (in the same order, of course!).
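If the sample names themselves encode the variables, the vectors do not have to be typed by hand. Below is a minimal sketch in base R, assuming names of the form genotype_stage_tissue plus a replicate number (e.g. 216_5W_Ca1). The six sample names, the toy count matrix and the column names genotype, stage, tissue and condition are illustrative assumptions, not the real data.

```r
# Hypothetical sample names: <genotype>_<stage>_<tissue><replicate number>
sample_names <- c("216_5W_Ca1", "216_5W_Ca2", "216_5W_Co1", "216_5W_Co2",
                  "218_9W_Pa1", "218_9W_Pa2")

# Split every name on "_" and stack the pieces into a 3-column matrix
parts <- do.call(rbind, strsplit(sample_names, "_", fixed = TRUE))

colData <- data.frame(
  genotype  = factor(parts[, 1]),
  stage     = factor(parts[, 2]),
  tissue    = factor(sub("[0-9]+$", "", parts[, 3])),  # drop the replicate number
  row.names = sample_names
)

# One combined label per sample; replicates share the same label
colData$condition <- factor(paste(colData$genotype, colData$stage,
                                  colData$tissue, sep = "_"))

# Toy count matrix (assumed layout: genes in rows, samples in columns)
counts <- matrix(0L, nrow = 3, ncol = length(sample_names),
                 dimnames = list(c("Rs025080", "Rs035250", "Rs035280"),
                                 sample_names))

# The columns of the counts must be in the same order as the rows of colData
stopifnot(identical(colnames(counts), rownames(colData)))
```

Biological replicates need no extra column here: two samples are treated as replicates simply because they carry the same condition label.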

Answer written 18 months ago by dr_bantz.

I tried to make ColData like this:

ColData <- data.frame (genotypes = c(‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’,), development_stage = c(‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’,) Tissue_type = c(‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’,))

I have 2 genotypes, 3 developmental stages and 3 tissues, but I am getting an error:

Error: unexpected input in "ColData <- data.frame (genotypes = c(▒"’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’,), development_stage = c(‘5W’, ‘5W’,> 5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’,) Tissue_type = c(‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’,))
Reply written 18 months ago by Bioinfonext.

I tried typing all the conditions directly on the Linux machine, but I am getting an error again:

> colData <- data.frame(genotypes = c('216','216','216','216','216','216','216','216','216','216','216','216','216','216','216','216','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218'), development_stage = c('5W','5W','5W','5W','5W','5W','7W','7W','7W','7W','9W','9W','9W','9W','9W','9W','5W','5W','5W','5W','5W','7W','7W','7W','7W','7W''7W','9W','9W',9W','9W','9W','9W','9W','9W'),Tissue_type = c('Ca','Ca','Co','Co','Pa','Pa','Ca','Ca','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa','Ca','Co','Co','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa'))

Error: unexpected string constant in "8','218','218','218','218',......
Reply written 18 months ago by Bioinfonext.

You've missed an apostrophe and a comma in there, and the variables in the data frame have different lengths (i.e., one of them has the wrong number of samples).
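Typing long vectors by hand makes both kinds of mistake easy. A rough sketch of an alternative with rep(), assuming a fully balanced design (2 genotypes x 3 stages x 3 tissues x 2 replicates = 36 samples, in exactly that order). That layout is an assumption; if some combinations are missing from the real data, the vectors need adjusting.

```r
# Assumed sample order: genotype varies slowest, replicate fastest
genotype <- rep(c("216", "218"), each = 18)
stage    <- rep(rep(c("5W", "7W", "9W"), each = 6), times = 2)
tissue   <- rep(rep(c("Ca", "Co", "Pa"), each = 2), times = 6)

# Check that all vectors have the same length before building the data frame
lens <- lengths(list(genotype = genotype, stage = stage, tissue = tissue))
stopifnot(length(unique(lens)) == 1)

colData <- data.frame(genotype, stage, tissue)

# Every genotype/stage/tissue combination should appear once per replicate
replicate_counts <- table(colData)
```

Printing replicate_counts is a quick way to spot a combination with the wrong number of samples.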

Reply written 18 months ago by dr_bantz.

Thanks a lot for helping me. I read sampleinfo (colData) as a csv file like this:

SampleInfo<- read.csv("sampleInfo.csv", check.names=F)

I need to ask one thing about biological replicates. 216_5W_Ca1 and 216_5W_Ca2 are biological replicates. How should I add this information to the sample info?

head(SampleInfo)

             Genotypes  Development_stage  Tissue
216_5W_Ca1   216        5W                 Ca
216_5W_Ca2   216        5W                 Ca
216_5W_Co1   216        5W                 Co
216_5W_Co2   216        5W                 Co
216_5W_Pa1   216        5W                 Pa
216_5W_Pa2   216        5W                 Pa

and my countMatrix looks like this:

head(countMatrix)

              216_5W_Ca1  216_5W_Ca2  216_5W_Co1  216_5W_Co2
1 Rs025080           100          71           0           0
2 Rs035250             0           0           0          50
3 Rs035280             0           0           0           0

I also need to understand how to construct the design in DESeqDataSetFromMatrix, either for a pairwise comparison (216_5W_Ca vs. 216_5W_Co) or as a multi-factor design, to extract all genes that are differentially expressed across the developmental stages and tissues with a fold change above 2 and a p-value < 0.001:

ds <- DESeqDataSetFromMatrix(countData = countMatrix,
                             colData = colData,
                             design = ~ condition)
Reply written 18 months ago by Bioinfonext.

igor (United States) wrote:
Did you check the DESeq2 vignette? There is a section on paired samples:

Yes, you should use a multi-factor design which includes the sample information as a term in the design formula. This will account for differences between the samples while estimating the effect due to the condition. The condition of interest should go at the end of the design formula, e.g. ~ subject + condition.
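To see what a design formula actually expands to, base R's model.matrix() can be run on the sample table. This is only a sketch on a hypothetical balanced sample sheet; the column names genotype, stage, tissue and group are assumptions, and the second design (one combined group factor) is an alternative that is often used when every pairwise comparison is of interest, not something taken from the vignette passage quoted above.

```r
# Hypothetical balanced sample sheet; expand.grid() turns the character
# vectors into factors with levels in the order given
colData <- expand.grid(replicate = 1:2,
                       tissue    = c("Ca", "Co", "Pa"),
                       stage     = c("5W", "7W", "9W"),
                       genotype  = c("216", "218"))

# Additive multi-factor design: one baseline plus one column per
# non-reference level of each variable
design_additive <- model.matrix(~ genotype + stage + tissue, data = colData)
additive_terms  <- colnames(design_additive)

# Single combined factor: one level per genotype/stage/tissue combination
colData$group <- factor(paste(colData$genotype, colData$stage,
                              colData$tissue, sep = "_"))
design_group  <- model.matrix(~ group, data = colData)
```

The additive design has 6 columns (intercept, genotype218, stage7W, stage9W, tissueCo, tissuePa), while the group design has 18, one per combination, which makes any two combinations directly comparable.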

Source: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#can-i-use-deseq2-to-analyze-paired-samples

Answer written 18 months ago by igor.

Thanks, I read the DESeq2 vignette, but I am not able to understand it. What do you mean by paired end samples? Is it about lines, like my 216 and 218?

I also do not understand multi-factor designs. How do I make the colData and the design formula in the command above?

Reply written 18 months ago by Bioinfonext.

"paired end" is to do with the technology used for the sequencing itself (I imagine you used single end - either way it's not relevant to your question).

The link igor posted gives some guidelines on how to deal with samples that span multiple variables (conditions/cell lines). You say you want to do pairwise comparisons between all the different variable combinations. For this, you could just run a bunch of different pairwise comparisons separately with DESeq2 and then use multiple testing correction (e.g. Bonferroni) to adjust the p-values accordingly. However, this may be hard to interpret, and something like PCA or correlation heatmaps might be more useful.

Edit: Using the contrast argument of the DESeq2 results() function would be a good idea for the pairwise comparisons.
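A rough sketch of how that could look, assuming a colData column called group that holds one genotype_stage_tissue label per sample and a balanced set of 18 such groups. The column name, the group labels and the helper pairwise_de() are hypothetical. The cut-offs follow the question (fold change above 2, i.e. |log2FC| > 1, and 0.001), applied here to the adjusted p-value that results() reports.

```r
# All genotype/stage/tissue combinations (assumed balanced design)
grid   <- expand.grid(tissue   = c("Ca", "Co", "Pa"),
                      stage    = c("5W", "7W", "9W"),
                      genotype = c("216", "218"),
                      stringsAsFactors = FALSE)
groups <- paste(grid$genotype, grid$stage, grid$tissue, sep = "_")

# Every unordered pair of groups: a 2 x 153 character matrix
pairs <- combn(groups, 2)

# Hypothetical helper: extract one pairwise comparison from a fitted object.
# 'dds' must already have been run through DESeq() with design = ~ group.
pairwise_de <- function(dds, a, b, lfc = 1, alpha = 0.001) {
  res <- DESeq2::results(dds, contrast = c("group", a, b), alpha = alpha)
  # which() drops genes whose adjusted p-value is NA
  res[which(res$padj < alpha & abs(res$log2FoldChange) > lfc), ]
}

# Usage with real data (needs DESeq2 installed; not run here):
# dds  <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
#                                        colData = colData,
#                                        design = ~ group)
# dds  <- DESeq2::DESeq(dds)
# hits <- pairwise_de(dds, "216_5W_Ca", "216_5W_Co")
```

Fitting the model once and pulling each comparison out with a contrast uses all samples to estimate the dispersions, rather than refitting on two groups at a time.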

Reply written 18 months ago by dr_bantz.