Question: RNA-Seq analysis with DESeq2. Confusing results.
0
gravatar for Armin
4.7 years ago by
Armin0
United States
Armin0 wrote:

Hi,

I analyze RNA-Seq data from three sample groups (non-pathology (5 samples), mild pathology (7 samples), extreme pathology (5 samples)). My goal is to find differentially expressed genes between non and mild, non and extreme, and mild and extreme. Read counts were prepared using HTSeq. Then I do exactly the same steps as described in the DESeq2 manual.

> library("DESeq2")
> directory <- "my_path"
> sampleFiles <- grep("pathol",list.files(directory),value=TRUE)
> sampleCondition <- sub("(.*pathol).*","\\1",sampleFiles)
>  sampleTable <- data.frame(sampleName = sampleFiles,fileName = sampleFiles,condition = sampleCondition)
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)
> ddsHTSeq$condition <- factor(ddsHTSeq$condition,levels=c("nonpathol","extpathol"))
> ddsHTSeq$condition <- relevel(ddsHTSeq$condition, "nonpathol")
> ddsHTSeq<- DESeq(ddsHTSeq)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting generalized linear model
> res <- results(ddsHTSeq)

I perform this procedure for all three comparisons and in each separate comparison get seemingly biologically meaningful results. However, the problem is following:

Non vs Mild gives me 15 d.e. genes (padj <0.05)

Non vs Extreme - 1,249 d.e. genes

but 

Mild vs. Extreme - only 16 d.e. genes!

Basic logic tells me that IF Non group differs from Extreme group by many hundreds genes AND the same Non group almost doesn't differ from Mild group THEN Mild group also has to differ from Extreme by many genes.

Is such logic wrong? If not then I don't get where could I make any mistake in the analysis. 

 

ADD COMMENTlink modified 4.7 years ago by Damian Kao15k • written 4.7 years ago by Armin0
3
gravatar for Devon Ryan
4.7 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

Your "logic" is completely incorrect, in fact it reminds me of one of Zeno's paradoxes.

An R example (values generated with rnorm):

a <- c(0.9964361, 1.2685663, 0.5132198, -1.6368091, 3.0478775)
b <- c(3.7843679, 3.4882500, 0.1275235, 3.2042165, 1.9144505)
c <- c(4.205894, 3.270352, 4.922832, 4.107070, 3.656648)

If one performs the pair-wise T-tests, one finds that a vs. b and b vs. c are insignificant, while a vs. c is significant. By your reasoning, that should be an impossibility, but it's certainly not.

ADD COMMENTlink modified 4.7 years ago • written 4.7 years ago by Devon Ryan88k

I already asked this question on SeqAnswers but can I ask you here too. In which function can I use the argument minReplicatesForReplace ? Both DESeq() and results() produce the following error message: 

Error in results(ddsHTSeq, minReplicatesForReplace = 5) : 
  unused argument (minReplicatesForReplace = 5)

ADD REPLYlink written 4.7 years ago by Armin0

It should work with the DESeq() function. What version are you using?

ADD REPLYlink written 4.7 years ago by Devon Ryan88k

DESeq2_1.0.19

command: ddsHTSeq<- DESeq(ddsHTSeq, minReplicatesForReplace = 5)

Here is my sessionInfo

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] DESeq2_1.0.19             RcppArmadillo_0.4.300.8.0
[3] Rcpp_0.11.2               lattice_0.20-29          
[5] Biobase_2.20.1            GenomicRanges_1.12.5     
[7] IRanges_1.18.4            BiocGenerics_0.6.0       

loaded via a namespace (and not attached):
 [1] annotate_1.38.0      AnnotationDbi_1.22.6 DBI_0.2-7           
 [4] genefilter_1.42.0    grid_3.0.1           locfit_1.5-9.1      
 [7] RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.1       
[10] stats4_3.0.1         survival_2.37-7      tools_3.0.1         
[13] XML_3.98-1.1         xtable_1.7-3

 

ADD REPLYlink written 4.7 years ago by Armin0

You're using the version from Bioconductor 2.12, which is fairly outdated and that option didn't exist yet back then. You should just update the libraries on your local installation.

ADD REPLYlink written 4.7 years ago by Devon Ryan88k

Thank you, Devon!

ADD REPLYlink written 4.7 years ago by Armin0

I link your answer !

ADD REPLYlink written 2.9 years ago by jimmy_zeng90
1
gravatar for Damian Kao
4.7 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

Genes that are not significantly (whatever threshold you choose) differentially expressed aren't necessarily equally expressed. It just means your variance is too high to determine whether these genes are differentially expressed. 

Let's say your mild sample replicates vary highly compare to your non and extreme sample replicates. When you perform the differential expression test between mild and non/extreme, you would not be confident in whether the difference is due to variance or real effect. 

However, you are using a generalized linear model for your DE. I am not very familiar with how that works in relation to differential expression. But I think that's supposed to take into account different variances.

ADD COMMENTlink modified 4.7 years ago by Devon Ryan88k • written 4.7 years ago by Damian Kao15k

Stated somewhat differently, non-significance is not equivalence.

ADD REPLYlink written 4.7 years ago by Devon Ryan88k
1

Yeah that first sentence was pretty confusing. I changed it. 

ADD REPLYlink written 4.7 years ago by Damian Kao15k

Thank you for explanation, Devon and Damian! Also, I found another possible reason that adds to such results. In comparisons with Mild many genes were set to NA due to outliers. 

ADD REPLYlink written 4.7 years ago by Armin0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1882 users visited in the last hour