DESeq2 analysis with covariates with and without log transformation
1
0
Entering edit mode
3.8 years ago
haroonnaim • 0

I am analysing my RNA-Seq data with DESeq2.

  1. First, DESeq2 analysis with covariates (e.g., gender, age, Batch-ID, etc.) was performed and 2,000+ genes were found to be differentially expressed between two groups.
  2. Added PCT_RIBOSOMAL_BASES (fraction of reads mapped to ribosomal regions) as a covariate to the covariate list and then run the DESeq2, method displays the following message and result table shows few genes were found to be differentially expressed.

  3. Finally log10-transformed the PCT_RIBOSOMAL_BASES values and then run the DESeq2, method finished without any warning and result table gives 2,000+ differentially expressed genes between two groups.

I was wondering what was the reason in difference between step 2 and step 3 results. What causes the DESeq2 to perform so differently from step1 to step2.

PCT_RIBOSOMAL_BASES (i.e., 6 records values: 0.000002659168 0.000002308961 0.000004758736 0.000005656651 0.000002397121 0.000003582599)

Any help will be appreciated.

(DESeq2 message with step 2)

estimating size factors estimating dispersions gene-wise dispersion estimates: 32 workers mean-dispersion relationship -- note: fitType='parametric', but the dispersion trend was not well captured by the function: y = a/x + b, and a local regression fit was automatically substituted. specify fitType='local' or 'mean' to avoid this message next time.

RNA-Seq gene • 4.0k views
ADD COMMENT
0
Entering edit mode

Thanks Michael, highly appreciated.

The source code is:

## 
dds <-DESeq2::DESeqDataSetFromMatrix(countData = gene_counts, colData= coldata, design = ~condition + Gender  + Age 
                                     + PCT_INTRONIC_BASES +MEDIAN_CV_COVERAGE +  PCT_UTR_BASES  
                                     + MEDIAN_3PRIME_BIAS+ MEDIAN_5PRIME_BIAS + GC_content +   PCT_RIBOSOMAL_BASES
                                        )

#Differential expression analysis
dds <-DESeq2::DESeq(dds, parallel=TRUE)
res <- results(dds, contrast=c("condition",cond1,cond2), parallel = TRUE) 


#head(coldata)
         sample condition Gender Age PCT_INTRONIC_BASES MEDIAN_CV_COVERAGE PCT_UTR_BASES PCT_RIBOSOMAL_BASES MEDIAN_3PRIME_BIAS MEDIAN_5PRIME_BIAS
 1       COND1      1  43         0.07361633           0.451226     0.4023597      0.000002659168          0.5700870          0.4364080
 2       COND2     1  48         0.05082367           0.475164     0.3846830      0.000002308961          0.5241287          0.4699757
 3       COND2      2  48         0.05943800           0.476946     0.3981933      0.000004758736          0.5370023          0.4240317
ADD REPLY
1
Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

ADD REPLY
1
Entering edit mode
3.8 years ago
Michael Love ★ 2.6k

For future reference, the Bioconductor support website is:

https://support.bioconductor.org

It is listed from the DESeq2 vignette and the man pages. But I will reply here to make it simpler.

I don't usually add these types of numeric covariates to the design with DESeq2. I will use RUV or SVA if I think that there is systematic biases affecting many genes. (See vignette for example code).

You didn't include your actual code, so I can only guess what happened:

My guess is that you are testing the PCT_RIBOSOMAL_BASES effect instead of the biological condition. See the DESeq2 vignette about multi-factor designs.

But otherwise, if this is not the case, transforming and scaling the covariates is an important step in proper linear modeling. It is recommended to transform as needed to avoid extreme skew in the numeric covariates and also to center and scale numeric covariates.

ADD COMMENT
0
Entering edit mode

I was wondering if there is any limitation or range regrading the values for numerical covariates in DESEQ2.

ADD REPLY
2
Entering edit mode

I recommend to transform and scale numeric covariates (current DESeq2 will warn you about this). There is no hard-coded limitation (the program will run, with a message recommending you to scale the covariate), but if you put in highly skewed covariates that aren't scaled to a linear model, you may get unstable or confusing results.

ADD REPLY
0
Entering edit mode

Thank you for your reply

ADD REPLY

Login before adding your answer.

Traffic: 2367 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6