Question: No differentially expressed genes using DESeq2
0
gravatar for Sumit Paliwal
2.4 years ago by
Sumit Paliwal20 wrote:

After generating the counts with htseq, I performed differential expression analysis on RNA-seq data using DESeq2 package in R. Following is the metadata table:

##   sampleName               fileName  Tissue Gender Condition
## 1     A-C-F3   A-C-F3_tophat.counts Adipose Female   control
## 2    A-C-M5R  A-C-M5R_tophat.counts Adipose   Male   control
## 3    A-C-M6R  A-C-M6R_tophat.counts Adipose   Male   control
## 4    A-VR-F6  A-VR-F6_tophat.counts Adipose Female        VR
## 5   A-VR-M4R A-VR-M4R_tophat.counts Adipose   Male        VR
## 6   A-VR-M6R A-VR-M6R_tophat.counts Adipose   Male        VR

The differential expression analysis pipeline and results obtained are as follows:

cds <- DESeqDataSetFromHTSeqCount(AdiposeTable, directory = directory, design = ~Condition)
nrow(cds)
## [1] 32754
cds_filtered <- cds[rowSums(counts(cds)) >2,]
nrow(cds_filtered)
## [1] 26419
(Adipose_deseq2 <- DESeq(cds_filtered))
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## class: DESeqDataSet 
## dim: 26419 6 
## metadata(1): version
## assays(3): counts mu cooks
## rownames(26419): ENSRNOG00000000001 ENSRNOG00000000007 ...
##   ENSRNOG00000062172 ENSRNOG00000062173
## rowData names(27): baseMean baseVar ... deviance maxCooks
## colnames(6): A-C-F3 A-C-M5R ... A-VR-M4R A-VR-M6R
## colData names(4): Tissue Gender Condition sizeFactor

(res <- results(Adipose_deseq2))
## log2 fold change (MAP): Condition VR vs control 
## Wald test p-value: Condition VR vs control 
## DataFrame with 26419 rows and 6 columns
##                     baseMean log2FoldChange     lfcSE       stat
##                    <numeric>      <numeric> <numeric>  <numeric>
## ENSRNOG00000000001  3.087711      0.5933622 0.5882185  1.0087445
## ENSRNOG00000000007 13.046054     -1.0205183 0.5900124 -1.7296556
## ENSRNOG00000000008  4.292353      0.6300289 0.5990561  1.0517028
## ENSRNOG00000000009  2.046352      0.1457463 0.5186478  0.2810121
## ENSRNOG00000000010  4.050717     -0.2227494 0.5947688 -0.3745143
## ...                      ...            ...       ...        ...
## ENSRNOG00000062166  0.477197      0.1842581 0.3415356  0.5394990
## ENSRNOG00000062168  2.401384     -0.1626328 0.5929979 -0.2742552
## ENSRNOG00000062170 40.883003      0.4688201 0.4455042  1.0523361
## ENSRNOG00000062172  2.881156      0.2942357 0.5979495  0.4920745
## ENSRNOG00000062173  1.198567      0.1375545 0.4842101  0.2840802
##                        pvalue      padj
##                     <numeric> <numeric>
## ENSRNOG00000000001 0.31309717 0.9998246
## ENSRNOG00000000007 0.08369182 0.9998246
## ENSRNOG00000000008 0.29293595 0.9998246
## ENSRNOG00000000009 0.77870109 0.9998246
## ENSRNOG00000000010 0.70802171 0.9998246
## ...                       ...       ...
## ENSRNOG00000062166  0.5895426 0.9998246
## ENSRNOG00000062168  0.7838885 0.9998246
## ENSRNOG00000062170  0.2926454 0.9998246
## ENSRNOG00000062172  0.6226667 0.9998246
## ENSRNOG00000062173  0.7763489 0.9998246
resSig <- res[which(res$padj <0.1),]
head(resSig[order(resSig$log2FoldChange, decreasing = TRUE),])
## log2 fold change (MAP): Condition VR vs control 
## Wald test p-value: Condition VR vs control 
## DataFrame with 0 rows and 6 columns

The analysis does not identify any gene to be differentially expressed. I am not able to figure out what could be the possible reason. Kindly suggest. The data is generated from Rat tissue samples.

rna-seq deseq2 • 2.0k views
ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by Sumit Paliwal20
1

have you tried controlling for sex

ADD REPLYlink written 2.4 years ago by russhh4.3k

No. I didn't do that. I'll do that and post the results. I forgot to add in the question that this data is from rat tissue samples.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Sumit Paliwal20

did it change anythng?

ADD REPLYlink written 2.4 years ago by russhh4.3k

It did. But I found only six genes being differentially expressed (DE) after controlling for Gender. When I performed the analysis with edgeR, Six genes were found to be DE in simple design but after contolling for Gender, the total number of DE genes were 19.

ADD REPLYlink written 2.4 years ago by Sumit Paliwal20

Did you look at the PCA plots of your samples?

ADD REPLYlink written 2.4 years ago by VHahaut1.1k

Yes I did. In fact, analysis for the same data using edgeR pipeline (including and excluding outlier sample) produced a small list of differentially expressed genes.

ADD REPLYlink written 2.4 years ago by Sumit Paliwal20

Are your samples well clustered and the groups separated on the PCA? Could you post it?

EdgeR and DESeq2 do not analyze the data in the same way so it is not surprising to find differences.

ADD REPLYlink written 2.4 years ago by VHahaut1.1k

Here they are: PCA plot Cluster Plot

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Sumit Paliwal20

You really need a gender factor in your model! Also, I suspect A-C-M6R is female (though it seems to be a bit of an outlier anyway).

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Devon Ryan89k

One more important thing here is that one of the control sample (A-C-M5R) is clustering with the treated (VR) group. Also, One of the VR group sample (A-VR-F6) is clustering with the controls. Do you think this is messing up the results? I should then remove these samples and redo the analysis.

ADD REPLYlink written 2.4 years ago by Sumit Paliwal20

Can you check the expression values (take the mean of control vs take the mean of VR) and calculate log2 fold change for all genes, and see how many are differentially expressed?It could give an idea if the results from DEseq are true or not.

ADD REPLYlink written 2.4 years ago by Ron920

Can you explain in more detail how it would really help.

ADD REPLYlink written 2.4 years ago by Sumit Paliwal20

log2 fold change won't give you any info about the number of differntialy expressed genes; determination of differntial expression requires a statistical test rather than just the calculation of a statistic. You could use an alternative package like voom or others. But if there is any biological difference, and your experiment is sufficiently powered to detect it, DESeq2 is probably your best option at the moment

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by russhh4.3k

Ofcourse it requires a statistical test,but it can be used a sanity check " to see as if there are any differentially expressed genes or not" ,and if there are then the data indeed has differentially expressed genes and something wrong in the procedure while using DEseq2 or EdgeR.

ADD REPLYlink written 2.4 years ago by Ron920
1

Hi Ron, I am following the protocol suggested in the following publication: Anders S et.al. Nature Protocols 2013

ADD REPLYlink written 2.4 years ago by Sumit Paliwal20

How about read duplication / quality; do these differ significantly between the samples?

ADD REPLYlink written 2.4 years ago by russhh4.3k

Hi russhh Thanks for pointing this out. We have nearly 50 million 75bp paired-end raw reads for most samples. After alignment, I observed that samples that have clustered together on the left side in the dendrogram have very high percentage (70-80%) of uniquely mapped reads compared with those clustered on the right side that have only (20-30%) uniquely mapped reads. FASTQC analysis did not show adapter contamination in any of the samples.

ADD REPLYlink written 2.4 years ago by Sumit Paliwal20

Maybe there's no DEG. Actually, seems like the most probably interpretation. Not unheard of. For example, shuffle your samples to generate random data, then rerun the analysis. In theory, the number of significant raw pvals should be about 5% of your total test set.

ADD REPLYlink written 2.1 years ago by mforde841.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 680 users visited in the last hour