Question

No differentially expressed genes using DESeq2

0

Entering edit mode

7.9 years ago

Sumit Paliwal ▴ 40

After generating the counts with htseq, I performed differential expression analysis on RNA-seq data using DESeq2 package in R. Following is the metadata table:

##   sampleName               fileName  Tissue Gender Condition
## 1     A-C-F3   A-C-F3_tophat.counts Adipose Female   control
## 2    A-C-M5R  A-C-M5R_tophat.counts Adipose   Male   control
## 3    A-C-M6R  A-C-M6R_tophat.counts Adipose   Male   control
## 4    A-VR-F6  A-VR-F6_tophat.counts Adipose Female        VR
## 5   A-VR-M4R A-VR-M4R_tophat.counts Adipose   Male        VR
## 6   A-VR-M6R A-VR-M6R_tophat.counts Adipose   Male        VR

The differential expression analysis pipeline and results obtained are as follows:

cds <- DESeqDataSetFromHTSeqCount(AdiposeTable, directory = directory, design = ~Condition)
nrow(cds)
## [1] 32754
cds_filtered <- cds[rowSums(counts(cds)) >2,]
nrow(cds_filtered)
## [1] 26419
(Adipose_deseq2 <- DESeq(cds_filtered))
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## class: DESeqDataSet 
## dim: 26419 6 
## metadata(1): version
## assays(3): counts mu cooks
## rownames(26419): ENSRNOG00000000001 ENSRNOG00000000007 ...
##   ENSRNOG00000062172 ENSRNOG00000062173
## rowData names(27): baseMean baseVar ... deviance maxCooks
## colnames(6): A-C-F3 A-C-M5R ... A-VR-M4R A-VR-M6R
## colData names(4): Tissue Gender Condition sizeFactor

(res <- results(Adipose_deseq2))
## log2 fold change (MAP): Condition VR vs control 
## Wald test p-value: Condition VR vs control 
## DataFrame with 26419 rows and 6 columns
##                     baseMean log2FoldChange     lfcSE       stat
##                    <numeric>      <numeric> <numeric>  <numeric>
## ENSRNOG00000000001  3.087711      0.5933622 0.5882185  1.0087445
## ENSRNOG00000000007 13.046054     -1.0205183 0.5900124 -1.7296556
## ENSRNOG00000000008  4.292353      0.6300289 0.5990561  1.0517028
## ENSRNOG00000000009  2.046352      0.1457463 0.5186478  0.2810121
## ENSRNOG00000000010  4.050717     -0.2227494 0.5947688 -0.3745143
## ...                      ...            ...       ...        ...
## ENSRNOG00000062166  0.477197      0.1842581 0.3415356  0.5394990
## ENSRNOG00000062168  2.401384     -0.1626328 0.5929979 -0.2742552
## ENSRNOG00000062170 40.883003      0.4688201 0.4455042  1.0523361
## ENSRNOG00000062172  2.881156      0.2942357 0.5979495  0.4920745
## ENSRNOG00000062173  1.198567      0.1375545 0.4842101  0.2840802
##                        pvalue      padj
##                     <numeric> <numeric>
## ENSRNOG00000000001 0.31309717 0.9998246
## ENSRNOG00000000007 0.08369182 0.9998246
## ENSRNOG00000000008 0.29293595 0.9998246
## ENSRNOG00000000009 0.77870109 0.9998246
## ENSRNOG00000000010 0.70802171 0.9998246
## ...                       ...       ...
## ENSRNOG00000062166  0.5895426 0.9998246
## ENSRNOG00000062168  0.7838885 0.9998246
## ENSRNOG00000062170  0.2926454 0.9998246
## ENSRNOG00000062172  0.6226667 0.9998246
## ENSRNOG00000062173  0.7763489 0.9998246
resSig <- res[which(res$padj <0.1),]
head(resSig[order(resSig$log2FoldChange, decreasing = TRUE),])
## log2 fold change (MAP): Condition VR vs control 
## Wald test p-value: Condition VR vs control 
## DataFrame with 0 rows and 6 columns

The analysis does not identify any gene to be differentially expressed. I am not able to figure out what could be the possible reason. Kindly suggest. The data is generated from Rat tissue samples.

RNA-Seq deseq2 • 6.6k views

ADD COMMENT • link 7.9 years ago by Sumit Paliwal ▴ 40

1

Entering edit mode

have you tried controlling for sex

ADD REPLY • link 7.9 years ago by russhh 5.7k

0

Entering edit mode

No. I didn't do that. I'll do that and post the results. I forgot to add in the question that this data is from rat tissue samples.

ADD REPLY • link 7.9 years ago by Sumit Paliwal ▴ 40

0

Entering edit mode

did it change anythng?

ADD REPLY • link 7.9 years ago by russhh 5.7k

0

Entering edit mode

It did. But I found only six genes being differentially expressed (DE) after controlling for Gender. When I performed the analysis with edgeR, Six genes were found to be DE in simple design but after contolling for Gender, the total number of DE genes were 19.

ADD REPLY • link 7.9 years ago by Sumit Paliwal ▴ 40

0

Entering edit mode

Did you look at the PCA plots of your samples?

ADD REPLY • link 7.9 years ago by VHahaut ★ 1.2k

0

Entering edit mode

Yes I did. In fact, analysis for the same data using edgeR pipeline (including and excluding outlier sample) produced a small list of differentially expressed genes.

ADD REPLY • link 7.9 years ago by Sumit Paliwal ▴ 40

0

Entering edit mode

Are your samples well clustered and the groups separated on the PCA? Could you post it?

EdgeR and DESeq2 do not analyze the data in the same way so it is not surprising to find differences.

ADD REPLY • link 7.9 years ago by VHahaut ★ 1.2k

0

Entering edit mode

Here they are: PCA plot Cluster Plot

ADD REPLY • link 7.9 years ago by Sumit Paliwal ▴ 40

0

Entering edit mode

You really need a gender factor in your model! Also, I suspect A-C-M6R is female (though it seems to be a bit of an outlier anyway).

ADD REPLY • link 7.9 years ago by Devon Ryan 104k

0

Entering edit mode

One more important thing here is that one of the control sample (A-C-M5R) is clustering with the treated (VR) group. Also, One of the VR group sample (A-VR-F6) is clustering with the controls. Do you think this is messing up the results? I should then remove these samples and redo the analysis.

ADD REPLY • link 7.9 years ago by Sumit Paliwal ▴ 40

0

Entering edit mode

Can you check the expression values (take the mean of control vs take the mean of VR) and calculate log2 fold change for all genes, and see how many are differentially expressed?It could give an idea if the results from DEseq are true or not.

ADD REPLY • link 7.9 years ago by Ron ★ 1.2k

0

Entering edit mode

Can you explain in more detail how it would really help.

ADD REPLY • link 7.9 years ago by Sumit Paliwal ▴ 40

0

Entering edit mode

log2 fold change won't give you any info about the number of differntialy expressed genes; determination of differntial expression requires a statistical test rather than just the calculation of a statistic. You could use an alternative package like voom or others. But if there is any biological difference, and your experiment is sufficiently powered to detect it, DESeq2 is probably your best option at the moment

ADD REPLY • link 7.9 years ago by russhh 5.7k

0

Entering edit mode

Ofcourse it requires a statistical test,but it can be used a sanity check " to see as if there are any differentially expressed genes or not" ,and if there are then the data indeed has differentially expressed genes and something wrong in the procedure while using DEseq2 or EdgeR.

ADD REPLY • link 7.9 years ago by Ron ★ 1.2k

1

Entering edit mode

Hi Ron, I am following the protocol suggested in the following publication: Anders S et.al. Nature Protocols 2013

ADD REPLY • link 7.9 years ago by Sumit Paliwal ▴ 40

0

Entering edit mode

How about read duplication / quality; do these differ significantly between the samples?

ADD REPLY • link 7.9 years ago by russhh 5.7k

0

Entering edit mode

Hi russhh Thanks for pointing this out. We have nearly 50 million 75bp paired-end raw reads for most samples. After alignment, I observed that samples that have clustered together on the left side in the dendrogram have very high percentage (70-80%) of uniquely mapped reads compared with those clustered on the right side that have only (20-30%) uniquely mapped reads. FASTQC analysis did not show adapter contamination in any of the samples.

ADD REPLY • link 7.9 years ago by Sumit Paliwal ▴ 40

0

Entering edit mode

Maybe there's no DEG. Actually, seems like the most probably interpretation. Not unheard of. For example, shuffle your samples to generate random data, then rerun the analysis. In theory, the number of significant raw pvals should be about 5% of your total test set.

ADD REPLY • link 7.6 years ago by mforde84 ★ 1.4k