Hello all,
I recently ran into a problem: my results show a disproportionate number of downregulated genes compared with upregulated genes. I am not sure whether this indicates a problem or whether it can legitimately happen. I used DESeq2 for the DE analysis, and the data were normalized based on housekeeping genes. Do I need to change the design argument of DESeqDataSetFromMatrix()? For the current run, the design was ~condition. Should I include the pooling information in the design (~condition + pool)? Thanks in advance!
Experimental design:
- Pool A - two controls and three experimental groups
- Pool B - three controls and three experimental groups
5 samples (biological replicates)
- Pool A - positive
- Pool A - positive
- Pool A - negative
- Pool A - negative
- Pool A - negative
6 samples (biological replicates)
- Pool B - positive
- Pool B - positive
- Pool B - positive
- Pool B - negative
- Pool B - negative
- Pool B - negative
DE comparison - positive vs negative (pools A and B together)
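For reference, the layout above can be written down as a sample table and checked for confounding before deciding between ~condition and a design that also includes pool. This is only a sketch under assumptions: the sample names S1 to S11 are made up, the count matrix is a simulated placeholder rather than real data, and the DESeq2 part runs only if the package is installed.

```r
# Sample table reconstructed from the list above (sample names are hypothetical).
coldata <- data.frame(
  pool = factor(c(rep("A", 5), rep("B", 6))),
  condition = factor(
    c("positive", "positive", "negative", "negative", "negative",
      "positive", "positive", "positive", "negative", "negative", "negative"),
    levels = c("negative", "positive")  # first level acts as the reference
  ),
  row.names = paste0("S", 1:11)
)

# Cross-tabulate pool against condition. If every cell is non-zero, pool is
# not confounded with condition and can be added to the design.
design_table <- table(coldata$pool, coldata$condition)
print(design_table)

# The additive model should have full column rank.
mm <- model.matrix(~ pool + condition, data = coldata)
full_rank <- qr(mm)$rank == ncol(mm)
print(full_rank)

# Optional: build the DESeq2 object with pool as a covariate.
# The counts here are simulated placeholders, not real data.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  set.seed(1)
  counts <- matrix(rnbinom(200 * 11, mu = 100, size = 5), nrow = 200,
                   dimnames = list(paste0("gene", 1:200), rownames(coldata)))
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = coldata,
                                        design = ~ pool + condition)
  print(dds)
}
```

The variable of interest is placed last in the formula, following the usual DESeq2 convention, so the default results would be for condition while adjusting for pool.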
You can try to prefilter this a bit. That cloud at the bottom left has low baseMeans and large fold changes, so these are probably genes with many zeros, which are rather unreliable. An automated way would be the edgeR function filterByExpr(). Including pool is only necessary if the pools are driving any separation. Check the PCA for it. Is this normal RNA-seq or what?
Thank you for the recommendation! This was a low-input bulk RNA-seq.
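The prefiltering suggestion can be sketched roughly as follows. The counts are simulated (half the genes are made nearly all zeros on purpose), the threshold of 10 reads is an assumption you would tune to your library sizes, and the filterByExpr() call runs only if edgeR is installed.

```r
set.seed(1)

# Condition labels in the same order as the sample list in the question.
condition <- factor(c(rep("positive", 2), rep("negative", 3),
                      rep("positive", 3), rep("negative", 3)))

# Simulated placeholder counts: first 500 genes are mostly zeros,
# last 500 genes are reasonably expressed.
n_genes <- 1000
mu <- rep(c(0.5, 50), each = n_genes / 2)
counts <- matrix(rnbinom(n_genes * length(condition), mu = mu, size = 1),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("S", seq_along(condition))))

# Simple rule: keep genes with at least 10 reads in at least as many
# samples as the smallest group contains.
smallest_group <- min(table(condition))
keep_simple <- rowSums(counts >= 10) >= smallest_group
cat("Simple rule keeps", sum(keep_simple), "of", n_genes, "genes\n")

# Automated alternative from edgeR, if available.
if (requireNamespace("edgeR", quietly = TRUE)) {
  keep_edger <- edgeR::filterByExpr(counts, group = condition)
  cat("filterByExpr keeps", sum(keep_edger), "genes\n")
}

filtered_counts <- counts[keep_simple, , drop = FALSE]
```

The filtered matrix would then go into DESeqDataSetFromMatrix() in place of the full one.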
Ok, so "standard" RNA-seq. Yeah, I would really try to prefilter a bit and also inspect the PCA. See the DESeq2 manual; it includes code for PCA.
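As a rough idea of what inspecting the PCA for a pool effect looks like, here is a base R sketch. It uses simulated counts with a deliberately built-in pool effect and a plain log2 transform as a stand-in for the variance-stabilizing transform, so it is an illustration only and not the code from the DESeq2 manual.

```r
set.seed(2)

pool <- factor(c(rep("A", 5), rep("B", 6)))
condition <- factor(c(rep("positive", 2), rep("negative", 3),
                      rep("positive", 3), rep("negative", 3)))

# Simulated placeholder counts: 100 of 500 genes are shifted in pool B.
n_genes <- 500
mu <- matrix(100, nrow = n_genes, ncol = length(pool))
mu[1:100, pool == "B"] <- 400
counts <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = n_genes)

# PCA on log-transformed counts, samples as rows.
log_counts <- log2(counts + 1)
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

scores <- data.frame(pca$x[, 1:2], pool = pool, condition = condition)

# Colour by condition, point shape by pool. If samples separate by shape
# rather than colour, pool is driving the variation.
plot(scores$PC1, scores$PC2,
     col = as.integer(scores$condition),
     pch = c(16, 17)[as.integer(scores$pool)],
     xlab = paste0("PC1 (", pct_var[1], "%)"),
     ylab = paste0("PC2 (", pct_var[2], "%)"))

# With a DESeq2 object `dds` (not created here), the equivalent would be:
#   vsd <- DESeq2::vst(dds, blind = TRUE)
#   DESeq2::plotPCA(vsd, intgroup = c("condition", "pool"))
```

If the samples split by pool along PC1 or PC2, as they do in this simulated case, that would be the argument for adding pool to the design.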
I did do some prefiltering before I ran DESeq2. Could it be because of low read counts in some of the samples? I ran some QC today and have attached the plots here.
I cannot say that these plots are very informative. Run a PCA.
Strongly agree. This is a data cleaning issue.