I've been reading up and searching on forums for something that could help me interpret the fold change plots I am getting after performing differential expression analysis on my data. I think that was worse because I think I've made myself start second guessing my analysis, so I've decided to ask for some other opinions. I'll have to be a bit vague about the specific treatments and organism because of NDA agreements with the financing entity, but otherwise I think it should be fine.
Some context: we have two treatments for our organism (B and C) and are analyzing the expression in four different tissues (G, M1, M2, M3), and we did this in triplicate, for a grand total of 24 samples. Since our organism is not sequenced, we had to do de novo assembly with Trinity on pooled read data to create a reference (around ~560 million reads). We have around 300,000 assembled transcripts, but calculating ExN50 and TPM, less than 35,000 transcripts account for about 96% of the data in contigs with an N50 of almost 2 kb. The vast majority of assembled transcripts have 5 or less assigned reads.
I used Salmon to calculate abundance for all samples against the reference, and then used TxImport to pass the data for analysis with DESeq2. For DESeq2, the two factor combinatorial design was built into ~group so we could later extract contrasts between tissues within one treatment, or between equivalent tissues between treatments.
dds <- DESeqDataSetFromTximport(txi.salmon, colData=colDesign, design= ~tissue+treatment) dds$group <- factor(paste0(dds$treatment,dds$tissue)) design(dds) <- ~group de_results <- DESeq(dds, parallel=TRUE, BPPARAM=MulticoreParam(2))
After this, I've been doing some exploratory analysis. First, PCA analysis after looks good. Samples from different tissues cluster away from one another (quite drastically in G vs the M tissues), and there is some smaller separation in the same tissue with different treatment. So far so good.
However, when I start looking at the volcano plots or MA plots, they look rather atypical. It seems that the nature of the data does not allow to detect small fold changes. The vertical lines mark -2 and 2 fold. All the changes determined as significant are of a large magnitude. Even the glut of transcripts with very few reads that show up in the MA plot tend to have large fold changes that are not significant. I don't know if this should worry me. Also, some comparisons that have this odd vertical line of significant values. I've normally gotten volcano plots with data more spread out, this feels like an artefact.
Any tips as to something that might be wrong when setting up the DESeq2 design, or ideas about the data itself are quite welcome.