Hi all, I'm an undergrad learning how to process some data given to me by my mentor. This is bulk RNA-seq data, aligned with STAR and counted with featureCounts. I pre-filtered using rowSums(counts(dds))>=10, ran DESeq, and set alpha=0.05. However, I got a weird-looking MA plot with diagonal lines, and I'm not sure why it looks like that. The MA plot was made with plotMA(). Even weirder, my PCA plot has PC1 with 98.684% of the variance and PC2 with 0.582%. I'm quite confused and not sure where to go from here. I already ran vst(dds,blind=FALSE) before passing the data to plotPCA(). My biggest question is: how likely is it that PC1 actually captures 98% of the variation, and that this is not due to some technical artefact? What can I do to make sure this is an actual biological difference? And if my PCA plot is incorrect, what steps can I take to check what went wrong?
I'm still learning and always happy to learn more, but I am quite new to bulk RNA-seq, so I would appreciate any guidance on how to troubleshoot this. I am more than happy to provide any details about the data. Thank you.
It is very unlikely unless something bad (very bad) happened during sample preparation: extraction, library preparation, or sequencing.
Hi, sorry, what do you mean by that? That my results are likely caused by something very bad happening during sample preparation?
Check the RNA quality (RIN); check if RNA is completely degraded in one group of samples compared to the other.
Ensure that the same library preparation and sequencing protocols have been used for all samples.
Explain the type of samples you are comparing. There could be a biological explanation for why PC1 in the PCA plot accounts for 98.684% of the variance, assuming all samples are independent biological replicates and not multiple sequencing runs of RNA extracted from just two samples (one per treatment).
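To make the point above concrete, here is a minimal base-R sketch on simulated data (not the poster's data) showing how a strong group difference alone can put almost all of the variance on PC1, and how sample-to-sample correlations can hint at whether replicates are independent. Everything in it is an assumption for illustration: the group names, the effect sizes, and the noise level are made up. It mirrors what plotPCA() does by default, which is to run prcomp() on the 500 most variable genes.

```r
set.seed(1)

n_genes <- 2000
groups  <- rep(c("cell_line", "stem_derived"), each = 3)

# Hypothetical log2-scale expression: a per-gene baseline, a large group
# effect on the first quarter of the genes, and small replicate-level noise.
baseline <- rnorm(n_genes, mean = 8, sd = 2)
effect   <- ifelse(seq_len(n_genes) <= n_genes / 4, rnorm(n_genes, sd = 3), 0)

expr <- sapply(groups, function(g) {
  baseline + (g == "stem_derived") * effect + rnorm(n_genes, sd = 0.2)
})
colnames(expr) <- paste0(groups, "_", rep(1:3, times = 2))

# PCA on the 500 most variable genes, samples as rows
row_var <- apply(expr, 1, var)
top     <- order(row_var, decreasing = TRUE)[1:500]
pca     <- prcomp(t(expr[top, ]))

# Percent of variance per component (the numbers shown on PCA plot axes)
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(pct_var[1:2], 1))

# Sample-to-sample Spearman correlation across all genes
sample_cor <- cor(expr, method = "spearman")
print(round(sample_cor, 3))

# Average correlation within groups versus between groups
same_group  <- outer(groups, groups, "==")
off_diag    <- row(sample_cor) != col(sample_cor)
within_cor  <- mean(sample_cor[same_group & off_diag])
between_cor <- mean(sample_cor[!same_group])
```

With real data, the transformed matrix from assay(vsd) would take the place of `expr`. A very high PC1 percentage is then not alarming by itself when the two groups are as different as an immortalized line and stem-cell-derived cells. Within-group correlations that are essentially 1 might suggest technical rather than biological replicates, but that is something only the person who prepared the samples can confirm.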
One group is an immortalized cell line. The other is the same cell type, but derived from stem cells. I only have access to the fastq files. Is there any other way to check RNA quality?
You should ask the person who prepared the RNA samples.
That could explain the large variance you see in PC1. However, you should discuss this with your mentor, as I am not an expert in this specific field. I primarily work with RNA-seq on bacteria.
Perhaps this task was given to you simply as an exercise to learn how to process RNA-seq data.
Regarding the MA plot, maybe try setting a higher threshold for filtering? i.e. your rowSums(counts(dds))>=10 part.
Hmm, I tried 20 and 30. Still had the same result.
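One possible reason raising the total-count threshold changes little: rowSums(counts(dds)) >= 10 sums over all samples, so a gene with zero counts in one group and a handful of reads in the other can still pass. Such genes have a fold change that is fully determined by their mean, so they fall on smooth curves that look like diagonal lines on an MA plot. The base-R sketch below shows this on simulated counts. It is only an illustration under stated assumptions: the data, the 0.5 pseudocount, and the simplified M and A values are made up here and are not how DESeq2 computes its estimates, and it cannot confirm that this is what is happening in the poster's data. The per-sample filter follows the pattern suggested in the DESeq2 vignette.

```r
set.seed(2)

n_genes <- 5000
groups  <- rep(c("A", "B"), each = 3)

# Hypothetical raw counts: mostly low expression, no true group difference
mu  <- rexp(n_genes, rate = 1 / 5)
cts <- matrix(rpois(n_genes * 6, lambda = rep(mu, times = 6)), nrow = n_genes)
colnames(cts) <- paste0(groups, rep(1:3, times = 2))

# Filter 1: total count across all samples (the filter used in the thread)
keep_total <- rowSums(cts) >= 10

# Filter 2: at least 10 reads in at least `smallest_group` samples
smallest_group <- 3
keep_persample <- rowSums(cts >= 10) >= smallest_group

# Simplified MA coordinates from group means (0.5 is an arbitrary pseudocount)
mean_a <- rowMeans(cts[, groups == "A"])
mean_b <- rowMeans(cts[, groups == "B"])
A <- (mean_a + mean_b) / 2
M <- log2((mean_b + 0.5) / (mean_a + 0.5))

# Genes with zero counts in every group-A sample but some reads in group B
zero_a <- mean_a == 0 & mean_b > 0

# For these genes M depends only on A, so they trace a single curve
curve_m <- log2(4 * A[zero_a] + 1)
print(isTRUE(all.equal(M[zero_a], curve_m)))

# How many of these genes survive each filter
print(c(total_filter     = sum(zero_a & keep_total),
        persample_filter = sum(zero_a & keep_persample)))

# plot(A + 0.5, M, log = "x", pch = 20, cex = 0.3)  # uncomment to view
```

On a real object, the count matrix from counts(dds) would replace `cts`, and `smallest_group` would be set to the number of samples in the smaller group. If the lines persist after a per-sample filter, the cause is probably something else and worth raising with your mentor.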