Question: RNA-Seq Data Quality Assessment: Boxplot Interpretation
Aynur40 wrote (9 weeks ago):

Hello,

Here is the boxplot I got for my RNA-Seq data.

My data looks like this:

```
head(rawCountTable)
                    con-1  con-2    a-1    a-2    b-1    b-2    c-1    c-2    d-1    d-2
ENSMUSG0000000000       0      0      0      0      0      0      0      0      0      0
ENSMUSG00000000028    854    937   1143   1029    912    856    809    754    513    520
ENSMUSG00000000031 822918 817451 716860 691396 763705 829274 838094 819312 717935 730879
```

The code for the boxplot is below:

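```
library(reshape2); library(ggplot2)
pseudoCount = log2(rawCountTable + 1)  # rawCountTable: data.frame of raw counts, genes x samples
df = melt(pseudoCount, variable.name = "Samples",
          value.name = "count")  # reshape to long format: one row per gene and sample
# strip the replicate suffix ("con-1" -> "con", "a-1" -> "a"); substr(..., 1, 4) kept "a-1" and "a-2" apart
df = data.frame(df, Condition = sub("-[0-9]+$", "", df$Samples))
ggplot(df, aes(x = Samples, y = count, fill = Condition)) + geom_boxplot() +
  ylab(expression(log[2](count + 1)))
```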
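Each box in that plot is drawn from a per-sample five-number summary of log2(count + 1). As a rough base-R sketch of the same numbers, the code below uses only the three genes shown in `head(rawCountTable)` above (a real run would use the whole table). It adds one step that is not in the original code: genes with zero counts in every sample are dropped first, because they all sit at log2(0 + 1) = 0 and pull every box down without saying anything about the samples. The names `counts`, `libSize`, `kept` and `summaryStats` are illustrative only.

```r
# Counts copied from head(rawCountTable) in the question; a real run uses every gene.
samples <- c("con-1", "con-2", "a-1", "a-2", "b-1", "b-2", "c-1", "c-2", "d-1", "d-2")
genes <- c("ENSMUSG0000000000", "ENSMUSG00000000028", "ENSMUSG00000000031")
counts <- matrix(
  c(     0,      0,      0,      0,      0,      0,      0,      0,      0,      0,
       854,    937,   1143,   1029,    912,    856,    809,    754,    513,    520,
    822918, 817451, 716860, 691396, 763705, 829274, 838094, 819312, 717935, 730879),
  nrow = 3, byrow = TRUE, dimnames = list(genes, samples))

# Total counts per sample (library size); large differences shift whole boxes up or down.
libSize <- colSums(counts)

# Drop genes that are zero in every sample, then log-transform as in the original code.
kept <- counts[rowSums(counts) > 0, , drop = FALSE]
logCounts <- log2(kept + 1)

# Per-sample five-number summary: min, lower hinge, median, upper hinge, max.
summaryStats <- apply(logCounts, 2, fivenum)
rownames(summaryStats) <- c("min", "lowerHinge", "median", "upperHinge", "max")
print(libSize)
print(round(summaryStats, 2))

# Base-R boxplot with one box per sample (one column of the matrix each).
boxplot(logCounts, las = 2, ylab = "log2(count + 1)")
```

With only two non-zero genes the summary is trivial, but on the full table these columns are exactly what the boxes and whiskers encode.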

Here is my code for the density plot.

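```
library(ggplot2)
ggplot(df, aes(x = count, colour = Samples, fill = Samples)) +
  ylim(c(0, 0.17)) +
  geom_density(alpha = 0.2, linewidth = 1.25) +  # ggplot2 >= 3.4.0; older versions use size = 1.25
  facet_wrap(~ Condition) +                      # one panel per condition, replicates overlaid
  theme(legend.position = "top") +
  xlab(expression(log[2](count + 1)))
```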

The density plot is:

My question is: how should I interpret these plots, and what do they tell me about the quality of my data? If you can recommend an article about understanding these plots and assessing data quality, I would appreciate it.
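For a between-sample comparison, roughly similar medians and box widths across samples generally suggest comparable libraries; a sample whose whole distribution is shifted usually differs in sequencing depth, which DESeq2's size factors are meant to correct. A relative log expression (RLE) plot makes such shifts easier to see: each gene is centred on its median across samples, so comparable samples give boxes centred near zero. The sketch below runs on simulated negative-binomial counts, not on the data in this post; the number of genes, the means, the dispersion and the half-depth sample `d-1` are all arbitrary assumptions.

```r
# Simulated data only: all numbers below are arbitrary assumptions.
set.seed(42)
samples <- c("con-1", "con-2", "a-1", "a-2", "b-1", "b-2", "c-1", "c-2", "d-1", "d-2")
nGenes <- 2000
geneMean <- rexp(nGenes, rate = 1 / 300)   # hypothetical per-gene mean counts
depth <- ifelse(samples == "d-1", 0.5, 1)  # hypothetical: d-1 sequenced at half depth

# One column per sample: negative-binomial counts scaled by that sample's depth.
counts <- sapply(depth, function(f) rnbinom(nGenes, mu = geneMean * f, size = 5))
colnames(counts) <- samples

# log2(count + 1) after dropping genes that are zero in every sample.
logCounts <- log2(counts[rowSums(counts) > 0, ] + 1)

# RLE: subtract each gene's median across samples (one median per row,
# recycled down each column so it lines up with its own gene).
rleMat <- logCounts - apply(logCounts, 1, median)
rleMedian <- apply(rleMat, 2, median)
print(round(rleMedian, 2))

boxplot(rleMat, las = 2, outline = FALSE, ylab = "log2(count + 1) minus gene median")
abline(h = 0, lty = 2)
```

In this simulation the `d-1` box sits clearly below zero while the others stay near it, which is the kind of pattern a shifted box in the original boxplot would correspond to.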

Thank you very much!


The image links are broken. Try hosting and embedding them by pressing the image button in the post.

I've fixed it. OP put the embed code in the image direct-link field.

rpolicastro wrote (9 weeks ago):

I don't think those plots are very informative about quality. If you want a general idea of the quality of the sequencing reads, use a program like FastQC. The alignment statistics from your aligner will then give you a good idea of the complexity of your library. If you plan to run differential expression analysis on your data, you can generate PCA plots and sample heatmaps, which are a good first indicator of replicate concordance; from those plots you can sometimes already start to see the differences between conditions. DESeq2 is a good resource for making these plots.
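In DESeq2 these plots are usually made from variance-stabilised data, e.g. `vst()` followed by `plotPCA()`, which by default uses the 500 most variable genes. The base-R sketch below only mimics that idea so it can run without Bioconductor: it uses log2(count + 1) instead of a variance-stabilising transform, and simulated counts in which the first 100 genes are four times higher in condition `a` (a hypothetical effect, as are the gene means and dispersion). Replicates of the same condition should sit close together in the PCA and form blocks in the correlation heatmap.

```r
# Simulated data only: gene means, dispersion and the condition effect are assumptions.
set.seed(1)
samples <- c("con-1", "con-2", "a-1", "a-2", "b-1", "b-2", "c-1", "c-2", "d-1", "d-2")
condition <- sub("-[0-9]+$", "", samples)     # "con-1" -> "con"
nGenes <- 1000
geneMean <- rexp(nGenes, rate = 1 / 300)      # hypothetical per-gene means

# Hypothetical condition effect: the first 100 genes are 4x higher in condition "a".
counts <- sapply(condition, function(cond) {
  mu <- geneMean
  if (cond == "a") mu[1:100] <- mu[1:100] * 4
  rnbinom(nGenes, mu = mu, size = 20)
})
colnames(counts) <- samples
logCounts <- log2(counts[rowSums(counts) > 0, ] + 1)

# PCA with samples as rows, restricted to the 500 most variable genes.
topGenes <- order(apply(logCounts, 1, var), decreasing = TRUE)[1:500]
pca <- prcomp(t(logCounts[topGenes, ]))
pctVar <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

plot(pca$x[, 1], pca$x[, 2], pch = 19, col = as.integer(factor(condition)),
     xlab = paste0("PC1 (", pctVar[1], "%)"), ylab = paste0("PC2 (", pctVar[2], "%)"))
text(pca$x[, 1], pca$x[, 2], labels = samples, pos = 3)

# Sample-to-sample Pearson correlation; replicates should correlate most strongly.
sampleCor <- cor(logCounts)
heatmap(sampleCor, symm = TRUE)
```

On real data, a replicate that lands far from its partner in both plots is usually worth a closer look before running the DE test.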

Alright. I have already run FastQC and aligned the reads with STAR. I made these plots to compare the distributions between samples before DEG analysis with DESeq2. They are mentioned in tutorials, and I am not sure whether they are needed.
If they are not telling me anything I should be aware of, I will go on to make PCA plots, MA plots, and DEG plots. Thanks.
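DESeq2's `plotMA()` is drawn from differential expression results. A related check that needs no test is an MA plot between two replicates (M = difference of log2 counts, A = their average): the cloud should centre on M = 0, and an overall offset points to a depth difference. The sketch below simulates two replicates, the second at 80% depth (an assumption, as are the means and dispersion), and applies median-of-ratios size factors, the scheme DESeq2 uses in `estimateSizeFactors()`, written out by hand here as an illustration rather than DESeq2's own implementation.

```r
# Simulated replicates only: the means, dispersion and 80% depth are assumptions.
set.seed(7)
nGenes <- 3000
geneMean <- rexp(nGenes, rate = 1 / 300)                    # hypothetical per-gene means
counts <- cbind(rep1 = rnbinom(nGenes, mu = geneMean, size = 20),
                rep2 = rnbinom(nGenes, mu = 0.8 * geneMean, size = 20))  # rep2 at 80% depth

# Median-of-ratios size factors: each gene's geometric mean across samples is the
# reference; genes with a zero in any sample are skipped (their geometric mean is 0).
positive <- rowSums(counts == 0) == 0
logGeoMean <- rowMeans(log(counts[positive, ]))
sizeFac <- apply(counts[positive, ], 2, function(x) exp(median(log(x) - logGeoMean)))

# M = difference of log2 counts, A = their average (pseudocount 1 avoids log2(0)).
maValues <- function(x, y) {
  list(A = (log2(x + 1) + log2(y + 1)) / 2, M = log2(x + 1) - log2(y + 1))
}
rawMA <- maValues(counts[, "rep1"], counts[, "rep2"])
normCounts <- sweep(counts, 2, sizeFac, "/")   # divide each column by its size factor
normMA <- maValues(normCounts[, "rep1"], normCounts[, "rep2"])

print(round(sizeFac, 3))
cat("median M, raw:       ", round(median(rawMA$M), 2), "\n")
cat("median M, normalised:", round(median(normMA$M), 2), "\n")

par(mfrow = c(1, 2))
plot(rawMA$A, rawMA$M, pch = ".", main = "raw counts", xlab = "A", ylab = "M")
abline(h = 0, col = "red")
plot(normMA$A, normMA$M, pch = ".", main = "size-factor scaled", xlab = "A", ylab = "M")
abline(h = 0, col = "red")
```

The raw cloud sits above zero by roughly log2(1/0.8); after scaling it recentres on zero, which is why a depth-driven shift in a raw-count boxplot is not by itself a quality problem.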