Question: RNA-Seq data quality assessment - boxplot interpretation
Asked 9 weeks ago by Aynur


Please help me understand my boxplot.

Here is the boxplot I got for my RNA-Seq data.

[Image: Box Plot]

My data looks like this:

                   con-1  con-2    a-1    a-2    b-1    b-2     c-1    c-2    d-1    d-2
ENSMUSG0000000000     0      0      0      0      0      0       0      0      0      0
ENSMUSG00000000028    854    937   1143   1029    912    856    809    754    513    520
ENSMUSG00000000031 822918 817451 716860 691396 763705 829274 838094 819312 717935 730879

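The code for the boxplot is below:

library(reshape2)  # provides melt()

pseudoCount = log2(rawCountTable + 1)  # log2 transform; the +1 avoids log2(0)
# Reshape to long format, one row per gene/sample pair (assumes rawCountTable is a data frame)
df = melt(pseudoCount, variable.name = "Samples", value.name = "count")
# Strip the replicate suffix to get the condition ("con-1" -> "con", "a-1" -> "a");
# substr(df$Samples, 1, 4) would leave "a-1" and "a-2" as separate conditions
df = data.frame(df, Condition = sub("[-.][0-9]+$", "", df$Samples))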

Here is my code for the density plot.

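library(ggplot2)

# Per-sample density of log2(count + 1), one panel per condition
# (in ggplot2 >= 3.4, linewidth replaces size for line widths)
ggplot(df, aes(x = count, colour = Samples, fill = Samples)) +
  ylim(c(0, 0.17)) +
  geom_density(alpha = 0.2, size = 1.25) +
  facet_wrap(~ Condition) +
  theme(legend.position = "top") +
  xlab(expression(log[2](count + 1)))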

The density plot is:

[Image: Density Plot]

So, my question is: how do I interpret these plots? How is my data quality? If you could recommend an article about understanding these plots and assessing my data, I would appreciate it.

Thank you very much!


The image links are broken. Try hosting and embedding them by pressing the image button in the post.

Reply written 9 weeks ago by rpolicastro

I've fixed it. OP used the embed code in image direct link field.

Reply written 9 weeks ago by RamRS
Answer written 9 weeks ago by rpolicastro:

I don't think those plots are particularly informative about quality. If you want a general idea of the quality of the sequencing reads, use a program like FastQC. The alignment statistics from your aligner will then give you a good idea of the complexity of your library. If you plan on running differential expression on your data, you can generate PCA and heatmap plots, which are a good first indicator of replicate concordance; from those plots you can sometimes start to see the differences between conditions. DESeq2 is a good resource for making these plots.
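To make the replicate-concordance check concrete, here is a minimal sketch in base R. It is a stand-in, not the DESeq2 workflow itself: it runs `prcomp()` and `cor()` on log2(count + 1) values, whereas with DESeq2 the usual route is `plotPCA()` on `vst()`-transformed data. The counts are simulated purely for illustration (they are not the poster's data), and the sample names, the number of genes, and the 8-fold shift are all assumptions.

```r
# Simulated toy counts (NOT the poster's data): 2 conditions x 2 replicates
set.seed(1)
n_genes   <- 500
samples   <- c("con-1", "con-2", "a-1", "a-2")
condition <- sub("-[0-9]+$", "", samples)          # "con" "con" "a" "a"

base_expr <- rnbinom(n_genes, mu = 200, size = 2)  # per-gene baseline expression
fold_a    <- rep(1, n_genes)
fold_a[1:100] <- 8                                 # assumption: 100 genes are 8-fold up in "a"

counts <- sapply(seq_along(samples), function(i) {
  fold <- if (condition[i] == "a") fold_a else rep(1, n_genes)
  rnbinom(n_genes, mu = base_expr * fold + 1, size = 50)
})
dimnames(counts) <- list(paste0("gene", seq_len(n_genes)), samples)

log_counts <- log2(counts + 1)

# PCA with samples as observations: replicates should land close together
pca     <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
plot(pca$x[, 1], pca$x[, 2], pch = 19,
     col  = as.integer(factor(condition)),
     xlab = paste0("PC1 (", pct_var[1], "%)"),
     ylab = paste0("PC2 (", pct_var[2], "%)"))
text(pca$x[, 1], pca$x[, 2], labels = samples, pos = 3)

# Sample-to-sample correlation heatmap: replicates should correlate best with each other
sample_cor <- cor(log_counts, method = "spearman")
heatmap(sample_cor, symm = TRUE, margins = c(6, 6))
```

If replicates of the same condition sit together on PC1/PC2 and show the highest pairwise correlations, that is the concordance described above; a replicate that clusters with the wrong condition is worth a closer look before differential expression.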


Alright. I have already run FastQC and aligned with STAR. I was making these plots to look at the between-sample distributions prior to DEG analysis with DESeq2. These plots are mentioned in tutorials, and I am not sure whether they are needed.
If they are not telling me anything I should be aware of, then I will continue with PCA, MA plots, and DEG plots. Thanks.

Reply written 9 weeks ago by Aynur
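For the between-sample distribution check mentioned in this thread, the boxplot can also be read as numbers. The sketch below is one possible way to do that in base R, under stated assumptions: the counts are simulated (not the poster's data), one sample is deliberately given twice the sequencing depth, and the 0.5 log2-unit threshold is an arbitrary illustration rather than an established cutoff.

```r
# Simulated toy counts (NOT the poster's data); "a-2" is sequenced ~2x deeper
set.seed(42)
n_genes <- 1000
samples <- c("con-1", "con-2", "a-1", "a-2")
depth   <- c(1, 1, 1, 2)                              # assumed relative library sizes

gene_mu <- rgamma(n_genes, shape = 0.5, scale = 400)  # per-gene mean expression
counts  <- sapply(depth, function(d) rnbinom(n_genes, mu = gene_mu * d, size = 10))
colnames(counts) <- samples

pseudo <- log2(counts + 1)

# Numeric counterpart of the boxplot: quartiles of log2(count + 1) per sample
quartiles <- apply(pseudo, 2, quantile, probs = c(0.25, 0.5, 0.75))
print(round(quartiles, 2))

# Fraction of genes with zero counts in each sample
zero_frac <- colMeans(counts == 0)
print(round(zero_frac, 3))

# Flag samples whose median is far from the typical median (arbitrary 0.5 threshold)
med     <- quartiles["50%", ]
flagged <- names(med)[abs(med - median(med)) > 0.5]
print(flagged)

# The same information as a picture
boxplot(pseudo, las = 2, ylab = "log2(count + 1)")
```

Boxes with similar medians and spreads suggest the samples are comparable on the raw scale. A box shifted up or down, like the deeper sample here, usually reflects a difference in sequencing depth, which DESeq2's size-factor normalization is designed to account for, so on its own it is not necessarily a quality problem.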