Question: RNA-Seq data Quality Assessment- BoxPlot Interpretation
Aynur wrote:

Hello,

Please help me understand my boxplot.

Here is the boxplot I got for my RNA-Seq data.

Box Plot

My data is

head(rawCountTable)
                   con-1  con-2    a-1    a-2    b-1    b-2     c-1    c-2    d-1    d-2
ENSMUSG0000000000     0      0      0      0      0      0       0      0      0      0
ENSMUSG00000000028    854    937   1143   1029    912    856    809    754    513    520
ENSMUSG00000000031 822918 817451 716860 691396 763705 829274 838094 819312 717935 730879

The code for Boxplot is below:

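library(reshape2)  # provides melt()
library(ggplot2)

pseudoCount = log2(rawCountTable + 1)
df = melt(pseudoCount, variable.name = "Samples",
      value.name = "count") # reshape to long format: one row per gene/sample pair
# strip the replicate suffix to get the condition ("con-1" -> "con", "a-1" -> "a")
df = data.frame(df, Condition = sub("-[0-9]+$", "", df$Samples))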
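
Note: the snippet above only prepares the data; the call that draws the boxplot is not shown in the post. As a minimal sketch (base R rather than ggplot2, and run on simulated counts because the real rawCountTable is not available), a per-sample boxplot and the quartiles it draws could be produced like this:

```r
# Simulated stand-in for rawCountTable (hypothetical data, not the poster's counts)
set.seed(1)
samples <- c("con-1", "con-2", "a-1", "a-2")
counts <- matrix(rnbinom(2000 * length(samples), mu = 100, size = 0.5),
                 ncol = length(samples), dimnames = list(NULL, samples))

# Same transformation as in the post
pseudo <- log2(counts + 1)

# One box per sample (column) on the log2 scale
boxplot(pseudo, ylab = "log2(count + 1)", las = 2)

# The numbers behind each box: min, Q1, median, Q3, max per sample
box_summary <- apply(pseudo, 2, quantile, probs = c(0, 0.25, 0.5, 0.75, 1))
print(round(box_summary, 2))
```

Roughly aligned medians and similar box heights across samples suggest comparable count distributions; a sample whose box sits clearly apart from the rest is worth a closer look before differential expression analysis.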

Here is my code for the density plot.

ggplot(df, aes(x = count, colour = Samples, fill = Samples)) +
  ylim(c(0, 0.17)) +
  geom_density(alpha = 0.2, size = 1.25) +
  facet_wrap(~ Condition) +
  theme(legend.position = "top") +
  xlab(expression(log[2](count + 1)))

The density Plot is

Density Plot

My question is how to interpret these plots. How is my data quality? If you can recommend an article on understanding these plots and assessing my data, I would appreciate it.

Thank you very much!

modified 9 weeks ago by rpolicastro • written 9 weeks ago by Aynur

The image links are broken. Try hosting and embedding them by pressing the image button in the post.

written 9 weeks ago by rpolicastro

I've fixed it. OP used the embed code in the image direct link field.

written 9 weeks ago by RamRS

rpolicastro wrote:

I don't think those plots are necessarily very informative about quality. If you want a general idea of the quality of the sequencing reads, use a program like FastQC. The alignment statistics from your aligner will then give you a good idea of the complexity of your library. If you plan on running differential expression on your data, you can generate PCA and heatmap plots, which are a good first indicator of replicate concordance, and from those plots you can sometimes start to see the differences between conditions. DESeq2 is a good resource for making these plots.
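
As a rough illustration of the PCA and heatmap checks mentioned above, here is a minimal base R sketch. It does not use DESeq2's own functions; it runs prcomp() and dist() on log2-transformed simulated counts, and the sample names, conditions, and effect size are all made up for the example.

```r
# Simulated counts: 500 genes, 2 hypothetical conditions x 3 replicates
set.seed(42)
n_genes <- 500
condition <- rep(c("con", "a"), each = 3)
samples <- paste(condition, 1:3, sep = "-")   # con-1, con-2, con-3, a-1, a-2, a-3

base_mean <- rgamma(n_genes, shape = 2, scale = 100)
# Assumed effect for illustration: the first 50 genes are 4-fold higher in "a"
fold <- matrix(1, n_genes, length(samples))
fold[1:50, condition == "a"] <- 4
counts <- matrix(rnbinom(n_genes * length(samples), mu = base_mean * fold, size = 10),
                 nrow = n_genes, dimnames = list(NULL, samples))

log_counts <- log2(counts + 1)

# PCA with samples as rows; replicates of one condition are expected to sit close together
pca <- prcomp(t(log_counts))
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
plot(pca$x[, 1], pca$x[, 2], pch = 19,
     col = as.integer(factor(condition)),
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))
text(pca$x[, 1], pca$x[, 2], labels = samples, pos = 3)

# Sample-to-sample Euclidean distances, shown as a clustered heatmap
sample_dist <- as.matrix(dist(t(log_counts)))
heatmap(sample_dist, symm = TRUE)
```

With real data, DESeq2's variance-stabilized values would normally replace the plain log2 transform used here.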

modified 9 weeks ago • written 9 weeks ago by rpolicastro

Alright. I have already run FastQC and aligned the reads with STAR. I was making these plots to look at the between-sample distributions prior to DEG analysis with DESeq2. These plots are mentioned in tutorials, and I am not sure whether they are needed.
If they are not telling me anything I should be aware of, then I will continue with the PCA, MA, and DEG plots. Thanks.

written 9 weeks ago by Aynur