Set point colour based on threshold in scatter plot
3.9 years ago
fi1d18 ★ 4.1k

Hi,

I have a set of samples and their log10 total read counts in a data frame like the one below:

    log10_total_counts
A1            6.468503
A10           6.565213
A11           6.752139
A12           5.078598
A2            6.277342
A3            6.473411

For instance, red dots have fewer than 1000000 counts and black dots have more than 1000000 counts.

How could I reproduce this plot, please?

R plot
3.9 years ago
seidel 9.9k

Something like the following would work with base R functions:

# make some data
a <- data.frame(sample=paste0("A", 1:50), log10_total_counts=rnorm(50, 6))

# get x points
x <- 1:length(a$log10_total_counts)

# create plot, drop x axis
plot(x, a$log10_total_counts, ylim=c(0,8), pch=19, xaxt="n", xlab="sample")

# add dashed line and X-axis labels with rotation
abline(h=6, lty="dashed")
axis(1, 1:50, labels=a$sample, las=3)

# select points to color differently
LessThan6 <- a$log10_total_counts < 6
points(x[LessThan6], a$log10_total_counts[LessThan6], pch=19, col="red")

3.9 years ago

Use ggplot2. Something is missing in your example: the labels at the bottom do not match your data frame, and it's unclear how your sample data frame would count up to values near 1000000. You might want to precompute a QC vector based on that and add it to your data frame; basically, have your data frame in a state that's ready to be plotted with minimal calculations.

Here's an example. I generate a data frame with normally distributed values centered on 6.0 (which seemed close to what your example had). I plot a dotted line at 6, and anything below 6 becomes red. For the style, I directly took the function theme_Publication() from this link.

Here's the code:

library("ggplot2")

# Themes for style (function bodies elided; copy them from the linked source)
theme_Publication <- function(base_size=14, base_family="helvetica") { ... }
scale_fill_Publication <- function(...){ ... }
scale_colour_Publication <- function(...){ ... }

# Function to plot points as described
plotReads <- function(DF) {
  ggplot(DF) +
    theme_Publication() +                                   # Style it
    geom_point(mapping = aes(x = as.factor(1:nrow(DF)),     # Sequentially
                             y = log10_total_counts,        # Y is the count values
                             color = QC)) +                 # Color it by the QC vector in the dataframe
    scale_color_manual(values = c("PASS" = "black", "FAIL" = "red")) +  # Tell it what colours to use.
    geom_hline(yintercept = 6.0, linetype="dotted") +       # Add the dotted line at 6.0
    xlab("Sample") +                                        # Label the X
    ylab("log10(Total Reads)") +                            # Label the Y
    ylim(c(-0.2, 8.2))                                      # Extend the Y axis to be between this range.
}

DF = data.frame(
  log10_total_counts = rnorm(30, mean = 6, sd = 0.5),       # Randomized data
  QC = as.factor(c( rep("FAIL",15), rep("PASS", 15) ))      # A vector that is just half FAIL and half PASS
)

# Now actually add some logic to what should be PASS or FAIL.
DF$QC <- as.factor(ifelse(DF$log10_total_counts < 6.0, "FAIL", "PASS"))

plotReads(DF)

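Applied to the six samples shown in the question, the precomputed QC vector could look like the sketch below. The column name `sample` is an assumption (the question shows the sample IDs as row names), and the cutoff of 6 is simply log10 of the 1000000 reads mentioned in the question.

```r
# The six samples printed in the question; the "sample" column is an assumed
# name, since the original data frame appears to keep the IDs as row names
counts <- data.frame(
  sample = c("A1", "A10", "A11", "A12", "A2", "A3"),
  log10_total_counts = c(6.468503, 6.565213, 6.752139,
                         5.078598, 6.277342, 6.473411)
)

# 1,000,000 reads corresponds to 6 on the log10 scale
threshold <- log10(1e6)

# Precompute the QC label so the plotting code only has to map it to a colour
counts$QC <- factor(
  ifelse(counts$log10_total_counts < threshold, "FAIL", "PASS"),
  levels = c("PASS", "FAIL")
)

# Show which samples fall below the cutoff
print(counts[counts$QC == "FAIL", ])
```

With these values only A12 falls below the cutoff, so it would be the single red point.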
None of the code uses anything from gridExtra - is it being loaded for code within the _Publication functions?

You're right, it's not needed here. Edited, and thanks!

Note: I cleaned up a discussion that sprang off an unwarranted comment of mine. Apologies!

3.9 years ago

Another plot in ggplot2 with simulated data:

library(ggplot2)

geom_point() +
scale_y_continuous(limits = c(0, 200))+
scale_color_manual(name="QC", values = c("red","darkgreen"))+
geom_hline(yintercept = 110, color="red")+
theme_bw()+
xlab("Sample")+
theme(axis.text.x = element_text(angle = 45,hjust = 1),
legend.position = "bottom")
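
The snippet above starts at geom_point(), so the simulated data and the opening ggplot() call are not shown. A minimal self-contained sketch of how they might look follows. The data frame `df`, its columns `sample`, `counts` and `QC`, the sample names, the value range and the seed are all hypothetical; only the layers from geom_point() onward come from the post. The colours are given as a named vector here so that each QC level keeps its colour even if one level is absent from the data.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the simulated values are repeatable

# Hypothetical simulated data: 20 samples with counts scattered around the cutoff
df <- data.frame(
  sample = factor(paste0("S", 1:20), levels = paste0("S", 1:20)),
  counts = runif(20, min = 50, max = 180)
)

cutoff <- 110  # same position as the geom_hline() in the snippet above

# Label each sample by comparing its count with the cutoff
df$QC <- factor(ifelse(df$counts < cutoff, "FAIL", "PASS"),
                levels = c("FAIL", "PASS"))

p <- ggplot(df, aes(x = sample, y = counts, color = QC)) +
  geom_point() +
  scale_y_continuous(limits = c(0, 200)) +
  scale_color_manual(name = "QC",
                     values = c(FAIL = "red", PASS = "darkgreen")) +
  geom_hline(yintercept = cutoff, color = "red") +
  theme_bw() +
  xlab("Sample") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom")

print(p)
```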

Biostars, I am so lucky that I can ask for your help and obtain such nice solutions. THANK YOU