I am trying to find the best way to make two boxplots for a specific gene, using the data in that gene's row for two subsets of columns in data frame x.
x has 634 rows and 128 columns.
Each row is specific to a gene.
Column 1 holds the gene name; say I want to look at the gene in row 1.
Columns 2:48 hold the data I want to include in one boxplot.
Columns 49:128 hold the data I want to include in the other boxplot.
The data frame looks something like this:
gene accepted_hits_x1.bam accepted_hits_x1.bam etc....
1 AARS1 -6 0 etc....
I also want each data point that makes up a boxplot to be drawn on the plot.
I am running into a problem: my data are residuals from the mean (x value minus mean), so they are a series of positive and negative values, and it appears that this plot is excluding the negative ones:
library(ggplot2)
# Pull the IGF1R row; columns 2:48 are group 1 (47 values), 49:128 are group 2 (80 values)
data <- unlist(subset(datavr, gene == "IGF1R", select = 2:128))
news <- data.frame(data = data, factor = c(rep(1, 47), rep(2, 80)))
# Take log10() of each residual, then add 1
news$data <- log10(as.numeric(news$data)) + 1
g <- ggplot(data = news, aes(x = as.factor(factor), y = data))
g + geom_boxplot() + geom_point(color = "purple", size = 3) +
  xlab("A38-41 A38-5 ") + ylab("log10(Residual from Mean)+1") +
  ggtitle("IGF1R inside region") + theme(plot.title = element_text(face = "bold"))
The problem is that it keeps giving me a warning saying:
Removed 110 rows containing missing values (geom_point).
Could this be because the values are negative, so that log10(value) + 1 is undefined for them?
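A quick way to check that guess, as a minimal sketch: the `resid` vector below is made up for illustration (it is not my real data), and `signed_log10` is a hypothetical helper showing one possible workaround, not something from the code above. In base R, `log10()` returns NaN for negative inputs and -Inf for zero, and ggplot2 drops such non-finite values, which would account for the removed rows.

```r
# Made-up residuals: a mix of negative, zero, and positive values
resid <- c(-6, -2, 0, 3, 10)

# log10() is undefined for negatives (NaN, with a warning) and -Inf at zero,
# so adding 1 afterwards does not rescue those values
plain <- suppressWarnings(log10(resid) + 1)
print(plain)

# One possible alternative (an assumption, not the original method):
# a signed log that keeps the sign of each residual and maps 0 to 0
signed_log10 <- function(v) sign(v) * log10(abs(v) + 1)
print(signed_log10(resid))
```

If something like this were used in place of `log10(value) + 1`, every residual would stay finite and both the boxplot and the points would include the negative values, though the y-axis label would need to change to match the new transform.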