Question: Make 2 boxplots from a data frame by plotting values from 1 row, with different columns per boxplot
1
3.8 years ago by
bgraphit20
United States
bgraphit20 wrote:

Hi everyone!

I am trying to find the best way to make 2 boxplots for a specific gene, using the data in one row and a subset of columns of data frame x.

x has 634 rows and 128 columns.

Each row is specific to a gene.

Column 1 holds the gene name; say I want to look at the gene in row 1.

Columns 2:48 hold the data I want to include in one boxplot.

Columns 49:128 hold the data I want to include in another boxplot.

 

The data frame looks something like this:

      gene       accepted_hits_x1.bam      accepted_hits_x1.bam    etc....

 1      AARS1          -6                            0             etc....

 

I also want to see each data point that makes up the boxplot drawn in the plot.

I am having a problem:

My data (residual from mean, i.e. value - mean) is a series of positive and negative values, and it appears that this plot is excluding the negative values.

 

data <- unlist(subset(datavr, gene =="IGF1R", select=2:128))

news <- data.frame(data=data, factor=c(rep(1,47), rep(2,80)))
news$data <- (log10(as.numeric(news$data)) + 1)

g <- ggplot(data=news, aes(x=as.factor(factor), y=data))

g + geom_boxplot() + geom_point(color="purple", size=3) + xlab("A38-41    A38-5   ") + ylab("log10(Residual from Mean)+1") + ggtitle("IGF1R inside region") + theme(plot.title = element_text(face="bold"))

 

The problem is that it keeps giving me a warning saying:

 Removed 110 rows containing missing values (geom_point).

 

Could this be because these values are negative, so taking log10(value)+1 fails?

 

 

 

 

graphing R boxplot • 5.5k views
modified 3.8 years ago • written 3.8 years ago by bgraphit20
1

Are you trying to make a boxplot for one specific gene?

 

written 3.8 years ago by Deepak Tanwar3.9k
1

Correct, but within the data frame I have information for 2 cell types, found in:

Columns 2:48 hold the data I want to include in one boxplot.

Columns 49:128 hold the data I want to include in another boxplot.

I just edited to clarify

written 3.8 years ago by bgraphit20

Do you need to do the log transformation?  That is what is introducing your NaNs.  The boxplot will plot negative numbers if you want to keep them non-transformed.

If you need to do the log transformation, do it like this instead:

news$data <- (log10(abs(as.numeric(news$data)) + 1))
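Note that abs() folds negative and positive residuals onto the same scale. If the sign matters, one possible alternative is a signed log transform. This is only a sketch and was not proposed in the thread; `signed_log10` and the toy `residuals` vector are hypothetical names used for illustration.

```r
# Toy residuals (value - mean); hypothetical data for illustration only.
residuals <- c(-99, -9, 0, 9, 99)

# Plain log10 is undefined for negative input: R returns NaN (with a warning),
# and log10(0) is -Inf. ggplot2 drops such non-finite values when plotting.
plain <- suppressWarnings(log10(residuals) + 1)

# log10(abs(x) + 1) is defined for every input but discards the sign.
folded <- log10(abs(residuals) + 1)

# Signed variant: same magnitude as `folded`, with the original sign restored.
signed_log10 <- function(x) sign(x) * log10(abs(x) + 1)
signed <- signed_log10(residuals)

print(plain)
print(folded)
print(signed)
```

With this variant, negative residuals stay below zero in the plot, so the two directions of deviation remain distinguishable.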
modified 3.8 years ago • written 3.8 years ago by Steven Lakin1.4k

Some of my libraries have 0 counts, so when I compute the residual from the mean for that particular gene in those libraries, some values end up negative.

These are being excluded from the plot when I do the log transformation. Following your advice and running

news$data <- (log10(abs(as.numeric(news$data)) + 1))

allows all values to be plotted.

 

I am using the log because of some outliers.

written 3.8 years ago by bgraphit20
5
3.8 years ago by
Trento, IT
Nicola Casiraghi440 wrote:
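gene_id <- 1 # consider the first gene
# unlist() turns each one-row data frame into a plain numeric vector;
# otherwise boxplot() draws one box per column of the first argument
data_1 <- unlist(your_dataframe[gene_id,2:48])
data_2 <- unlist(your_dataframe[gene_id,49:128])
boxplot(data_1,data_2)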
modified 3.8 years ago • written 3.8 years ago by Nicola Casiraghi440
1

#this was before your previous edit, but i'm going to leave it for more examples for others.

I think he wants a specific gene name though, so to add onto your answer:

data_1 <- unlist(your_dataframe[your_dataframe$gene == "geneName",2:48])
data_2 <- unlist(your_dataframe[your_dataframe$gene == "geneName",49:128])
boxplot(data_1,data_2)

You could also do it with subset:

data_1 <- unlist(subset(your_dataframe, gene == "geneName", select=2:48))
data_2 <- unlist(subset(your_dataframe, gene == "geneName", select=49:128))
boxplot(data_1,data_2)
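The unlist() call matters because selecting one row of a data frame still returns a data frame, that is, a list of columns rather than a vector of values. A small sketch with made-up data (the `toy` data frame and its values are hypothetical stand-ins for the real 634 x 128 table):

```r
# Hypothetical stand-in for the real data frame: 3 genes, 127 sample columns.
set.seed(1)
toy <- data.frame(gene = c("AARS1", "IGF1R", "TP53"),
                  matrix(rnorm(3 * 127), nrow = 3))

# Selecting one row by gene name still yields a data frame with 47 columns.
row_df <- toy[toy$gene == "IGF1R", 2:48]

# unlist() flattens the row into a plain numeric vector, one value per sample.
group_1 <- unlist(toy[toy$gene == "IGF1R", 2:48])
group_2 <- unlist(toy[toy$gene == "IGF1R", 49:128])

# boxplot.stats() returns the five numbers a box is drawn from
# (lower whisker, lower hinge, median, upper hinge, upper whisker).
summary_1 <- boxplot.stats(group_1)$stats
print(summary_1)
```

Checking length(group_1) and length(group_2) before plotting is a quick way to confirm that the column ranges match the two cell types.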

Or with factors and ggplot2 if you're feeling fancy:

library(ggplot2)
data <- unlist(subset(your_dataframe, gene == "geneName", select=2:128))
newFrame <- data.frame(data=data, factor=c(rep(1,47), rep(2,80)))
# ggplot() is used here because qplot() is deprecated in recent ggplot2 releases
ggplot(newFrame, aes(x=factor(factor), y=data)) + geom_boxplot()
modified 3.8 years ago • written 3.8 years ago by Steven Lakin1.4k

5
3.8 years ago by
Steven Lakin1.4k
Fort Collins, CO, USA
Steven Lakin1.4k wrote:

Since you added additional information, I'll just post this as an answer.  Your best bet if you want to manipulate details about graphing in R is to use ggplot2 along with factors:

library(ggplot2)   # or install.packages("ggplot2"); library(ggplot2)

data <- unlist(subset(your_dataframe, gene == "geneName", select=2:128))

newData <- data.frame(data=data, factor=c(rep(1,47), rep(2,80)))

g <- ggplot(data=newData, aes(x=as.factor(factor), y=data))

g + geom_boxplot() + geom_point(color="dark red", size=3) + xlab("x axis label") + ylab("y axis label") + ggtitle("My Plot Title") + theme(plot.title = element_text(face="bold"))

You can edit virtually everything you see with ggplot2; I only included the basics here.  A google search for more will help with that.

 

modified 3.8 years ago • written 3.8 years ago by Steven Lakin1.4k

To plot log10(values)+1, which data frame must be changed?


 

written 3.8 years ago by bgraphit20
1

Do the following to the newData dataframe before plotting:

newData$data <- log10(as.numeric(newData$data)) + 1
written 3.8 years ago by Steven Lakin1.4k
4
3.8 years ago by
ethan.kaufman360
Canada
ethan.kaufman360 wrote:

To make a boxplot with base graphics in R, you can create a "factor" vector, which indicates which category each of your data points belongs to:

f <- factor(c(rep("Group 1", 47), rep("Group 2", 80)))

Then call "boxplot" with a formula of the form data ~ factor:

boxplot(as.numeric(dat[1, 2:128]) ~ f)

Edit: Actually creating the factor is not even necessary in this case. You can just list the two data vectors as multiple arguments:

boxplot(as.numeric(dat[1,2:48]), as.numeric(dat[1,49:128]))
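The question also asks for every individual data point to be visible. In base graphics, one way to sketch this is to overlay stripchart() on the boxplot. The `values` vector below is simulated and only stands in for as.numeric(dat[1, 2:128]); the colour and jitter settings are arbitrary choices.

```r
# Hypothetical values standing in for as.numeric(dat[1, 2:128]).
set.seed(42)
values <- c(rnorm(47, mean = 0), rnorm(80, mean = 1))
f <- factor(c(rep("Group 1", 47), rep("Group 2", 80)))

# Null device so the sketch also runs non-interactively;
# drop this line (and dev.off()) to see the plot in an interactive session.
pdf(NULL)

# The formula interface draws one box per factor level.
# outline = FALSE hides the outlier symbols, since stripchart() draws every
# point anyway; ylim is set explicitly so no point falls outside the axes.
bp <- boxplot(values ~ f, outline = FALSE, ylim = range(values), ylab = "value")

# Overlay all observations, jittered horizontally to reduce overlap.
stripchart(values ~ f, vertical = TRUE, method = "jitter",
           pch = 16, col = "purple", add = TRUE)

invisible(dev.off())
```

The object returned by boxplot() (here `bp`) holds the group names, group sizes and box statistics, which is handy for checking that the 47/80 split is what was intended.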
modified 3.8 years ago • written 3.8 years ago by ethan.kaufman360
2
3.8 years ago by
Deepak Tanwar3.9k
ETH Zürich, Switzerland
Deepak Tanwar3.9k wrote:
gene_row <- data[which(data$gene == "Gene_name"), ]  # the single row for this gene
boxplot(unlist(gene_row[2:48]), unlist(gene_row[49:ncol(data)]), names = c("group1", "group2"))
written 3.8 years ago by Deepak Tanwar3.9k