Question: Make 2 boxplots from a data frame by plotting values from 1 row, with different columns per boxplot
1
bgraphit20 wrote:

Hi everyone!

I am trying to find the best way to make 2 boxplots for a specific gene, using the values in one row of data frame x and a different subset of columns for each boxplot.

x has 634 rows and 128 columns.

Each row is specific to a gene.

Column 1 holds the gene name; say I want to look at the gene in row 1.

Columns 2:48 hold the data I want to include in one boxplot.

Columns 49:128 hold the data I want to include in another boxplot.

The data frame looks something like this:

```
       gene       accepted_hits_x1.bam      accepted_hits_x1.bam    etc....
1      AARS1          -6                            0             etc....
```

I also want to be able to see each data point that makes up the boxplot plotted in the plot.

I am having a problem:

My data (residuals from the mean, i.e. x value - mean) are a series of positive and negative values, and it appears that this plot excludes the negative values.

data <- unlist(subset(datavr, gene == "IGF1R", select=2:128))

news <- data.frame(data=data, factor=c(rep(1,47), rep(2,80)))
news$data <- (log10(as.numeric(news$data)) + 1)

g <- ggplot(data=news, aes(x=as.factor(factor), y=data))

g + geom_boxplot() + geom_point(color="purple", size=3) + xlab("A38-41    A38-5   ") + ylab("log10(Residual from Mean)+1") + ggtitle("IGF1R inside region") + theme(plot.title = element_text(face="bold"))

The problem is that it keeps giving me a message saying:

Removed 110 rows containing missing values (geom_point).

Could this be because these values are negative when taking log10(value)+1?

graphing R boxplot • 5.9k views
modified 4.3 years ago • written 4.3 years ago by bgraphit20
1

Are you trying to make a boxplot of some specific gene?

1

Correct, but within the data frame I have information for 2 cell types, found in these columns:

Columns 2:48 hold the data I want to include in one boxplot.

Columns 49:128 hold the data I want to include in another boxplot.

I just edited the question to clarify.

Do you need to do the log transformation? That is what is introducing your NaNs: log10 of a negative number is undefined, so R returns NaN for it. The boxplot will plot negative numbers if you keep them untransformed.

If you need to do the log transformation, do it like this instead:

`news$data <- (log10(abs(as.numeric(news$data)) + 1))`
modified 4.3 years ago • written 4.3 years ago by Steven Lakin1.4k
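To make the source of the message concrete, here is a minimal sketch using made-up residuals (the vector `resid` is hypothetical, not the poster's data). It shows that `log10()` of a negative value yields `NaN`, which is the kind of value ggplot2 reports as missing and drops.

```r
# Hypothetical residuals (value - mean): negative, zero and positive values
resid <- c(-6, -2, 0, 3, 10)

# log10() is undefined for negative input, so R returns NaN (with a warning);
# log10(0) is -Inf
transformed <- suppressWarnings(log10(resid) + 1)
print(transformed)

# Count how many values became NaN
n_nan <- sum(is.nan(transformed))
print(n_nan)
```

Running the same `sum(is.nan(...))` check on `news$data` should show whether the 110 removed rows correspond to negative residuals.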

Some of my libraries have 0 counts, so when I compute the residual from the mean for that particular gene, some of the values end up negative.

These are being excluded from the plot when I do the log transformation. Following your advice and running

`news$data <- (log10(abs(as.numeric(news$data)) + 1))`

allows all values to be plotted.

I am using the log because of some outliers.
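One caveat with `abs()` is that it discards the sign, so residuals of -6 and +6 end up at the same height. If the direction of the residual matters, a signed log transform is one possible alternative. This is an assumption added here, not something proposed in the thread, and `signed_log10` is a hypothetical helper name rather than a base R function.

```r
# Signed log transform: compresses outliers but keeps the sign of each value.
# signed_log10 is a hypothetical helper, not part of base R.
signed_log10 <- function(x) sign(x) * log10(abs(x) + 1)

resid <- c(-99, -6, 0, 6, 999)   # made-up residuals for illustration
signed <- signed_log10(resid)
print(signed)
```

Zero stays at zero, and negative residuals remain below it, so both halves of the distribution stay visible in the boxplot.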

5
Nicola Casiraghi450 wrote:
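```
gene_id <- 1  # consider the first gene (row 1)
# unlist() turns the one-row data frame slices into plain numeric vectors;
# passing the data frames directly would draw one box per column instead
data_1 <- unlist(your_dataframe[gene_id, 2:48])
data_2 <- unlist(your_dataframe[gene_id, 49:128])
boxplot(data_1, data_2)
```
modified 4.3 years ago • written 4.3 years ago by Nicola Casiraghi450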
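The question also asks to see every data point. In base graphics, one way to sketch this is to draw the boxplot and then overlay the points with `stripchart(..., add = TRUE)`. The data frame `df` below is simulated purely for illustration; only its shape (1 gene column plus 127 value columns) mirrors the question.

```r
set.seed(1)
# Simulated stand-in for the real data: 1 gene-name column + 127 value columns
df <- data.frame(gene = c("AARS1", "IGF1R"),
                 matrix(rnorm(2 * 127), nrow = 2))

gene_row <- df[df$gene == "IGF1R", ]
groups <- list(group1 = unlist(gene_row[, 2:48]),
               group2 = unlist(gene_row[, 49:128]))

boxplot(groups)                               # one box per list element
stripchart(groups, vertical = TRUE, method = "jitter",
           pch = 16, col = "purple", add = TRUE)  # overlay individual points
```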
1

This was written before your previous edit, but I'm going to leave it as more examples for others.

I think he wants a specific gene name though, so to add onto your answer:

```
data_1 <- unlist(your_dataframe[your_dataframe$gene == "gene", 2:48])
data_2 <- unlist(your_dataframe[your_dataframe$gene == "gene", 49:128])
boxplot(data_1, data_2)
```

You could also do it with subset:

```
data_1 <- unlist(subset(your_dataframe, gene == "geneName", select=2:48))
data_2 <- unlist(subset(your_dataframe, gene == "geneName", select=49:128))
boxplot(data_1, data_2)
```

Or with factors and ggplot2 if you're feeling fancy:

```
library(ggplot2)
data <- unlist(subset(your_dataframe, gene == "geneName", select=2:128))
# 47 values from columns 2:48 (group 1), 80 values from columns 49:128 (group 2)
newFrame <- data.frame(data=data, factor=c(rep(1,47), rep(2,80)))
# Note: qplot() is deprecated in recent ggplot2 releases
qplot(factor(factor), data, data=newFrame, geom="boxplot")
```
modified 4.3 years ago • written 4.3 years ago by Steven Lakin1.4k

5
Steven Lakin1.4k wrote:

Since you added additional information, I'll just post this as an answer.  Your best bet if you want to manipulate details about graphing in R is to use ggplot2 along with factors:

```
library(ggplot2)   # or install.packages("ggplot2"); library(ggplot2)

data <- unlist(subset(your_dataframe, gene == "geneName", select=2:128))

# 47 values from columns 2:48 (group 1), 80 values from columns 49:128 (group 2)
newData <- data.frame(data=data, factor=c(rep(1,47), rep(2,80)))

g <- ggplot(data=newData, aes(x=as.factor(factor), y=data))

g + geom_boxplot() + geom_point(color="darkred", size=3) + xlab("x axis label") + ylab("y axis label") + ggtitle("My Plot Title") + theme(plot.title = element_text(face="bold"))
```

You can edit virtually everything you see with ggplot2; I only included the basics here. The ggplot2 documentation covers the rest.

modified 4.3 years ago • written 4.3 years ago by Steven Lakin1.4k
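As a variation on the answer above, the following sketch uses named groups instead of 1 and 2, and `geom_jitter()` so that overlapping points are spread out horizontally. The values are simulated, and assigning the labels "A38-41" and "A38-5" (taken from the axis label in the question) to the first 47 and last 80 columns is an assumption. The plotting step is guarded so the data preparation still runs if ggplot2 is not installed.

```r
set.seed(42)
values <- rnorm(127)   # simulated residuals standing in for one gene's row

# Label the two groups explicitly; which label goes with which columns is assumed
newData <- data.frame(
  value = values,
  group = factor(rep(c("A38-41", "A38-5"), times = c(47, 80)))
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(newData, aes(x = group, y = value)) +
    geom_boxplot(outlier.shape = NA) +           # hide outliers; jitter draws every point
    geom_jitter(width = 0.15, color = "purple") +
    labs(x = "Cell type", y = "Residual from mean")
  print(p)
}
```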

To plot log10(values)+1, which data frame must be changed?

1

Do the following to the newData dataframe before plotting:

`newData$data <- log10(as.numeric(newData$data)) + 1`
4
ethan.kaufman360 wrote:

To make a boxplot with base graphics in R, you can create a "factor" vector, which indicates which category each of your data points belongs to:

`f <- factor(c(rep("Group 1", 47), rep("Group 2", 80)))`

Then call "boxplot" with a formula of the form data ~ factor:

`boxplot(as.numeric(dat[1, 2:128]) ~ f)`

Edit: Actually creating the factor is not even necessary in this case. You can just list the two data vectors as multiple arguments:

`boxplot(as.numeric(dat[1,2:48]), as.numeric(dat[1,49:128]))`
`boxplot(unlist(data[data$gene == "Gene_name", 2:48]), unlist(data[data$gene == "Gene_name", 49:ncol(data)]), names = c("group1", "group2"))`
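The answers above hardcode the group sizes (47 and 80), which has to stay in sync with the column ranges by hand. A small helper can derive them from the ranges instead. This is only a sketch: `gene_groups` is a hypothetical function name and `dat` is simulated data with the same layout as in the question.

```r
# Hypothetical helper: pair each value with its group, deriving the group
# sizes from the column ranges rather than typing them by hand.
gene_groups <- function(df, gene_name, cols1, cols2) {
  hit <- df[df$gene == gene_name, ]
  stopifnot(nrow(hit) == 1)                      # expect exactly one matching gene
  values <- as.numeric(unlist(hit[, c(cols1, cols2)]))
  group  <- factor(rep(c("Group 1", "Group 2"),
                       times = c(length(cols1), length(cols2))))
  data.frame(value = values, group = group)
}

set.seed(7)
# Simulated data: 1 gene-name column + 127 value columns
dat <- data.frame(gene = c("AARS1", "IGF1R"),
                  matrix(rnorm(2 * 127), nrow = 2))

long <- gene_groups(dat, "AARS1", 2:48, 49:ncol(dat))
boxplot(value ~ group, data = long)
```

The resulting long-format data frame works with both the base `boxplot()` formula interface and ggplot2.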