Boxplot or ggplot
4.0 years ago
Kai_Qi ▴ 130

I am comparing gene expression levels across 4 developmental stages: E11, E14, E18 and Adult. I have selected about 800 genes, and each gene has a read count at each of these stages. I have made a matrix in which the row names are the gene IDs and the column names are the 4 stages; each column contains the read counts. I can easily use the boxplot function to get the figure, but I also want to run t-tests and add the p-values to the boxplot, and most of the online solutions for that use ggplot.

I thought about ggplot, but the problem is that to use ggplot I would first have to build a new matrix with 4 x 800 rows? That looks pretty intimidating.

Is my understanding right? Any advice is appreciated.

R software error RNA-Seq sequencing • 914 views
4.0 years ago

I thought about ggplot, but the problem is that to use ggplot I would first have to build a new matrix with 4 x 800 rows? That looks pretty intimidating.

It is not intimidating once you understand the basics of format conversion. Your data is currently in wide format (800 genes x 4 stages). ggplot expects long format instead: three columns (gene, stage and read count) and 800 x 4 = 3200 rows. The conversion takes a single line of code using the gather function from the tidyr library in R.

Suppose your data is in the following format:

> df
     genes E11  E14  E18 Adult
1   gene_1 526  555  772   818
2   gene_2 286  555 1077  1099
3   gene_3 468  433  937   951
4   gene_4 584  618 1001   970
5   gene_5 292  700  821   942
6   gene_6 334  526  803   901
7   gene_7 214  683  922   914
8   gene_8 558 1102  738   799
9   gene_9 550  494  799   920
10 gene_10 581  993  991   996

Run the following code to create a data frame that works with ggplot:

library(tidyr)
df2 <- gather(df, "Stages", "Reads", -genes)

Output would be:

> as_tibble(df2)
# A tibble: 40 x 3
   genes   Stages Reads
   <chr>   <chr>  <int>
 1 gene_1  E11      526
 2 gene_2  E11      286
 3 gene_3  E11      468
 4 gene_4  E11      584
 5 gene_5  E11      292
 6 gene_6  E11      334
 7 gene_7  E11      214
 8 gene_8  E11      558
 9 gene_9  E11      550
10 gene_10 E11      581
# … with 30 more rows

Use df2 for ggplot functions.
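As a rough sketch of what "use df2 for ggplot functions" could look like for the original question, the code below draws the boxplot from the long-format data and adds one t-test p-value as a subtitle. It reuses the 10 example genes above. The choice of a paired t-test (the same genes are measured at both stages), the single E11 vs Adult comparison, and the names tt, p_label and p are assumptions made for illustration, not something the thread specifies.

```r
library(tidyr)
library(ggplot2)

# Example data copied from the wide-format table above
df <- data.frame(
  genes = paste0("gene_", 1:10),
  E11   = c(526, 286, 468, 584, 292, 334, 214, 558, 550, 581),
  E14   = c(555, 555, 433, 618, 700, 526, 683, 1102, 494, 993),
  E18   = c(772, 1077, 937, 1001, 821, 803, 922, 738, 799, 991),
  Adult = c(818, 1099, 951, 970, 942, 901, 914, 799, 920, 996)
)

# Wide to long, as in the answer above
df2 <- gather(df, "Stages", "Reads", -genes)

# Keep the stages in developmental order instead of alphabetical order
df2$Stages <- factor(df2$Stages, levels = c("E11", "E14", "E18", "Adult"))

# One comparison, computed on the wide data.
# Assumption: a paired t-test is appropriate because each gene appears in both stages.
tt <- t.test(df$E11, df$Adult, paired = TRUE)
p_label <- paste0("E11 vs Adult, paired t-test: p = ", signif(tt$p.value, 2))

# Boxplot per stage with the p-value shown as a subtitle
p <- ggplot(df2, aes(x = Stages, y = Reads)) +
  geom_boxplot() +
  labs(subtitle = p_label)

# print(p) draws the plot; ggsave("boxplot.png", p) would write it to a file
```

For several pairwise comparisons, base R's pairwise.t.test with a multiple-testing correction may be a better starting point than repeated single tests.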

MatthewP ★ 1.4k

Yes, you are right. The new columns would be named something like Read and Stage. You can use the gather function to generate the new long-format table. Below is a simple example.

> library(tidyverse, quietly = TRUE)
> matrix1 <- tibble(a = c(0, 0, 0), b = 1:3, c = 1:3)
> matrix1
# A tibble: 3 x 3
      a     b     c
  <dbl> <int> <int>
1     0     1     1
2     0     2     2
3     0     3     3


> tidyr::gather(matrix1, key = "group", value = "new_col", a, b, c)
# A tibble: 9 x 2
  group new_col
  <chr>   <dbl>
1 a           0
2 a           0
3 a           0
4 b           1
5 b           2
6 b           3
7 c           1
8 c           2
9 c           3
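
In recent tidyr releases gather is marked as superseded by pivot_longer, so the same reshaping can also be sketched as below. The object name long1 is made up for this illustration. Note that pivot_longer returns the rows in a different order from gather (it goes through the input row by row rather than column by column), which does not matter for plotting.

```r
library(tidyr)
library(tibble)

# Same toy table as in the example above
matrix1 <- tibble(a = c(0, 0, 0), b = 1:3, c = 1:3)

# Wide to long: column names go to "group", cell values go to "new_col"
long1 <- pivot_longer(matrix1,
                      cols = c(a, b, c),
                      names_to = "group",
                      values_to = "new_col")
```

For the four-stage data the call has the same shape, with cols listing E11, E14, E18 and Adult.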

mydat2 <- tidyr::gather(mydat, key = "stage", value = "new_col", E11, E14, E18, Adult)

mydat is a matrix. I checked that all the values in E11, E14, E18 and Adult are numeric, but I got an error:

Error in gather_(data, key_col = compat_as_lazy(enquo(key)), value_col = compat_as_lazy(enquo(value)),  : 
  unused arguments (E14, E18, Adult)
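
A likely cause, assuming mydat is a base R matrix with gene IDs as row names, is that gather is written for data frames, so a matrix falls through to a fallback method that does not accept the column selections. A minimal sketch of one possible fix is to convert the matrix to a data frame first and keep the gene IDs as an ordinary column. The small mydat below is a hypothetical stand-in built from two rows of the example data, and the names mydat_df and reads are made up for the illustration.

```r
library(tidyr)

# Hypothetical stand-in for mydat: numeric matrix, gene IDs as row names
mydat <- matrix(
  c(526, 286, 555, 555, 772, 1077, 818, 1099),
  nrow = 2,
  dimnames = list(c("gene_1", "gene_2"), c("E11", "E14", "E18", "Adult"))
)

# Convert to a data frame and move the row names into a "genes" column
mydat_df <- data.frame(genes = rownames(mydat), mydat,
                       row.names = NULL, check.names = FALSE)

# gather now receives a data frame, so the column selection is accepted
mydat2 <- gather(mydat_df, key = "stage", value = "reads", E11, E14, E18, Adult)
```

Checking class(mydat) before reshaping shows whether the object is a matrix or a data frame.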