Question

How to sum the genewise counts in edgeR ?

1

Entering edit mode

3.7 years ago

sunnykevin97 ▴ 980

HI

How to sum the gene-wise counts in edge R for differential genes comparing 2 groups. Looking for two-group comparison with n=2 in each group. I'm new to R literally struck. suggestion please.

The tabulate matrix looks some thing like this
    temp  Test Sample GeneI Gene2 Gene2 Gene3 ....Genen
    t80 T1  1   55
    t80 T1  2   89
    t80 T1  3   54
    t80 T1  4   453
    t80 T1  5   50
    t80 T1  6   32
    t80 T2  7   45
    t80 T2  8   45
    t80 T2  9   50
    t80 T2  10  54
    t80 T2  11  45
    t80 T2  12  15
    t8  T3  13  43
    t8  T3  14  25
    t8  T3  15  404
    t8  T3  16  385
    t8  T3  17  51
    t8  T3  18  32
    t8  T4  19  454
    t8  T4  20  395
    t8  T4  21  53
    t8  T4  22  35
    t8  T4  23  96
    t8  T4  24  87

     Code
    library(edgeR)
    matrix.dge <- DGEList(matrix)
    head(matrix.dge)

    sumgenes <- sumTechReps(matrix.dge,ID=colnames(matrix.dge))
    head(sumgenes)
    sumcounts <- sumgenes$counts
    nrow(sumcounts)
    head(sumcounts)
    tail(sumcounts)

    The result seems to be same after applying sumTechReps function

RNA-Seq r R • 1.7k views

ADD COMMENT • link updated 3.7 years ago by rpolicastro 13k • written 3.7 years ago by sunnykevin97 ▴ 980

score 4 · Answer 1 · 2020-08-09

For RNA-seq you generally want biological replicates as opposed to technical replicates. That being said, there are a few things that you need to change.

Example data.

df <- structure(list(temp = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("t8", "t80"), class = "factor"), Test = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("T1", "T2", "T3", "T4"
), class = "factor"), Sample = 1:24, GeneI = c(55L, 89L, 54L,
453L, 50L, 32L, 45L, 45L, 50L, 54L, 45L, 15L, 43L, 25L, 404L,
385L, 51L, 32L, 454L, 395L, 53L, 35L, 96L, 87L), GeneII = c(204,
200, 44, 212, 299, 405, 79, 291, 303, 259, 207, 255, 99, 226,
206, 316, 80, 126, 131, 186, 478, 451, 323, 185)), row.names = c(NA,
-24L), class = "data.frame")

First, the matrix should be genes as rows, and samples as columns.

library("tidyverse")

mat <- df %>%
  select(-temp, -Test) %>%
  mutate(Sample = str_c("sample_", Sample)) %>%
  column_to_rownames("Sample") %>%
  t %>%
  as.matrix

> mat[, 1:5]
       sample_1 sample_2 sample_3 sample_4 sample_5
GeneI        55       89       54      453       50
GeneII      204      200       44      212      299

You then want a group data.frame.

groups <- df %>%
  select(temp, Test, Sample) %>%
  mutate(Sample = str_c("sample_", Sample)) %>%
  column_to_rownames("Sample")

> head(groups, 5)
         temp Test
sample_1  t80   T1
sample_2  t80   T1
sample_3  t80   T1
sample_4  t80   T1
sample_5  t80   T1

If you want to sum the technical replicates, You can then do so after creating the DGEList object using the above matrix and data.frame as inputs. I'm assuming Test denotes the technical replicates?

dge <- DGEList(mat, samples=groups)
dge <- sumTechReps(dge, dge$samples$Test)

> dge
An object of class "DGEList"
$counts
         T1   T2   T3   T4
GeneI   733  254  940 1120
GeneII 1364 1394 1053 1754

$samples
   group lib.size norm.factors temp Test
T1     1     2097            1  t80   T1
T2     1     1648            1  t80   T2
T3     1     1993            1   t8   T3
T4     1     2874            1   t8   T4

Alternatively, instead of summing the technical replicates, you can add it to the regression model to account for it when building the design matrix.

design <- model.matrix(~ temp + Test, groups)