Question

Large sample size in edgeR

3.2 years ago

Peter ▴ 20

Hello,

I have a spreadsheet that I intend to analyze using the edgeR package. Usually the number of samples is small, so I use a script that looks like this:

# Read the count table; the first column supplies the row names
mydata <- as.matrix(read.table(mydata, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE))
libSizes <- as.vector(colSums(mydata))
groups <- c("CTRL", "CTRL", "CTRL", "WW", "WW", "WW")
d <- DGEList(counts = mydata, group = factor(groups), lib.size = libSizes)
d <- calcNormFactors(d)
# Estimate the common dispersion, then the tagwise dispersions
d1 <- estimateCommonDisp(d, verbose = TRUE)
d1 <- estimateTagwiseDisp(d1)
# Fit the GLM and run the likelihood ratio test
fit <- glmFit(d1)
result <- glmLRT(fit)

However, now I have a total of 1200 samples, divided into two groups: CTRL (n = 518) and INF (n = 582).

When I run the step that creates the group vector, R shows the "+" prompt, as if it were not able to store so many values.

Can someone help me?

Thank you

RNA-Seq R edgeR • 664 views

ADD COMMENT • link updated 3.2 years ago by Gordon Smyth ★ 7.0k • written 3.2 years ago by Peter ▴ 20

score 3 · Accepted Answer · 2021-02-04

3.2 years ago

Gordon Smyth ★ 7.0k

For RNA-seq with such large sample numbers, I would use limma instead of edgeR, although the quasi-likelihood pipeline of edgeR can also handle a lot of samples.
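As a rough illustration of the limma route, here is a minimal limma-voom sketch. It is not taken from the thread: the simulated count matrix, the small group sizes, and the object names (`counts`, `group`, `top`) are all placeholders, and the real count table and group labels would take their place.

```r
library(edgeR)  # attaches limma as well

# Simulated counts stand in for the real table (assumption, not real data)
set.seed(1)
n_ctrl <- 6
n_inf <- 6
counts <- matrix(rnbinom(1000 * (n_ctrl + n_inf), mu = 50, size = 5), nrow = 1000)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

# One label per column, in column order
group <- factor(rep(c("CTRL", "INF"), times = c(n_ctrl, n_inf)))

# Build the DGEList, drop low-count genes, compute normalization factors
y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# voom converts counts to log-CPM with precision weights for lmFit
design <- model.matrix(~ group)
v <- voom(y, design)
fit <- lmFit(v, design)
fit <- eBayes(fit)

# Coefficient 2 is the INF versus CTRL contrast in this design
top <- topTable(fit, coef = 2, number = 10)
```

For the edgeR quasi-likelihood pipeline, the steps after `calcNormFactors()` would instead be `estimateDisp()`, `glmQLFit()`, and `glmQLFTest()` with the same design matrix.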

I don't understand your comment about "create the group vector". Surely there can't be any problem with that. Where your code would run into a problem is with estimateTagwiseDisp(), which is not designed for such large numbers of samples.
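On the group vector itself: in R, a "+" prompt is the continuation prompt, which appears when the expression typed so far is incomplete (for example, an unclosed quote or parenthesis), not when a vector is too long. A minimal sketch for building the labels without typing them one by one is shown here. It assumes the columns of the count table are ordered with all CTRL samples first and all INF samples after them, which the question does not state; the group sizes are the ones given in the question.

```r
# Group sizes as given in the question
n_ctrl <- 518
n_inf <- 582

# Assumption: columns are ordered CTRL first, then INF
groups <- factor(rep(c("CTRL", "INF"), times = c(n_ctrl, n_inf)))

# Check the labels before passing them to DGEList()
table(groups)
```

If the columns are not ordered this way, the labels would have to come from a sample sheet or from the column names instead.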

ADD COMMENT • link 3.2 years ago by Gordon Smyth ★ 7.0k

Thank you so much for your answer, Gordon!

I will adopt limma for my analysis.

ADD REPLY • link 3.2 years ago by Peter ▴ 20