Question

Subsetting a DGEList object for a list of genes of interest

20 months ago

Bertalan_Takacs ▴ 100

I have a DGEList object and a list of genes of interest. How do I subset the object so that only the counts for those genes remain? I would like to keep all samples and all groups. I've tried some solutions, such as this https://stackoverflow.com/questions/60230666/how-to-filter-samples-in-a-dgelist-in-edger and this https://rdrr.io/bioc/edgeR/man/subsetting.html but all I get is an "incorrect number of dimensions" error.

This is what my object looks like:

An object of class "DGEList"
$counts
                CC.A1 CC.A2 CC.A3 CC.I1 CC.I2 CC.I3 CC.M1 CC.M2 CC.M3 CC.PM1 CC.PM2 CC.PM3 CC.T1 CC.T2 CC.T3
ENSG00000225630   136   221   171    15    73   147   109   140   109     49     20     74   100   143    45
ENSG00000237973   302   471   263     6   170   508   320   356   569    194    100    250   183   243    68
ENSG00000248527  1460  1934  1425    51   953  2064  1769  2124  2524    685    406   1467   742  1403   442
ENSG00000188976   118   121    69    21    43    68    31    28    46     26     25     48    25    22    20
ENSG00000160075    17    54    47    29    58   122    52    33    83     67     27     59    30    47    25
3012 more rows ...

$samples
           group lib.size norm.factors
CC.A1   Anaphase   824461    1.0239396
CC.A2   Anaphase   685407    0.9244884
CC.A3   Anaphase   604611    0.9772303
CC.I1 Interphase   234169    1.2103157
CC.I2 Interphase   534538    0.9980076
10 more rows ...

$genes
$Symbol
ENSG00000225630 ENSG00000237973 ENSG00000248527 ENSG00000188976 ENSG00000160075
     "MTND2P28"      "MTCO1P12"      "MTATP6P1"         "NOC2L"         "SSU72"
3012 more elements ...

The list of genes:

 [1] "ENSG00000087586" "ENSG00000280109" "ENSG00000276058" "ENSG00000100162" "ENSG00000231007" "ENSG00000230860" "ENSG00000132780"
 [8] "ENSG00000142252" "ENSG00000117724" "ENSG00000100218" "ENSG00000118046" "ENSG00000115524" "ENSG00000013810" "ENSG00000175063"
[15] "ENSG00000132646" "ENSG00000145386" "ENSG00000037474" "ENSG00000112984" "ENSG00000183255" "ENSG00000237649" "ENSG00000157456"
[22] "ENSG00000011426" "ENSG00000198157" "ENSG00000177156" "ENSG00000138160" "ENSG00000120647" "ENSG00000175785" "ENSG00000237195"
[29] "ENSG00000140350" "ENSG00000166803" "ENSG00000167978" "ENSG00000166851" "ENSG00000088325" "ENSG00000100297"

Thanks!

dgelist transcriptomics subset edgeR • 1.2k views

Updated 20 months ago by GenoMax 144k • written 20 months ago by Bertalan_Takacs ▴ 100

score 3 · Accepted Answer · 2022-11-11

20 months ago

jared.andrews07 ★ 17k

# Example data.
library(edgeR)
ngenes <- 1000
nsamples <- 4
Counts <- matrix(rnbinom(ngenes*nsamples, mu=5, size=2), ngenes, nsamples)
rownames(Counts) <- paste0("Gene", 1:ngenes)
y <- DGEList(counts=Counts, group=rep(1:2, each=2))
dim(y)
colnames(y)
y$samples
y$genes <- data.frame(Symbol=paste0("Gene", 1:ngenes))

# Subsetting: index rows by gene ID; the empty column index keeps all samples.
gs <- c("Gene1", "Gene2", "Gene3")
y[gs,]
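
With a real gene list, some IDs may not be present in the object (for example, genes removed by an earlier filtering step). The following is a minimal sketch of one way to handle that, assuming the gene IDs are the row names of the DGEList. The toy data and the names `genes_of_interest`, `present` and `missing` are illustrative and not part of the original answer.

```r
library(edgeR)

# Toy data, mirroring the example data in the answer (names are illustrative).
set.seed(1)
counts <- matrix(rnbinom(40, mu = 5, size = 2), nrow = 10, ncol = 4)
rownames(counts) <- paste0("Gene", 1:10)
y <- DGEList(counts = counts, group = rep(1:2, each = 2))

# Hypothetical gene list; "Gene99" is deliberately absent from the object.
genes_of_interest <- c("Gene2", "Gene5", "Gene99")

# Split the list into IDs that exist as row names and IDs that do not,
# so absent IDs are handled explicitly rather than left to the subsetting method.
present <- intersect(genes_of_interest, rownames(y))
missing <- setdiff(genes_of_interest, rownames(y))

# keep.lib.sizes = FALSE recomputes lib.size from the remaining rows;
# the default (TRUE) keeps the original library sizes.
y_sub <- y[present, , keep.lib.sizes = FALSE]

dim(y_sub)
```

Whether to recompute the library sizes depends on the analysis: for a small set of genes of interest, keeping the original library sizes (the default) is often the more sensible choice, since the subset no longer reflects sequencing depth.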

Indeed, the beauty of these formats is that the developers made sure that all standard R rules apply, so subsetting works as for any R data.frame/matrix-like object.

[idx,] to select rows

[,idx] to select columns

where idx can be the names of the rows or columns, a numeric index, or a logical vector of the same length as the respective dimension.
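
The three index types can be sketched on a plain matrix, which follows the same rules the reply describes for a DGEList. The matrix `m` and its row and column names are made up for illustration.

```r
# A small count matrix standing in for any matrix-like object (names are made up).
m <- matrix(1:12, nrow = 4, ncol = 3,
            dimnames = list(paste0("Gene", 1:4), c("S1", "S2", "S3")))

by_name    <- m[c("Gene1", "Gene3"), ]          # character index on rows
by_number  <- m[c(1, 3), ]                      # numeric index on rows
by_logical <- m[c(TRUE, FALSE, TRUE, FALSE), ]  # logical index, one value per row

# Check that all three forms select the same two rows.
identical(by_name, by_number) && identical(by_number, by_logical)

# Columns work the same way; drop = FALSE keeps the result a matrix
# even when a single column is selected.
one_col <- m[, "S2", drop = FALSE]
dim(one_col)
```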

20 months ago by ATpoint 84k