Filtering low-count genes in RNA-seq using a function from edgeR
9 months ago
Chris ▴ 260

Hi all, I am trying to filter out low-count genes from a raw count matrix.

I ran:

d <- DGEList(counts=counts,group=factor(conditions))
keep <- filterByExpr(d)
bcv <- 0.2
et <- exactTest(keep, dispersion=bcv^2)

Error in exactTest(d, dispersion = bcv2) : Currently only supports DGEList objects as the object argument.

d <- estimateTagwiseDisp(d)

Error in .compressDispersions(y, dispersion) : dispersions must be finite non-negative values

After using filterByExpr(), I get these errors. If I don't use filterByExpr(), I don't get them.

Do you have any suggestions? Unfortunately, I don't have replicates, so I am trying edgeR because it supports designs without replicates. I know the results will not be rigorous, but they can still serve as a rough reference; is that correct? Thank you so much.

If I don't run this

counts <- counts[which(rowSums(counts)>50),]

but only

counts <- read.delim('counts.csv', header = T,row.names = 1, sep = ',')

I got this:

d <- DGEList(counts=counts,group=factor(conditions))
Error: NA counts not allowed

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.40.2 limma_3.54.2

loaded via a namespace (and not attached):
[1] compiler_4.2.2 tools_4.2.2    Rcpp_1.0.11    grid_4.2.2     locfit_1.5-9.8 lattice_0.21-8
edgeR RNA-seq • 1.6k views

Please mention the package you're using. From Googling around, I can guess you're referring to edgeR::filterByExpr() but you're the only person that knows for sure so edit your post and mention the package.


Is d a matrix? You need to create a DGEList object in order to run those functions.


Yes, I ran DGEList(), but if I use filterByExpr(), I get the error.


Edit your post and add the package information. Ideally, you should also add sessionInfo() and the package as a tag.


Chris, this is starting to feel like pulling teeth. Is that ALL of the output you see from sessionInfo()?


I am sorry. The sessionInfo() output was very long last time. I removed all unrelated packages and added. Is there anything I can help you with here?


I removed all unrelated packages and added.

That defeats the purpose of adding sessionInfo(). Please just paste the entire output so people know what exactly you're working with.


I ran multiple R scripts and installed many packages, which is why it was so long, but those are all the packages I use for this task. Adding packages like DiffBind seems irrelevant, right?


Not at all. While providing a minimal environment that reproduces the error is ideal, quite a few errors are caused by the specific set of packages and environment settings on your machine. sessionInfo() at the time of the error is extremely helpful.

In any case, if your error is resolved, you don't need to edit your post further.


Sorry for the wrong assumption. Yes, the error is resolved. Let me know if I can help with anything.

9 months ago
ATpoint 82k

keep is a logical vector that tells which genes fulfill the filtering criteria. Hence:

d <- d[keep,]


By the way, the edgeR manual covers this... ;-)


Thanks ATpoint! If I don't run:

counts <- counts[which(rowSums(counts)>50),]

I will get the error:

d <- DGEList(counts=counts,group=factor(conditions))
Error: NA counts not allowed

So what should I do in this case? If I do run it, I end up filtering twice. I have a gene with around 100 and 250 reads in the first two conditions and 0 reads in the other two; will this gene be filtered out?
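
One possible explanation for the "NA counts not allowed" error, assuming counts.csv contains empty or non-numeric cells that read.delim() imports as NA: rowSums() returns NA for any row with a missing value, and which() drops NA, so the rowSums() filter removes those rows as a side effect. The sketch below uses a hypothetical four-gene table (all names and values are made up) to show how to locate such rows and drop them explicitly, without also applying a count threshold. Whether dropping them is appropriate depends on why the values are missing in the first place.

```r
# Hypothetical count table with one missing value, standing in for counts.csv.
counts <- data.frame(
  sample1 = c(100, 5, NA, 300),
  sample2 = c(250, 3, 40, 280),
  row.names = paste0("gene", 1:4)
)

# Locate genes that contain at least one NA.
na_genes <- rownames(counts)[rowSums(is.na(counts)) > 0]

# rowSums() is NA for those rows and which() ignores NA,
# so this filter removes them in addition to the low-count rows.
by_sum <- counts[which(rowSums(counts) > 50), ]

# Explicit alternative: drop only the incomplete rows, no count threshold.
complete <- counts[complete.cases(counts), ]

print(na_genes)
print(rownames(by_sum))
print(rownames(complete))
```

With the incomplete rows handled separately, the low-count filtering can be left to filterByExpr() alone, so genes are not filtered twice.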

9 months ago
petebio ▴ 100

You are using the d and keep variables incorrectly. Try:

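keep <- filterByExpr(d)                 # logical vector, one entry per gene
d <- d[keep, ]                          # subset rows (genes), not columns (samples)
bcv <- 0.2                              # assumed biological coefficient of variation
et <- exactTest(d, dispersion = bcv^2)  # dispersion is the squared BCV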

keep filters genes, not samples. The comma is in the wrong place.


Thank you for your help! I get this when I run your suggestion:

d <- d[,keep]

Error in object[[a]][i, j, drop = FALSE] : 
  (subscript) logical subscript too long
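
This error is what one would expect when a per-gene logical vector is used as a column index. A minimal base-R sketch, using a hypothetical 6 x 4 toy matrix in place of the DGEList (rows are genes and columns are samples in both) and a placeholder threshold rather than the rule filterByExpr() actually applies:

```r
# Toy stand-in for a count matrix: 6 genes (rows) x 4 samples (columns).
# All values are made up for illustration.
counts <- matrix(
  c(100, 250,   0,   0,
      0,   1,   0,   2,
     30,  40,  35,  50,
      3,   0,   1,   0,
    500, 450, 600, 520,
      0,   0,   0,   0),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4))
)

# A per-gene logical vector, the same shape as the output of filterByExpr().
# The threshold here is a simple placeholder for illustration only.
keep <- rowSums(counts) > 50

# Correct: index rows (genes) with the per-gene vector.
filtered <- counts[keep, ]

# Incorrect: six logical values cannot index four columns, so R raises an error.
col_error <- tryCatch(counts[, keep], error = function(e) conditionMessage(e))

print(dim(filtered))
print(col_error)
```

The same reasoning applies to d[keep, ]: the comma goes after keep because keep has one entry per gene.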