R package for filtering expression data
2.7 years ago

I want to filter out genes with low expression values.

Previously, I used the expFilter() function from the EMA package. However, EMA seems to be deprecated.

What alternative R library can I use?

gene-expression r EMA • 1.5k views

I like filterByExpr in edgeR (for a matrix of counts).


I tried filterByExpr, but my code removed 48464 of 54675 genes, which seems like overkill. Is there a bug in my code?

library(edgeR)

samp.matrix <- data.matrix(dat[, (3:ncol(dat))])
samp.matrix.filtered <- subset(samp.matrix, log2(rowMeans(samp.matrix)) > 0)

# Use filterByExpr() to filter genes with low expression values 
keep <- filterByExpr(log2(samp.matrix.filtered))
dat <- dat[keep,]
num.lowexp <- nrow(samp.matrix.filtered) - nrow(dat)
cat(sprintf("%s gene(s) identified and removed for low expression\n", num.lowexp))

48464 gene(s) identified and removed for low expression


It would help to see your input data. Aside from that:

  1. If your data matrix contains counts, pass the raw counts: filterByExpr expects a matrix of counts, not log2(counts).
  2. When log-transforming counts in general, use log2(x + 1); log2(0) is -Inf, and infinite values can break downstream analyses.
  3. Your code performs two filtering steps. filterByExpr alone may suffice, because it already applies a min.total.count threshold, which is similar to filtering on rowSums.
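
To make the three points above concrete, here is a minimal sketch on simulated data. The matrix `counts`, the `group` factor, and the gene/sample names are made up for illustration; they are not the poster's data. It assumes edgeR is installed and that the input really is raw counts.

```r
library(edgeR)

set.seed(1)
# Hypothetical toy data: 1000 genes x 6 samples of raw counts.
# The first 500 genes are simulated as lowly expressed, the rest as well expressed.
mu <- rep(c(0.5, 50), each = 500)
counts <- matrix(rnbinom(1000 * 6, size = 2, mu = mu), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))  # assumed two-group design

# Filter on the raw counts (not on log2 values); one filtering step only
keep <- filterByExpr(counts, group = group)
counts.filt <- counts[keep, , drop = FALSE]

# Log-transform afterwards, with an offset of 1 to avoid log2(0) = -Inf
logexpr <- log2(counts.filt + 1)

cat(sprintf("%d of %d gene(s) removed for low expression\n",
            sum(!keep), length(keep)))
```

Note that `keep` is computed from `counts` and used to subset the same object, so the logical vector and the rows always line up. If the defaults turn out too strict or too lenient for a given dataset, the `min.count` and `min.total.count` arguments of filterByExpr can be adjusted.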
2.7 years ago

Just adding an answer for future reference.

I would learn how to do these filtering steps using base R functions. For example, we can generate vectors of Boolean values that can be used for filtering in these ways:

rows (genes) with mean greater than 10

filt <- apply(expr, 1, mean) > 10
expr.filt <- expr[filt,]

rows (genes) with sum greater than 100

filt <- apply(expr, 1, sum) > 100
expr.filt <- expr[filt,]

columns (samples) with standard deviation (SD) > 3

filt <- apply(expr, 2, sd) > 3
expr.filt <- expr[,filt]

Other functions that can be used in the same way include min(), max(), var(), etc.

If there are NA values in your data, try something like:

filt <- apply(expr, 1, function(x) mean(x, na.rm = TRUE)) > 10

---------

Other convenience functions include colSums(), rowSums(), colMeans(), rowMeans()
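
As a small illustration, the apply()-based filters can be written with these convenience functions instead. The matrix `expr` below is simulated toy data, not a real dataset, and the thresholds are just the ones used in the examples above.

```r
set.seed(42)
# Hypothetical toy expression matrix: 20 genes x 4 samples
expr <- matrix(rpois(20 * 4, lambda = 15), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("s", 1:4)))
expr[1, 2] <- NA  # introduce a missing value on purpose

# Row-wise mean filter, written both ways
filt.apply <- apply(expr, 1, function(x) mean(x, na.rm = TRUE)) > 10
filt.fast  <- rowMeans(expr, na.rm = TRUE) > 10

# Row-wise sum filter with the convenience function
filt.sum <- rowSums(expr, na.rm = TRUE) > 50

expr.filt <- expr[filt.fast, , drop = FALSE]
```

Both versions should give the same logical vector; rowMeans() and rowSums() are simply faster on large matrices. The `drop = FALSE` keeps the result a matrix even if only one gene passes the filter.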

Please also be aware of the matrixStats package, whose optimized row and column functions become increasingly useful as datasets get larger, as with scRNA-seq data: https://cran.rstudio.com/web/packages/matrixStats/index.html
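
For instance, a per-gene standard deviation filter could be sketched with matrixStats as follows. This assumes the package is installed; the simulated matrix `expr` and the cutoff of 1.5 are arbitrary choices for illustration.

```r
library(matrixStats)

set.seed(7)
# Hypothetical toy matrix: 5000 genes x 8 samples of log-scale expression
expr <- matrix(rnorm(5000 * 8, mean = 8, sd = 2), nrow = 5000)

# rowSds() computes the per-row SD without the overhead of apply(expr, 1, sd)
gene.sd <- rowSds(expr)

# Keep genes whose SD across samples exceeds an arbitrary cutoff
filt <- gene.sd > 1.5
expr.filt <- expr[filt, , drop = FALSE]
```

The same package provides rowVars(), colSds(), rowMedians() and similar functions, which can be swapped in for other filtering criteria.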

Kevin
