sample-specific filtering of counts matrix
19 months ago

I know I can filter my counts matrix with this command

filtered.counts <- counts[rowSums(counts==0)<3, ]

when I want to keep only genes that have zero counts in fewer than three samples.

But is there a way to do the same and remove rows from the matrix only when these three zeros all fall in one condition? I have two conditions with four replicates each. I would like to filter for genes with counts in at least two of them.

Would this kind of filtering make sense? Or would I introduce a bias into the expression matrix?

thanks Assa

counts RNA-Seq deseq2 condition • 540 views
You could simply use something like filterByExpr from edgeR.
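A minimal sketch of that suggestion, assuming the edgeR package is installed. The toy matrix, the sample names and the `group` factor are made up for illustration; `filterByExpr` is called with its default thresholds, which may or may not suit a given dataset.

```r
# Sketch only: assumes edgeR (Bioconductor) is installed.
library(edgeR)

# Hypothetical toy data: 3 genes x 8 samples,
# two conditions (A, B) with four replicates each.
counts <- rbind(
  gene1 = rep(0, 8),                                  # never detected
  gene2 = c(50, 60, 55, 48, 52, 61, 47, 58),          # detected everywhere
  gene3 = c(40, 35, 42, 38, 0, 0, 0, 0)               # detected in A only
)
colnames(counts) <- paste0("s", 1:8)
group <- factor(rep(c("A", "B"), each = 4))

# filterByExpr returns one logical per gene (TRUE = keep).
# Passing 'group' lets it size the minimum number of samples
# from the smallest group instead of using a fixed cutoff.
keep <- filterByExpr(counts, group = group)

filtered.counts <- counts[keep, , drop = FALSE]
```

Because the function works on CPM values rather than on raw zeros, the result depends on library sizes as well as on the counts themselves.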

I would keep a row (gene) if one condition has all zeros while the others have non-zero values. Depending on the sequencing depth across samples and conditions, such a gene might simply be under- or over-represented in one condition compared with the others. And yes, sample-specific filtering might introduce biases in the downstream steps.
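For the per-condition rule asked about in the question, one possible base R sketch is below. It reads the rule as "keep a gene if at least two replicates have a non-zero count in at least one condition", which is an assumption about what was meant. The toy matrix, the `condition` factor and the `min.reps` threshold are hypothetical.

```r
# Hypothetical toy matrix: 4 genes x 8 samples
# (condition A = s1..s4, condition B = s5..s8).
counts <- rbind(
  geneA_only = c(12,  8, 15,  9,  0,  0,  0,  0),
  sparse     = c( 5,  0,  0,  0,  0,  0,  3,  0),
  both       = c(20, 18, 25, 22, 30, 28, 19, 24),
  all_zero   = rep(0, 8)
)
colnames(counts) <- paste0("s", 1:8)
condition <- factor(rep(c("A", "B"), each = 4))

min.reps <- 2  # assumed threshold: non-zero replicates needed in a condition

# For each condition, count the replicates with a non-zero count per gene.
# Result: a genes x conditions matrix.
nonzero.per.cond <- sapply(levels(condition), function(cond) {
  rowSums(counts[, condition == cond, drop = FALSE] > 0)
})

# Keep a gene if at least one condition reaches the threshold.
keep <- rowSums(nonzero.per.cond >= min.reps) >= 1
filtered.counts <- counts[keep, , drop = FALSE]

rownames(filtered.counts)
# "geneA_only" "both"
```

A gene that is expressed in only one condition (`geneA_only`) survives this filter, in line with the advice above. Note that the filter uses the condition labels, which is the kind of sample-specific choice that can bias downstream steps.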