I am analyzing RNA-sequencing data and would appreciate your input, especially regarding filtering of the expression matrix. Currently, I keep only the rows (genes) that have non-zero expression in at least 70% of my samples, which reduces my gene expression matrix from ~60,000 rows to ~20,000 rows.
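To make the filtering rule concrete, here is a minimal sketch, assuming the intended rule is "keep a gene if its count is non-zero in at least 70% of samples". It is plain Python for illustration only, not my actual pipeline; the gene names, the toy counts, and the `keep_expressed` helper are all hypothetical.

```python
# Toy genes-by-samples matrix of raw counts (hypothetical data, 10 samples).
counts = {
    "geneA": [12, 0, 7, 3, 9, 4, 0, 5, 8, 2],   # non-zero in 8 of 10 samples
    "geneB": [0, 0, 0, 1, 0, 0, 0, 0, 2, 0],    # non-zero in 2 of 10 samples
    "geneC": [5, 6, 0, 0, 0, 4, 0, 2, 1, 7],    # non-zero in 6 of 10 samples
    "geneD": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],    # non-zero in 10 of 10 samples
}


def keep_expressed(counts, min_fraction=0.7):
    """Keep genes whose count is non-zero in at least min_fraction of samples."""
    kept = {}
    for gene, row in counts.items():
        n_nonzero = sum(1 for c in row if c > 0)
        if n_nonzero / len(row) >= min_fraction:
            kept[gene] = row
    return kept


filtered = keep_expressed(counts)
print(sorted(filtered))  # genes that pass the 70% non-zero threshold
```

On this toy input only geneA and geneD pass the threshold; geneB and geneC are dropped because they are non-zero in fewer than 70% of samples.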
I would like to know the following: 1) Is it statistically acceptable to reduce the expression matrix from ~60,000 rows to ~20,000? 2) Are there any other filtering techniques that could be applied to this matrix (for example, variance-based filtering)?
Note: For downstream analysis, I am using edgeR for normalization and limma for differential expression analysis.