I have a question about how RNA-seq counts should be cleaned before tumor deconvolution.
When performing a classic RNA-seq differential expression analysis (DEA), it is common to remove genes that are not expressed, or barely expressed, across the samples; I would estimate that this removes ~5-30% of the genes. This filtering step is the only one I am aware of that is commonly performed for DEA. My interest is tumor deconvolution, where the starting point is also an expression matrix. I am wondering whether this matrix should be preprocessed more extensively than just removing lowly expressed genes, in order to remove non-informative genes and potential noise.
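To make the low-expression filter concrete, here is a minimal sketch in Python with NumPy. It assumes a raw count matrix with genes in rows and samples in columns, and keeps a gene if its counts-per-million (CPM) reach a threshold in a minimum number of samples. The function name `filter_low_expression`, the thresholds, and the toy numbers are illustrative assumptions, not values from any particular pipeline.

```python
import numpy as np

def filter_low_expression(counts, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM reaches min_cpm in at least min_samples samples.

    counts: 2-D array of raw counts, genes in rows, samples in columns.
    Assumes every sample has a non-zero library size.
    Returns (filtered_counts, keep_mask).
    """
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)          # total counts per sample
    cpm = counts / library_sizes * 1e6          # counts per million
    keep = (cpm >= min_cpm).sum(axis=1) >= min_samples
    return counts[keep], keep

# Toy matrix: 4 genes x 3 samples (made-up numbers, for illustration only).
toy_counts = np.array([
    [500, 450, 600],            # clearly expressed
    [0, 1, 0],                  # essentially not expressed
    [300, 0, 0],                # expressed in a single sample only
    [199500, 199549, 199400],   # highly expressed
])

filtered, keep = filter_low_expression(toy_counts, min_cpm=10, min_samples=2)
print(keep)            # which genes survive the filter
print(filtered.shape)  # dimensions of the filtered matrix
```

With these toy settings, the second and third genes are dropped: one never reaches the CPM threshold, and the other reaches it in only one sample.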
I recently had an introduction to the analysis of methylation data, specifically what to do once you have the percentage of methylation per CpG. Some people remove CpGs that have little variance across samples (mainly unmethylated CpGs, but not only), remove CpGs on the X and Y chromosomes, and also check whether the methylation of certain CpGs correlates with clinical variables (such as age or gender, when available) in order to filter those CpGs out or adjust for the variable.
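The three methylation-style criteria above can be sketched on a generic features-by-samples matrix (CpGs or genes in rows, samples in columns). This is only an illustration under stated assumptions: the helper names, the variance cutoff of 0.5, the correlation cutoff of 0.9, and the toy values are all hypothetical, and Pearson correlation is just one possible way to test association with a clinical variable.

```python
import numpy as np

def variance_mask(values, min_variance=0.5):
    """True for features whose sample variance exceeds min_variance."""
    return np.var(values, axis=1, ddof=1) > min_variance

def autosomal_mask(chromosomes, excluded=("X", "Y")):
    """True for features that are not located on an excluded chromosome."""
    return np.array([chrom not in excluded for chrom in chromosomes])

def covariate_correlation(values, covariate):
    """Pearson correlation of each feature (row) with a numeric covariate."""
    values = np.asarray(values, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    x = values - values.mean(axis=1, keepdims=True)
    y = covariate - covariate.mean()
    denom = np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (x @ y) / denom
    return np.nan_to_num(r)  # constant features get a correlation of 0

# Toy data: 4 features x 4 samples (made-up values, e.g. log-expression).
values = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 5.0, 5.0, 5.0],   # constant across samples
    [2.0, 8.0, 2.0, 8.0],
    [8.0, 6.0, 4.0, 2.0],
])
chromosomes = ["1", "7", "12", "X"]   # hypothetical annotation
age = np.array([30, 40, 50, 60])      # hypothetical clinical variable

variable = variance_mask(values)
autosomal = autosomal_mask(chromosomes)
r_age = covariate_correlation(values, age)
age_associated = np.abs(r_age) > 0.9

keep = variable & autosomal
print(keep)            # features passing the variance and chromosome filters
print(age_associated)  # features flagged as strongly associated with age
```

Whether age-associated features should be removed or adjusted for is a separate decision; the sketch only flags them.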
I wonder why so many filtering criteria are applied to methylation data that are not (to my knowledge) used on RNA-seq data. Do you know why, and do you think they should be used for tumor deconvolution based on RNA-seq data?
Thank you in advance for your comments. Jane