Guys and girls,
When it comes to CPM filtering in edgeR, how does one choose a suitable cut-off? I have read that the choice of filter is arbitrary. The edgeR vignette suggests keeping genes where
rowSums(cpm(y) > 1) >= n, with n being the number of samples in the smallest replicate group.
I have an experiment with two time points, and I am following the GLM approach (the drug/placebo example in the vignette) with a design matrix and a GLM fit.
There are 3 conditions (control, light and heavy), each measured at both time points, with 3 biological replicates per group, so 18 samples in total.
I was planning on using
rowSums(cpm(y) > 10) >= 6, as I expect genes to be expressed in at least 6 of the 18 samples. I know that following the edgeR recommendation would give
rowSums(cpm(y) > 10) >= 3 in my case (3 being the smallest replicate group). Is there anything wrong with using 6, given that each treatment spans 2 time points? My BCV comes down when I filter, as there seem to be a lot of lowly expressed transcripts in my samples. I am mainly interested in tags that show moderate to large expression changes.
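To make the question concrete, here is a minimal sketch of the filter I am describing. It is written in base R so it runs without edgeR installed; the count matrix is simulated, and the names (counts, cpm_mat, min_cpm, min_samples) are placeholders of my own. With a real DGEList I assume edgeR's cpm(y) would replace the manual CPM calculation.

```r
# Simulated counts: 1000 genes x 18 samples
# (3 conditions x 2 time points x 3 replicates). Illustrative data only.
set.seed(1)
n_genes <- 1000
n_samples <- 18
gene_means <- rexp(n_genes, rate = 1 / 50)  # hypothetical per-gene mean counts
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = gene_means, size = 2),
  nrow = n_genes, ncol = n_samples
)

# Counts per million, computed by hand from the library sizes.
lib_sizes <- colSums(counts)
cpm_mat <- t(t(counts) / lib_sizes) * 1e6

# Keep genes whose CPM exceeds the cut-off in at least min_samples samples.
min_cpm <- 10      # CPM cut-off under discussion
min_samples <- 6   # 6 = my choice; 3 = smallest replicate group
keep <- rowSums(cpm_mat > min_cpm) >= min_samples

filtered <- counts[keep, , drop = FALSE]

# Assumed edgeR equivalent on a DGEList y:
#   keep <- rowSums(cpm(y) > 10) >= 6
#   y <- y[keep, , keep.lib.sizes = FALSE]
```

Changing min_samples from 6 to 3 and comparing sum(keep) shows how many extra genes the looser rule lets through.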
Thanks for the feedback. I have also heard of the genefilter package for R. Does anyone know of any tutorials on how to apply it with edgeR rather than DESeq2?