Guys and girls,
When it comes to edgeR CPM filtering, how does one define a suitable cut-off? I have read that this filter choice is arbitrary. The edgeR vignette uses `rowSums(cpm(y) > 1) >= n`, where n is the smallest number of samples in a replicate group.
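A minimal base-R sketch of that rule, assuming a raw count matrix with genes in rows and samples in columns. The helper names `cpm_simple()` and `keep_by_cpm()` are hypothetical; for a plain matrix, `cpm_simple()` should agree with edgeR's `cpm()` when no normalisation factors are applied.

```r
# Counts per million, using each column's total as its library size
cpm_simple <- function(counts) {
  lib_size <- colSums(counts)
  t(t(counts) / lib_size) * 1e6
}

# Keep a gene if its CPM exceeds `cutoff` in at least `n` samples
keep_by_cpm <- function(counts, cutoff = 1, n = 3) {
  rowSums(cpm_simple(counts) > cutoff) >= n
}

# Toy example: 3 genes x 4 samples, each library exactly 1e6 reads, so CPM == counts
counts <- rbind(g1 = c(0, 0, 2, 2),
                g2 = c(5, 5, 5, 5),
                g3 = c(999995, 999995, 999993, 999993))
keep_by_cpm(counts, cutoff = 1, n = 3)  # g1 FALSE, g2 TRUE, g3 TRUE
```

With a DGEList `y`, the usual edgeR idiom is `keep <- rowSums(cpm(y) > 1) >= 3` followed by `y <- y[keep, , keep.lib.sizes = FALSE]`, which recomputes the library sizes after filtering.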
I have an experiment with two time points, and I'm fitting a GLM (following the drug/placebo example in the vignette) using a design file and a GLM fit.
There are 3 conditions (control, light and heavy), each sampled at both time points, with 3 biological replicates per group.
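One way to set up this 3 conditions × 2 time points layout is to combine the two factors into a single six-level group, similar to the combined-factor approach in the edgeR user's guide. This sketch assumes the 18 samples are ordered replicate within condition within time; the time-point labels `T1`/`T2` are placeholders.

```r
# Hypothetical sample sheet: 3 conditions x 2 time points x 3 biological replicates = 18 libraries
samples <- expand.grid(
  rep = 1:3,
  condition = c("control", "light", "heavy"),
  time = c("T1", "T2")  # placeholder time-point labels
)

# One group per condition/time combination (6 groups of 3 replicates)
group <- factor(paste(samples$condition, samples$time, sep = "."))

# No-intercept design: one coefficient per group, convenient for building contrasts
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Smallest replicate-group size: the n in the vignette's CPM filter
min(table(group))
```

The resulting `design` can be passed to `glmFit(y, design)`, with individual comparisons built via limma's `makeContrasts(..., levels = design)`.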
I was planning to use `rowSums(cpm(y) > 10) >= 6`, since I expect genes to be expressed in at least 6 of the 18 samples. I know edgeR would recommend `rowSums(cpm(y) > 10) >= 3` in my case (3 being the smallest replicate group). Is there anything wrong with using 6, given that each treatment spans 2 time points? My BCV comes down when I filter, as there seem to be a lot of lowly expressed transcripts in my samples. I am mostly interested in the tags that show moderate to large changes in the DE comparisons.
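Because CPM is relative to library size, the same CPM cutoff corresponds to different raw read counts in different libraries, which is one way to make the choice less arbitrary. A commonly quoted rule of thumb is to pick a cutoff equivalent to roughly 10 reads in the smallest library. The sketch below converts in both directions; the function names and library sizes are hypothetical.

```r
# Raw read count that a given CPM cutoff corresponds to in each library
cpm_to_counts <- function(cutoff, lib_sizes) {
  cutoff * lib_sizes / 1e6
}

# CPM cutoff that corresponds to `min_count` reads in the smallest library
counts_to_cpm_cutoff <- function(min_count, lib_sizes) {
  min_count / min(lib_sizes) * 1e6
}

lib_sizes <- c(8e6, 20e6, 35e6)      # hypothetical library sizes
cpm_to_counts(10, lib_sizes)         # 80 200 350
cpm_to_counts(1, lib_sizes)          # 8 20 35
counts_to_cpm_cutoff(10, lib_sizes)  # 1.25
```

In this hypothetical example, `cpm(y) > 10` demands around 200 reads in a 20-million-read library, an order of magnitude stricter than `cpm(y) > 1`.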
Thanks for the feedback. I have heard of the genefilter package for R; does anyone know of tutorials on how to apply it with edgeR rather than DESeq2?
There is nothing wrong with using n = 6, as long as you are OK with removing transcripts expressed below that threshold. But is it `cpm(y) > 10` or `cpm(y) > 1`?

Is `rowSums(cpm(y) > 2) >= 9` still OK? It brings the BCV down, but retains more transcripts.
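To weigh candidate thresholds such as `cpm(y) > 1`, `> 2` or `> 10` against n = 3, 6 or 9, it can help to tabulate how many genes each combination keeps before comparing BCVs. A minimal base-R sketch; `retained_grid()` is a hypothetical helper, `counts` is assumed to be a raw genes × samples matrix, and the simulated data is only for illustration.

```r
# Count genes retained for each combination of CPM cutoff (rows)
# and minimum number of samples n (columns)
retained_grid <- function(counts, cutoffs = c(1, 2, 10), ns = c(3, 6, 9)) {
  # CPM using column sums as library sizes (no normalisation factors)
  cpm_mat <- t(t(counts) / colSums(counts)) * 1e6
  out <- matrix(0L, nrow = length(cutoffs), ncol = length(ns),
                dimnames = list(paste0("cpm>", cutoffs), paste0("n>=", ns)))
  for (i in seq_along(cutoffs)) {
    # number of samples in which each gene exceeds this cutoff
    n_above <- rowSums(cpm_mat > cutoffs[i])
    for (j in seq_along(ns)) {
      out[i, j] <- sum(n_above >= ns[j])
    }
  }
  out
}

# Illustration on simulated counts: 500 genes x 18 samples
set.seed(1)
sim_counts <- matrix(rnbinom(500 * 18, size = 2, mu = 50), ncol = 18)
retained_grid(sim_counts)
```

For each candidate, filtering the DGEList and then running `estimateDisp(y, design)` and `plotBCV(y)` shows how the BCV changes alongside the number of genes kept.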