When it comes to edgeR CPM filtering, how does one define a suitable cut-off? I have read that the choice of filter is arbitrary. The edgeR vignette suggests keeping genes with cpm(y)>1 in at least n samples, where n is the number of samples in the smallest replicate group.

I have an experiment with two time points, and I'm running the GLM approach (following the drug/placebo example in the vignette) with a design file and a GLM fit.

There are 3 conditions (control, light and heavy), each measured at both of the 2 time points, with 3 biological replicates per group, which gives 18 samples in total.

I was planning on requiring cpm(y)>10 in at least 6 samples, as I expect genes to appear in at least 6 of the 18 samples. I know edgeR recommends requiring cpm(y)>10 in at least 3 samples in my case (3 being the smallest replicate group). Is there anything wrong with using 6, given that each treatment spans 2 time points? My BCV comes down when I filter, as there seem to be a lot of lowly expressed transcripts in my samples. I am more interested in the tag comparisons that show moderate to large changes in DE.

Thanks for the feedback. I have heard of the genefilter package for R. Does anyone know of any tutorials on how to apply it with edgeR rather than DESeq2?

modified 2.9 years ago by geek_y9.1k • written 2.9 years ago by Biogeek340

There is nothing wrong with using n=6, as long as you are OK with removing transcripts that pass the CPM cut-off in fewer than 6 samples. But is it cpm(y)>10 or cpm(y)>1?
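For reference, the filter being discussed can be sketched roughly as follows. This is only an illustration in base R on simulated counts: the matrix `counts`, the simulation parameters, and the variable names are made up, and the CPM is computed by hand so the snippet runs without edgeR installed. The cut-off of 1 and n=6 are placeholders to swap for whichever values you settle on.

```r
# Simulated counts (hypothetical data): 1000 genes x 18 samples,
# standing in for 6 groups of 3 biological replicates.
set.seed(1)
n_genes <- 1000
n_samples <- 18
mu <- rgamma(n_genes, shape = 0.5, scale = 200)   # made-up per-gene mean counts
counts <- matrix(rnbinom(n_genes * n_samples, mu = rep(mu, n_samples), size = 5),
                 nrow = n_genes, ncol = n_samples)

# Counts per million: divide each column by its library size, then scale by 1e6.
lib_size <- colSums(counts)
cpm_mat <- t(t(counts) / lib_size) * 1e6

# Keep genes whose CPM exceeds the cut-off in at least min_samples samples.
cpm_cutoff <- 1    # placeholder: the thread discusses 1, 2 and 10
min_samples <- 6   # placeholder: the thread discusses 3, 6 and 9
keep <- rowSums(cpm_mat > cpm_cutoff) >= min_samples
filtered <- counts[keep, , drop = FALSE]

cat(sum(keep), "of", n_genes, "genes retained\n")

# With edgeR loaded and a DGEList `y`, the usual equivalent is:
#   keep <- rowSums(cpm(y) > 1) >= 6
#   y <- y[keep, , keep.lib.sizes = FALSE]
```

Raising either the CPM cut-off or n can only remove more genes, so the choice comes down to how many lowly expressed transcripts you are willing to lose.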

written 2.9 years ago by geek_y9.1k

Is cpm(y)>2 in at least 9 samples still OK? It brings the BCV down, but retains more transcripts.
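One way to weigh these options is to tabulate how many genes each (cut-off, n) pair retains before committing to one. The sketch below does that in base R on simulated counts. The data, the helper `count_retained`, and the `settings` table are all hypothetical, and the retained counts it prints say nothing about a real data set.

```r
# Simulated counts (hypothetical data): 1000 genes x 18 samples.
set.seed(1)
n_genes <- 1000
n_samples <- 18
mu <- rgamma(n_genes, shape = 0.5, scale = 200)   # made-up per-gene mean counts
counts <- matrix(rnbinom(n_genes * n_samples, mu = rep(mu, n_samples), size = 5),
                 nrow = n_genes, ncol = n_samples)

# Hypothetical helper: number of genes with CPM above cpm_cutoff
# in at least min_samples samples.
count_retained <- function(counts, cpm_cutoff, min_samples) {
  cpm_mat <- t(t(counts) / colSums(counts)) * 1e6
  sum(rowSums(cpm_mat > cpm_cutoff) >= min_samples)
}

# The (cut-off, n) combinations mentioned in this thread.
settings <- data.frame(cpm_cutoff  = c(1, 10, 10, 2),
                       min_samples = c(3,  3,  6, 9))
settings$retained <- mapply(count_retained,
                            cpm_cutoff  = settings$cpm_cutoff,
                            min_samples = settings$min_samples,
                            MoreArgs = list(counts = counts))
print(settings)
```

On real data you would pass your own count matrix in place of `counts`, and then re-estimate the dispersion after each candidate filter to see how the BCV responds.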

written 2.9 years ago by Biogeek340
