Question: edgeR filter thresholds
4.8 years ago, Biogeek wrote:

Guys and girls,

When it comes to edgeR CPM filtering, how does one define a suitable cut-off? I have read that the choice of filter is arbitrary. The edgeR vignette suggests keeping genes with cpm(y) > 1 in at least n samples, n being the number of samples in the smallest replicate group.
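For reference, the rule above can be written out in a few lines of R. This is only a sketch on a made-up count matrix, using base R so it runs without edgeR installed; with edgeR loaded, `cpm(y)` would take the place of the manual CPM step. The gene names, sample names and the `filler` row (added so every library sums to exactly one million reads) are assumptions for illustration.

```r
# Toy counts: 5 genes x 6 samples (two groups of 3). All values are made up.
genes <- rbind(
  geneA = c( 0,  0,  0,  2,  0,  0),
  geneB = c(50, 60, 55, 40, 45, 65),
  geneC = c( 5,  0,  8,  0,  6,  0),
  geneD = c( 0,  0,  0, 30, 25, 35),
  geneE = c( 0,  3,  0,  0,  2,  0)
)
colnames(genes) <- paste0("s", 1:6)

# Pad each library to 1e6 reads so that CPM is easy to read off the counts.
filler <- 1e6 - colSums(genes)
counts <- rbind(genes, filler = filler)

# Counts per million: divide each column by its library size, times 1e6.
lib_size <- colSums(counts)
cpm_mat <- t(t(counts) / lib_size) * 1e6

# Keep genes with CPM > 1 in at least n samples (n = smallest group size).
n <- 3
keep <- rowSums(cpm_mat > 1) >= n
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

Note that the rule counts samples across the whole experiment, not within a group, so a gene expressed in only one group of 3 (geneD here) still passes.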

I have an experiment with two time points, and I'm running the GLM approach (following the drug/placebo example in the vignette) with a design file and a GLM fit.

There are 3 conditions (control, light and heavy), each measured at both of the 2 time points, with 3 biological reps per group.
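One way to lay this design out in R, as a rough sketch only: the time labels `t1`/`t2`, the combined `condition.time` group factor and the one-column-per-group design matrix are assumptions, not the actual design file used here.

```r
# Hypothetical sample table: 3 conditions x 2 time points x 3 replicates.
samples <- expand.grid(
  rep = 1:3,
  condition = c("control", "light", "heavy"),
  time = c("t1", "t2"),
  stringsAsFactors = FALSE
)

# One group per condition/time combination (6 groups of 3 samples each).
group <- factor(paste(samples$condition, samples$time, sep = "."))

# Design matrix with one coefficient per group (no intercept).
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# The smallest replicate group size is the n used in the filtering rule.
print(table(group))
min_group_size <- min(table(group))
print(min_group_size)
```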

I was planning to require cpm(y) > 10 in at least 6 samples, as I expect genes to be expressed in at least 6 of the 18 samples. I know edgeR would recommend cpm(y) > 10 in at least 3 samples in my case (3 being the smallest rep group). Is there anything wrong with using 6, given that each treatment spans 2 time points? My BCV comes down when I filter, as there seem to be a lot of lowly expressed transcripts in my samples. I am more interested in the tags that show moderate to large changes in expression.

Thanks for the feedback. I have heard of the genefilter package for R. Does anyone know of any tutorials on how to apply it with edgeR rather than DESeq2?


There is nothing wrong with using n = 6, as long as you are OK with removing transcripts that pass the CPM cut-off in fewer than 6 samples. But is it cpm(y) > 10 or cpm(y) > 1?

written 4.8 years ago by geek_y

Would cpm(y) > 2 in at least 9 samples still be OK? It brings the BCV down but retains more transcripts.
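To see how these choices trade off, it can help to count how many genes each rule keeps. A rough sketch on simulated counts follows; the helper name `keep_by_cpm` and the simulated data are made up for illustration, and on real data the counts from your own DGEList would be used instead.

```r
set.seed(1)

# Simulated counts: 2000 genes x 18 samples with a wide range of mean expression.
n_genes <- 2000
n_samples <- 18
mu <- 2^runif(n_genes, min = -2, max = 12)
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = rep(mu, times = n_samples), size = 5),
  nrow = n_genes, ncol = n_samples
)

# Hypothetical helper: TRUE for genes with CPM > min_cpm in >= min_samples samples.
keep_by_cpm <- function(counts, min_cpm, min_samples) {
  cpm_mat <- t(t(counts) / colSums(counts)) * 1e6
  rowSums(cpm_mat > min_cpm) >= min_samples
}

# The three rules discussed in this thread: c(CPM cut-off, minimum samples).
settings <- list(
  "cpm > 1 in >= 3"  = c(1, 3),
  "cpm > 10 in >= 6" = c(10, 6),
  "cpm > 2 in >= 9"  = c(2, 9)
)

# Number of genes retained under each rule.
kept <- sapply(settings, function(s) sum(keep_by_cpm(counts, s[1], s[2])))
print(kept)
```

Any gene that passes either of the two stricter rules also passes CPM > 1 in at least 3 samples, so both are simply tighter versions of the vignette's suggestion.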

written 4.8 years ago by Biogeek