Question: edgeR filter thresholds
2.9 years ago by Biogeek (340) wrote:

Guys and girls,

When it comes to edgeR CPM filtering, how does one define a suitable cut-off? I have read that the choice of filter is arbitrary. The edgeR vignette filters with rowSums(cpm(y) > 1) >= n, i.e. it keeps genes with a CPM above 1 in at least n samples, n being the number of samples in the smallest replicate group.
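To make that rule concrete, here is a minimal sketch in base R of a "CPM above k in at least n samples" filter. The toy counts, gene names and sample names are made up for illustration, and CPM is computed by hand rather than with edgeR's cpm(), so treat it as an approximation of the idea and not as the vignette's code. A filler row pads every library to exactly one million reads, so each CPM value equals the raw count.

```r
# Hypothetical toy data: 4 genes x 6 samples (two groups of 3 replicates).
genes <- matrix(
  c( 0,  0,  0,  2,  0,  0,    # gene1: above 1 CPM in 1 sample only
    50, 60, 55, 70, 65, 58,    # gene2: expressed everywhere
    12,  0, 15,  0,  0,  0,    # gene3: above 1 CPM in 2 samples
     0,  0,  0, 30, 25, 28),   # gene4: expressed in one whole group
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)

# Pad each library to 1e6 reads so that CPM equals the raw count here.
counts <- rbind(genes, rest = 1e6 - colSums(genes))

# Counts per million: divide each column by its library size, scale by 1e6.
cpm <- t(t(counts) / colSums(counts)) * 1e6

# Keep genes with CPM > 1 in at least n = 3 samples (smallest group size).
keep <- rowSums(cpm > 1) >= 3
print(keep)

filtered <- counts[keep, , drop = FALSE]
```

With edgeR loaded and y a DGEList, the usual idiom for the same step is keep <- rowSums(cpm(y) > 1) >= 3 followed by subsetting y with keep.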

I have an experiment with two time points and I'm running the GLM (drug placebo example in the vignette) with a design file and a GLM fit.

3 conditions: control, light and heavy, all 3 conditions on each of the 2 time points, 3 biological reps per group.
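For reference, a layout like this could be encoded as follows. This is only a sketch: the time labels "t1" and "t2", the sample ordering and the choice to merge condition and time into one grouping factor are assumptions, not details taken from the post.

```r
# Assumed sample order: 3 replicates per condition, all of time 1 then time 2.
condition <- factor(rep(c("control", "light", "heavy"), each = 3, times = 2),
                    levels = c("control", "light", "heavy"))
time <- factor(rep(c("t1", "t2"), each = 9))  # hypothetical time labels

# Merge both factors into one group label per sample (6 groups in total).
group <- factor(paste(condition, time, sep = "."))
print(table(group))

# Size of the smallest replicate group, i.e. the n used in the CPM filter.
n_min <- min(table(group))

# One coefficient per group; a matrix like this can be passed to a GLM fit.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
```

Under these assumptions there are six groups of three samples each, so the smallest replicate group has n = 3.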

I was planning to require cpm(y) > 10 in at least 6 samples, as I expect genes to appear in at least 6 of the 18 samples. I know edgeR recommends cpm(y) > 10 in at least 3 samples in my case (3 being the size of the smallest replicate group). Is there anything wrong with using 6, given that each treatment spans 2 time points? My BCV comes down when I filter, as there seem to be a lot of lowly expressed transcripts in my samples. I am more interested in the comparisons where tags show moderate to large changes in expression.
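One way to judge a candidate threshold is simply to count how many genes survive it. The sketch below does that on simulated negative binomial counts for 18 samples. The simulation parameters are arbitrary assumptions, kept() is a hypothetical helper, and CPM is again computed by hand rather than with edgeR's cpm(). It says nothing about BCV, which would still need to be estimated with edgeR after filtering.

```r
set.seed(1)  # simulated toy data, not the poster's experiment
n_genes   <- 2000
n_samples <- 18

# Hypothetical per-gene mean counts, giving library sizes near one million.
mu <- rexp(n_genes, rate = 1 / 500)
counts <- matrix(rnbinom(n_genes * n_samples, mu = rep(mu, n_samples), size = 5),
                 nrow = n_genes)

# Counts per million, computed by hand.
cpm <- t(t(counts) / colSums(counts)) * 1e6

# Number of genes with CPM > k in at least n samples.
kept <- function(cpm, k, n) sum(rowSums(cpm > k) >= n)

# Settings discussed above: CPM cut-off k, minimum number of samples n.
settings <- data.frame(k = c(1, 10, 10), n = c(3, 3, 6))
settings$genes_kept <- mapply(function(k, n) kept(cpm, k, n),
                              settings$k, settings$n)
print(settings)
```

Raising either k or n can only remove genes, never add them, so the stricter setting always keeps a subset of what the looser one keeps. The real question is whether the extra genes lost are ones you care about.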

Thanks for the feedback. I have heard of the genefilter package for R; does anyone know of any tutorials on how to apply it to edgeR rather than DESeq2?

modified 2.9 years ago by geek_y (9.1k) • written 2.9 years ago by Biogeek (340)

There is nothing wrong with using n=6, as long as you are OK with removing the transcripts that do not meet that threshold. But is it cpm(y)>10 or cpm(y)>1?

written 2.9 years ago by geek_y (9.1k)

Is cpm(y) > 2 in at least 9 samples still OK? It brings the BCV down, but retains more transcripts.

written 2.9 years ago by Biogeek (340)