Question

EdgeR filter thresholds

8.0 years ago

Biogeek ▴ 470

Guys and girls,

When it comes to edgeR CPM filtering, how does one define a suitable cut-off? I have read that the choice of filter is arbitrary. The edgeR vignette suggests keeping genes with cpm(y) > 1 in at least n samples, where n is the number of samples in the smallest replicate group.

I have an experiment with two time points and I'm running the GLM (drug placebo example in the vignette) with a design file and a GLM fit.

3 conditions: control, light and heavy, all 3 conditions on each of the 2 time points, 3 biological reps per group.

I was planning on requiring cpm(y) > 10 in at least 6 samples, as I expect genes to appear in at least 6 of the 18 samples. I know edgeR would recommend cpm(y) > 10 in at least 3 samples in my case (3 being the smallest replicate group). Is there anything wrong with using 6, given that each treatment spans 2 time points? My BCV comes down when I filter, as there seem to be a lot of lowly expressed transcripts in my samples. I am more interested in the tag comparisons that show moderate to large changes in expression.

Thanks for the feedback. I have heard of the genefilter package for R. Does anyone know of any tutorials on how to apply it with edgeR rather than DESeq2?

edger DIFFERENTIAL EXPRESSION • 4.0k views

updated 8.0 years ago by GouthamAtla 12k • written 8.0 years ago by Biogeek ▴ 470

There is nothing wrong with using n=6, as long as you are OK with removing transcripts expressed below that threshold. But is it cpm(y)>10 or cpm(y)>1?
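To make the n=3 versus n=6 comparison concrete, here is a minimal sketch in base R using simulated counts, not the poster's data. The group labels, the gene count, the helper `keep_by_cpm` and the planted gene in row 1 are all hypothetical. CPM is computed by hand so the snippet runs without Bioconductor; in an edgeR workflow, `cpm(y)` on the DGEList would take its place. The CPM cut-off of 10 is simply the value quoted in the question.

```r
# Simulated data only: 1000 hypothetical genes, 18 samples laid out as
# 3 conditions x 2 time points x 3 biological replicates.
set.seed(1)
condition <- rep(c("control", "light", "heavy"), each = 3, times = 2)
time      <- rep(c("t1", "t2"), each = 9)
group     <- factor(paste(condition, time, sep = "."))

n_genes <- 1000
gene_mu <- 2^runif(n_genes, 0, 10)           # per-gene mean count
counts  <- matrix(rnbinom(n_genes * 18, mu = rep(gene_mu, 18), size = 5),
                  nrow = n_genes)

# Plant one gene that is expressed in a single group only (3 samples).
counts[1, ] <- 0
counts[1, group == "heavy.t2"] <- 500

# Counts per million, computed manually from the library sizes.
lib_size <- colSums(counts)
cpm_mat  <- t(t(counts) / lib_size) * 1e6

# Keep genes whose CPM exceeds min_cpm in at least min_samples samples.
keep_by_cpm <- function(cpm_mat, min_cpm, min_samples) {
  rowSums(cpm_mat > min_cpm) >= min_samples
}

keep_n3 <- keep_by_cpm(cpm_mat, min_cpm = 10, min_samples = 3)
keep_n6 <- keep_by_cpm(cpm_mat, min_cpm = 10, min_samples = 6)

# Cross-tabulate the two filters; every gene kept at n=6 is also kept at n=3.
print(table(n3 = keep_n3, n6 = keep_n6))

# The planted single-group gene passes n=3 but is dropped at n=6.
print(c(n3 = keep_n3[1], n6 = keep_n6[1]))
```

The planted gene shows the trade-off: with n=6, a transcript that is switched on in only one condition at one time point (3 samples) is filtered out, which is acceptable only if such transcripts are not of interest. Recent edgeR releases also provide `filterByExpr()`, which derives the minimum sample count from the group or design information.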