Removal of 'low' counts before DESeq2 - what threshold?
3.2 years ago
dnljmrs ▴ 20

Hi all, I have run DESeq2 successfully on my RNA-seq experiment and have observed some differentially expressed genes, so that's fine. There are 2300 genes in the annotation file, but only a handful of these come up as significantly DE by adjusted p-value. Many more are flagged as DE by the raw p-value, but have an adjusted p-value above the significance threshold.

One question I have is regarding low counts. The samples were run as paired-end reads, so I have between 35 and 51 million reads per sample. I know that DESeq2 accommodates the low/no-read genes, but does it also take this into account for the stats? That is, does it ignore them as though they were filtered out, or are they still counted as additional comparisons?

Could I (and indeed, should I) remove the zero counts from this analysis? And if there is scope to remove those with low counts, what would be a reasonable threshold to consider removing? For example, many samples have counts below 100, whereas some are in the tens or hundreds of thousands. At what point are the lower counts considered not necessary to include?

TIA

RNA-Seq threshold counts deseq2 • 3.0k views
3.2 years ago

There really is no standard, and you can and should feel empowered to choose the best cut-offs as you see fit (or, to make an 'executive' decision as the analyst, as I would say).

I would definitely remove genes with just 0 counts across all samples, unless you are actually expecting this based on your experimental setup.

For all other genes, you can remove low count genes in one or more different ways:

  • remove those with a mean count across all samples < 10
  • remove those with a high frequency of zeros (e.g., 0 count in > 90% of samples)
  • et cetera
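As a rough illustration of the filters above (not DESeq2 code, and not a recommended set of cut-offs), here is a minimal Python sketch. The gene names, the counts, and the `keep_gene` helper are all hypothetical; the defaults of 10 and 0.9 are simply the example values from the list.

```python
from statistics import mean

# Hypothetical toy data: gene -> raw counts per sample (invented for illustration).
counts = {
    "geneA": [0, 0, 0, 0],          # zero in every sample
    "geneB": [3, 0, 12, 5],         # mean count of 5
    "geneC": [150, 220, 180, 90],   # well expressed
    "geneD": [0, 0, 0, 400],        # zero in 75% of samples
}

def keep_gene(row, min_mean=10, max_zero_frac=0.9):
    """Return True if a gene's counts pass all three example filters."""
    if sum(row) == 0:                 # zero counts across all samples
        return False
    if mean(row) < min_mean:          # mean count below the cut-off
        return False
    zero_frac = sum(1 for c in row if c == 0) / len(row)
    if zero_frac > max_zero_frac:     # zeros in too large a share of samples
        return False
    return True

filtered = {gene: row for gene, row in counts.items() if keep_gene(row)}
print(sorted(filtered))
```

With only four samples, the zero-frequency filter can only trigger on all-zero genes; it becomes meaningful with larger sample numbers.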

It's good to re-run your analysis with different cut-offs to see how it affects the output.
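One cheap first step before re-running the full analysis is to check how many genes each candidate cut-off would retain. The sketch below does only that, on invented toy data; `n_retained` and the cut-off values are hypothetical, and the actual comparison of DE results would still need DESeq2 re-run on each filtered set.

```python
from statistics import mean

# Hypothetical toy data: gene -> raw counts per sample (invented for illustration).
counts = {
    "geneA": [0, 0, 0, 0],
    "geneB": [3, 0, 12, 5],
    "geneC": [150, 220, 180, 90],
    "geneD": [0, 0, 0, 400],
    "geneE": [8, 15, 9, 12],
}

def n_retained(counts, min_mean):
    """Count genes that are not all-zero and whose mean count is >= min_mean."""
    return sum(
        1 for row in counts.values()
        if sum(row) > 0 and mean(row) >= min_mean
    )

# Number of genes retained at each candidate mean-count cut-off.
sweep = {cutoff: n_retained(counts, cutoff) for cutoff in (1, 5, 10, 50)}
for cutoff, n in sweep.items():
    print(f"mean count >= {cutoff}: {n} genes retained")
```

If the number of retained genes (and later the DE list) changes sharply between neighbouring cut-offs, that is a sign the results are sensitive to the choice.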

Kevin


Hi Kevin, thanks for the reply. I did think it would be an arbitrary kind of threshold (I've done some 16S NGS in the past, and that too used an arbitrary cutoff value). I'll have a further look into introducing a filter for the cutoff. Thanks again.
