I have 2x75 bp TruSeq RNA-Seq data (paired-end, stranded) collected on an Illumina instrument, aligned with STAR, and counted with htseq-count (which agreed with STAR's
--quantMode GeneCounts option). These are rat samples (2 conditions, three biological replicates each), and the rat annotation has 32,754 genes. For one of my samples, here is a binned list of raw (htseq-count) counts by gene:
Counts                 Genes
0                      19,136
1-10                    3,699
10-100                  3,722
100-1,000               3,784
1,000-10,000            2,089
10,000-100,000            309
100,000-1,000,000          15
1,000,000+                  0
Total: 20,399,575      32,754
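For reference, a table like the one above can be tallied with a few lines of Python. This is only a sketch: the bin labels overlap at 10, 100, and so on, so I am assuming a count equal to a boundary goes into the higher bin, and the function name `bin_counts` is just illustrative.

```python
from bisect import bisect_right

# Lower edges of every bin after the "0" bin. Assumption: a count that
# equals an edge (e.g. exactly 10) is placed in the higher bin.
EDGES = [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]
LABELS = ["0", "1-10", "10-100", "100-1,000", "1,000-10,000",
          "10,000-100,000", "100,000-1,000,000", "1,000,000+"]

def bin_counts(counts):
    """Return {bin label: number of genes} for one sample's raw counts."""
    tally = {label: 0 for label in LABELS}
    for c in counts:
        # bisect_right gives how many edges are <= c, which is the bin index.
        tally[LABELS[bisect_right(EDGES, c)]] += 1
    return tally

# Toy input, not my real data: five genes with these raw counts.
print(bin_counts([0, 0, 5, 10, 1500]))
```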
As you can see, only ~2,400 genes have counts of 1,000 or more. Do I have enough data to perform differential expression analysis with confidence? What count cutoff, if any, do you use for DE analysis?
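To make the cutoff question concrete, this is the kind of pre-filter I have in mind: keep a gene only if enough samples reach some minimum raw count. The thresholds here (`min_count=10`, `min_samples=3`, the latter matching my smallest group size) are placeholders I picked for illustration, not values I am claiming are correct, and `keep_gene` is a hypothetical helper rather than part of any package.

```python
def keep_gene(counts, min_count=10, min_samples=3):
    """Return True if at least `min_samples` samples have >= `min_count` reads.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    return sum(c >= min_count for c in counts) >= min_samples

# Toy rows (gene -> raw counts across six samples), not real data.
toy = {
    "geneA": [0, 1, 0, 2, 0, 1],        # low everywhere
    "geneB": [120, 95, 140, 30, 22, 41],  # expressed in all samples
}
kept = [gene for gene, counts in toy.items() if keep_gene(counts)]
print(kept)
```

A filter like this looks only at raw counts and never at which condition a sample belongs to, so a gene that is off in one condition and on in the other can still pass.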
As an example, for one specific gene across six samples (first three control, next three test condition) I have counts of 4, 8, 6, 53, 78, 216 and, after normalization, an adjusted p-value of 1.28E-06, indicating differential expression at a log2FC of