Dear community members,
I have one patient and X controls (all human). I have RNA-seq data from the same cell type for both the patient and the controls. All samples were sequenced uniformly at the same sequencing center to current coverage standards.
I know that one gene in the patient is deregulated (up or down) quite strongly: its expression is either halved or increased 1.5-fold. I do not know which gene it is, so in the best case I have to test the ~4K genes already linked to the disease phenotype, and in the worst case all ~20K genes. I do know which sample is the patient, so I am testing 1 vs. all.
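As a rough illustration of how the size of the search space (~4K vs. ~20K genes) sets the bar, here is a minimal Python sketch. It assumes log2 expression is normally distributed across individuals and uses a Bonferroni cut-off as a conservative stand-in for FDR control when only one gene is truly altered; `z_threshold` and `max_sd_log2` are hypothetical helper names. Even with unlimited controls, a gene can only be flagged if its absolute log2 fold change exceeds z* times the between-individual SD of that gene.

```python
from math import log2
from statistics import NormalDist


def z_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Two-sided z cut-off for a Bonferroni-corrected per-gene test."""
    per_gene_alpha = alpha / n_tests
    return NormalDist().inv_cdf(1 - per_gene_alpha / 2)


def max_sd_log2(fold_change: float, n_tests: int, alpha: float = 0.05) -> float:
    """Largest between-individual SD (log2 scale) at which a gene shifted by
    `fold_change` can still clear the cut-off, assuming the control mean and
    SD are known exactly (i.e. an unlimited number of controls)."""
    return abs(log2(fold_change)) / z_threshold(n_tests, alpha)


for m in (4_000, 20_000):
    for fc in (0.5, 1.5):
        print(f"{m:>6} tests, FC {fc}: z* = {z_threshold(m):.2f}, "
              f"max SD(log2) = {max_sd_log2(fc, m):.3f}")
```

A 1.5-fold increase is only about 0.58 on the log2 scale, versus 1.0 for halving, so under these assumptions the upregulated case needs a gene with roughly half the biological variability to be detectable at all.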
How many controls do I need in order to have high power to detect deregulated expression in a single patient? The gene is assumed to be reasonably well expressed in the dataset (medium to high expression).
In other words, how many control samples do I need to estimate the expression distribution well enough that, given the natural variability of expression, I can identify a few candidate genes with deregulated expression that survive FDR correction?
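One way to get a gut-feeling number is a Monte Carlo simulation. The sketch below (Python, standard library only) is an illustration under loudly stated assumptions, not an established method for this design. It assumes the altered gene's log2 expression is normal across individuals with a hypothetical SD of 0.2, and it ignores sequencing count noise, which is only reasonable for well-expressed genes. The patient is scored with a prediction-interval t statistic against the controls, and a Bonferroni cut-off of alpha / number of genes approximates BH-FDR when there is a single true positive. The helper names `t_two_sided_p` and `simulated_power` are hypothetical.

```python
import random
from math import atan, cos, log2, pi, sin, sqrt
from statistics import fmean, stdev


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t for integer df >= 1, using the
    classical closed-form series for P(|T| <= t) with theta = atan(t/sqrt(df))."""
    theta = atan(abs(t) / sqrt(df))
    c2 = cos(theta) ** 2
    if df % 2 == 1:  # odd df: A = (2/pi) * (theta + sin(theta) * series)
        series = 0.0
        if df > 1:
            term = cos(theta)
            series = term
            for k in range(1, (df - 3) // 2 + 1):
                term *= c2 * (2 * k) / (2 * k + 1)
                series += term
        a = 2 / pi * (theta + sin(theta) * series)
    else:  # even df: A = sin(theta) * series
        term = series = 1.0
        for k in range(1, (df - 2) // 2 + 1):
            term *= c2 * (2 * k - 1) / (2 * k)
            series += term
        a = sin(theta) * series
    return max(0.0, 1.0 - a)  # clamp tiny negative rounding error


def simulated_power(n_controls, fold_change, sd_log2, n_tests,
                    alpha=0.05, n_sim=1000, seed=1):
    """Fraction of simulated patients whose truly altered gene passes a
    Bonferroni cut-off (alpha / n_tests) in a 1-vs-n_controls test."""
    if n_controls < 2:
        raise ValueError("need at least 2 controls to estimate the SD")
    rng = random.Random(seed)
    delta = log2(fold_change)  # true shift of the patient on the log2 scale
    cutoff = alpha / n_tests
    hits = 0
    for _ in range(n_sim):
        controls = [rng.gauss(0.0, sd_log2) for _ in range(n_controls)]
        patient = rng.gauss(delta, sd_log2)
        # A new draw from the control distribution would make this statistic
        # follow Student's t with n_controls - 1 degrees of freedom.
        t = (patient - fmean(controls)) / (stdev(controls) * sqrt(1 + 1 / n_controls))
        if t_two_sided_p(t, n_controls - 1) < cutoff:
            hits += 1
    return hits / n_sim


# Hypothetical setting: between-individual SD of 0.2 (log2 scale), 20K genes.
for n in (5, 10, 20, 50, 100):
    row = ", ".join(f"FC {fc}: {simulated_power(n, fc, 0.2, 20_000):.2f}"
                    for fc in (0.5, 1.5))
    print(f"{n:>3} controls -> {row}")
```

Because the true between-individual SD varies from gene to gene, it may be more informative to estimate a realistic range of per-gene SDs from the control data (for example, on log-transformed normalized counts) and rerun the sketch over that range, rather than relying on the single value assumed here.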
I know there are dozens of other factors that would need to be known beyond what I've described, so I am not asking for a precise estimate, just your gut feeling.