So I have a question. I have done differential accessibility using edgeR and I am confused if anyone ever sees the same problem regarding the pval vs FDR and if the discrepancy has to due with ATAC-seq data being noisy or me doing something wrong?
An example would be in one contrast I normally get the following results where:
padj = genes that pass p.adj cutoff < 0.01
pval = genes that pass p.val cutoff < 0.01
up = pval and positive fold change
dn = pval and negative fold change
padj = 3
pval = 1586
up = 590
dn = 996
topTags(object = WT_Whole_0hpaVSDMSO_6hpa, adjust.method = "BH", n = 20)
to give some experimental parameters, I have 10 conditions of a experimental time course (5 timepoints), 5 treated and 5 untreated. I also have 3 replicates per condition (30 total samples). The pipeline I have consists of calling peaks by macs and then merging them and getting a consensus peak file (~100,000 peaks). Then I make a counts table using the consensus peak regions (rows-peaks and column-samples). Then I feed this into edgeR. I filter by a cpm of 2 in at least half the samples (~67,000 peaks). I have around 16 contrasts that compare a bunch of the timepoints and treatments. The output genes that pass the p-value cutoff follow the ontology and genes we would expect to see but I'm worried because fdr cutoff is what is normally seen in papers.
Does anyone have any insight into this?
As far as I understand, you ask for the difference between pval and FDR right? FDR represents an adjusted pval to account for the multiple testing problem. Have a look at StatQuest for details. If I got you wrong, please elaborate.