I found 2,302 significant SNPs in genomic sequences representing a few hundred genes sequenced for 2 populations (12 samples / population). Based on the allele counts, I calculated the allele frequencies at each locus. Then, I calculated de allelic frequency divergence (AFD), i.e :
AFD = freq[a1,pop1] - freq[a1,pop2] (allele-freq for pop1 - allele-freq for pop2)
With that, I want to discover which SNPs show a significant difference in allele frequency between both populations in order to identify putative divergent genes (I also performed an outlier test with Bayescan and I also have some problems with that, but it's not the topic of this question!).
I used the Python package "fisher-0.1.4" (http://pypi.python.org/pypi/fisher/) to compute the p-values for each AFD (i.e for each SNP) and I counter-verified these values with R, building a 2x2 matrix for each SNP and performing this test:
I find that R computes the same p-values than Python. Based on these p-values, among the 2,302 SNPs, 56 SNPs show a AFD > 0.5 and a p-value < 0.05. Now, I want to see if those SNPs are also significant when corrected for multiple hypotheses (FDR). I tried to perform a FDR test to compute q-values based on the distribution of p-values (for the 2,302 SNPs). This is where things get complicated...
I tried two R packages: "qvalue" AND "fdrtool". With both of them, I get an error message, which makes me think that there is something wrong with my dataset of p-values. Here is what I did for the package "qvalue" :
> p <- scan("~/Desktop/pvalues.txt") Read 2302 items >qobj <- qvalue(p, fdr.level=0.05, pi0.method="bootstrap")  "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."
And even when I try to change some parameters in the test, I ALWAYS get this error message.
Then with the package "fdrtool", this is what I did:
> fdr <- fdrtool(p, statistic="pvalue") Step 1... determine cutoff point Step 2... estimate parameters of null distribution and eta0 Step 3... compute p-values and estimate empirical PDF/CDF Step 4... compute q-values and local fdr Step 5... prepare for plotting Warning message : Censored sample for null model estimation has only size 2 !
With that second package (fdrtool), I can try to visualize the q-values computed with the command:
But ALL the q-values are < 0.05, which is quite strange. How can I have p-values > 0.05 that become significant (i.e < 0.05) when I correct for FDR? And how is it possible that ALL the q-values (n=2,302) are significant? I decided to include in my question a graph showing the distribution of my p-values. Maybe this can help...
UPDATE 02-06-2012 : max p-value = 0.758 and min p-value = 2.73E-10.
If someone could help me, it would be tremendously appreciated!
Cheers ! FO