Question: How to determine which SNPs are above FDR line?
gravatar for joshf
4.4 years ago by
joshf20 wrote:

Hey so here I have an R script I use to make Q-Q plots for GWA SNP P-values with FDR lines. How could I determine with programming which P-values (or SNPs, really) are significant enough to have passed the 0.05 FDR line? I'm just really not sure how to approach this. Example below!

  qqunif = function(p, BH=T, MAIN = " ", SUB=" ")
      nn = length(p)
      xx =  -log10((1:nn)/(nn+1))
      plot( xx,  -sort(log10(p)),
            main = MAIN, sub= SUB, cex.sub=1.3,
      if(BH) ## BH = include Benjamini Hochberg FDR
          abline(-log10(0.05),1, col='purple',lty=1)
          abline(-log10(0.10),1, col='blue',lty=1)
          abline(-log10(0.25),1, col='green',lty=1)
          legend('topleft', c("FDR = 0.05","FDR = 0.10","FDR = 0.25"),
                 col=c('purple','blue','green'),lty=c(1,1,1), cex=0.8)
          if (BF)
            abline(h=-log10(0.05/nn), col='black') ## bonferroni

EXAMPLE: enter image description here I can SEE that there are 25 SNPs that passed the 0.05 FDR line, but how can I get a list of just those 25 SNPs in a new file? Thing is sometimes I work with much larger files. My input files are really just: rs_number pval

Hopefully someone can help me with this

gwas snp R fdr • 2.0k views
ADD COMMENTlink modified 4.4 years ago by Kolea Zimmerman100 • written 4.4 years ago by joshf20
gravatar for Kolea Zimmerman
4.4 years ago by
Kolea Zimmerman100 wrote:

The equation for your .05 FDR line is y= -log10(expected p value) + (-log10(.05)), so you just have to determine if your observed values fall above this line. If you create two vectors with your expected and observed p-values


Expected_Pvalue = xx, Observed_Pvalue = -sort(log10(p)))

Then you can use the line equation to do a simple logical evaluation to get a subsetting index:

Pvalue_keeps <- Observed_Pvalue > -log10(Expected_Pvalue) + -log10(.05)

Finally use this index to subset your dataframe of snps and pvalues, which was also sorted based on: -sort(log10(p))

snps_pvalues_df[Pvalue_keeps, ]
ADD COMMENTlink modified 4.4 years ago • written 4.4 years ago by Kolea Zimmerman100
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1013 users visited in the last hour