Question: How to determine which SNPs are above FDR line?
1
gravatar for joshf
3.2 years ago by
joshf20
Netherlands
joshf20 wrote:

Hey so here I have an R script I use to make Q-Q plots for GWA SNP P-values with FDR lines. How could I determine with programming which P-values (or SNPs, really) are significant enough to have passed the 0.05 FDR line? I'm just really not sure how to approach this. Example below!

  qqunif = function(p, BH=T, MAIN = " ", SUB=" ")
    {
      nn = length(p)
      xx =  -log10((1:nn)/(nn+1))
      plot( xx,  -sort(log10(p)),
            main = MAIN, sub= SUB, cex.sub=1.3,
         xlab=expression(Expected~~-log[10](italic(p))),
            ylab=expression(Observed~~-log[10](italic(p))),
           cex.lab=1.0,mgp=c(2,1,0))
      abline(0,1,col='gray')
      if(BH) ## BH = include Benjamini Hochberg FDR
        {
          abline(-log10(0.05),1, col='purple',lty=1)
          abline(-log10(0.10),1, col='blue',lty=1)
          abline(-log10(0.25),1, col='green',lty=1)
          legend('topleft', c("FDR = 0.05","FDR = 0.10","FDR = 0.25"),
                 col=c('purple','blue','green'),lty=c(1,1,1), cex=0.8)
          if (BF)
          {
            abline(h=-log10(0.05/nn), col='black') ## bonferroni
          }
        }
    }

EXAMPLE: enter image description here I can SEE that there are 25 SNPs that passed the 0.05 FDR line, but how can I get a list of just those 25 SNPs in a new file? Thing is sometimes I work with much larger files. My input files are really just: rs_number pval

Hopefully someone can help me with this

gwas snp R fdr • 1.4k views
ADD COMMENTlink modified 3.2 years ago by Kolea Zimmerman80 • written 3.2 years ago by joshf20
0
gravatar for Kolea Zimmerman
3.2 years ago by
Kolea Zimmerman80 wrote:

The equation for your .05 FDR line is y= -log10(expected p value) + (-log10(.05)), so you just have to determine if your observed values fall above this line. If you create two vectors with your expected and observed p-values

where

Expected_Pvalue = xx, Observed_Pvalue = -sort(log10(p)))

Then you can use the line equation to do a simple logical evaluation to get a subsetting index:

Pvalue_keeps <- Observed_Pvalue > -log10(Expected_Pvalue) + -log10(.05)

Finally use this index to subset your dataframe of snps and pvalues, which was also sorted based on: -sort(log10(p))

snps_pvalues_df[Pvalue_keeps, ]
ADD COMMENTlink modified 3.2 years ago • written 3.2 years ago by Kolea Zimmerman80
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1131 users visited in the last hour