How to determine which SNPs are above FDR line?
1
1
Entering edit mode
8.1 years ago
joshf ▴ 40

Hey so here I have an R script I use to make Q-Q plots for GWA SNP P-values with FDR lines. How could I determine with programming which P-values (or SNPs, really) are significant enough to have passed the 0.05 FDR line? I'm just really not sure how to approach this. Example below!

  qqunif = function(p, BH=T, MAIN = " ", SUB=" ")
    {
      nn = length(p)
      xx =  -log10((1:nn)/(nn+1))
      plot( xx,  -sort(log10(p)),
            main = MAIN, sub= SUB, cex.sub=1.3,
         xlab=expression(Expected~~-log[10](italic(p))),
            ylab=expression(Observed~~-log[10](italic(p))),
           cex.lab=1.0,mgp=c(2,1,0))
      abline(0,1,col='gray')
      if(BH) ## BH = include Benjamini Hochberg FDR
        {
          abline(-log10(0.05),1, col='purple',lty=1)
          abline(-log10(0.10),1, col='blue',lty=1)
          abline(-log10(0.25),1, col='green',lty=1)
          legend('topleft', c("FDR = 0.05","FDR = 0.10","FDR = 0.25"),
                 col=c('purple','blue','green'),lty=c(1,1,1), cex=0.8)
          if (BF)
          {
            abline(h=-log10(0.05/nn), col='black') ## bonferroni
          }
        }
    }

EXAMPLE: enter image description here I can SEE that there are 25 SNPs that passed the 0.05 FDR line, but how can I get a list of just those 25 SNPs in a new file? Thing is sometimes I work with much larger files. My input files are really just: rs_number pval

Hopefully someone can help me with this

FDR R GWAS snp • 3.4k views
ADD COMMENT
0
Entering edit mode
8.1 years ago

The equation for your .05 FDR line is y= -log10(expected p value) + (-log10(.05)), so you just have to determine if your observed values fall above this line. If you create two vectors with your expected and observed p-values

where

Expected_Pvalue = xx, Observed_Pvalue = -sort(log10(p)))

Then you can use the line equation to do a simple logical evaluation to get a subsetting index:

Pvalue_keeps <- Observed_Pvalue > -log10(Expected_Pvalue) + -log10(.05)

Finally use this index to subset your dataframe of snps and pvalues, which was also sorted based on: -sort(log10(p))

snps_pvalues_df[Pvalue_keeps, ]
ADD COMMENT

Login before adding your answer.

Traffic: 2551 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6