GCTA conditional analysis (COJO) returning NA for most SNPs
4.3 years ago
noahconnally ▴ 30

Hello,

I am using GCTA COJO to perform conditional analysis on summary statistics from a lung cancer GWAS. Specifically, I am trying to condition the P-values of non-coding SNPs on coding SNPs. However, I am getting an unusually high number of NAs in place of the conditional betas, standard errors, and P-values (the bC, bC_se, and pC columns).

Because there are so many coding SNPs, I first select all conditionally significant coding SNPs, then condition all non-coding SNPs on these.

My commands look like this.

gcta64 --bfile [my file for LD calculations] \
--cojo-file [the summary statistics from the GWAS] \
--extract [a list of coding variants] \
--cojo-slct \
--cojo-p 5e-4 \
--out [conditioned coding SNP values]

I then use awk to pull out the conditionally significant coding SNPs.

gcta64 --bfile [same as above] \
--cojo-file [same as above] \
--cojo-cond [the conditionally significant coding SNPs from awk] \
--out [my final summary statistics]
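
The awk step between the two commands above could look roughly like the following. This is only a sketch: the file names are hypothetical, the demo rows are made up, and it assumes the selection output has a header line with the SNP ID in the second column, as in the result tables shown in this post (I believe GCTA writes the selection results with a .jma.cojo suffix).

```shell
# Sketch only: file names are hypothetical and the demo rows are made up.
# Assumes a header line and the SNP ID in column 2.
extract_snp_ids() {
    # Skip the header and print the SNP column, one ID per line.
    awk 'NR > 1 { print $2 }' "$1"
}

# Tiny stand-in for the --cojo-slct result file.
demo_dir=$(mktemp -d)
printf 'Chr\tSNP\tbp\trefA\n8\t8:100:A:G\t100\tG\n8\t8:200:C:T\t200\tT\n' \
    > "$demo_dir/coding.jma.cojo"

# Write the list that is later passed to --cojo-cond.
extract_snp_ids "$demo_dir/coding.jma.cojo" > "$demo_dir/cond.snplist"
cat "$demo_dir/cond.snplist"
```

The resulting one-ID-per-line file is what gets passed to --cojo-cond in the second command.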

Taking chr8 as an example, many of my results look like this:

Chr  SNP             bp      refA  freq        b             se          p            n       freq_geno   bC            bC_se        pC
8    8:156716:G:C    156716  C     0.00241033  0.00206278   0.00061818   0.000847384  300437  0.00396991  0.00206278    0.00061819   0.000847478
8    8:156747:C:G    156747  G     0.00221684  0.00223315   0.000640693  0.000491219  304047  0.00396991  0.00223315    0.000640705  0.000491294
8    8:157714:C:G    157714  G     0.00315905  -0.000603486 0.000521356  0.247056     322534  0.00480568  -0.000603486  0.000521356  0.247056

But many more look like this:

Chr  SNP           bp      refA  freq       b             se           p         n       freq_geno  bC  bC_se  pC
8    8:156244:T:C  156244  C     0.036727   1.05558e-05   0.000164043  0.948693  289987  0.0385499  NA  NA     NA
8    8:156288:G:C  156288  C     0.253299   -8.04971e-07  6.76117e-05  0.990501  319304  0.25491    NA  NA     NA
8    8:156294:C:A  156294  A     0.0519816  -9.96523e-05  0.000136206  0.464395  301973  0.052967   NA  NA     NA

Overall, for this chromosome, only ~140,000 of the ~655,000 SNPs I input got actual conditioned results, as opposed to NAs.
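
For reference, a count like the one above can be obtained with a short awk function. This is a sketch, not necessarily how the numbers in this post were produced: the function name and file name are hypothetical, and the demo file simply reuses the six rows shown above. It looks up the pC column by its header name rather than by position.

```shell
# Sketch only: function and file names are hypothetical.
count_conditioned() {
    # Prints "<rows with a result> <rows with NA> <total rows>" for pC.
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "pC") col = i; next }
         { if ($col == "NA") na++; else ok++ }
         END { printf "%d %d %d\n", ok, na, ok + na }' "$1"
}

# Demo file built from the six example rows quoted in this post.
demo_dir=$(mktemp -d)
cat > "$demo_dir/chr8.cma.cojo" <<'EOF'
Chr SNP bp refA freq b se p n freq_geno bC bC_se pC
8 8:156716:G:C 156716 C 0.00241033 0.00206278 0.00061818 0.000847384 300437 0.00396991 0.00206278 0.00061819 0.000847478
8 8:156747:C:G 156747 G 0.00221684 0.00223315 0.000640693 0.000491219 304047 0.00396991 0.00223315 0.000640705 0.000491294
8 8:157714:C:G 157714 G 0.00315905 -0.000603486 0.000521356 0.247056 322534 0.00480568 -0.000603486 0.000521356 0.247056
8 8:156244:T:C 156244 C 0.036727 1.05558e-05 0.000164043 0.948693 289987 0.0385499 NA NA NA
8 8:156288:G:C 156288 C 0.253299 -8.04971e-07 6.76117e-05 0.990501 319304 0.25491 NA NA NA
8 8:156294:C:A 156294 A 0.0519816 -9.96523e-05 0.000136206 0.464395 301973 0.052967 NA NA NA
EOF

count_conditioned "$demo_dir/chr8.cma.cojo"
```

On the six example rows this prints `3 3 6`: three rows with a conditional P-value, three with NA, six in total.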

I have run this analysis on different GWAS results (in the same format) and not had this problem. Does anyone know what might be going on?

Thank you very much!

Edit 1

I know that, as discussed here, if a SNP is completely predictable from a linear combination of SNPs fixed in the model, its P-value will be NA. However, I do not believe this is the explanation for two reasons.

  1. Because I am working with summary statistics for the GWAS, the collinearity is determined from the file of genotypes I use to calculate LD. I have used the same genotypes for conditioning variants in other GWAS without getting nearly as many NAs in my results.

  2. It is true that the set of coding variants I condition on differs between GWAS. However, for my example of chr8, I am only conditioning on one coding SNP, so this explanation would require roughly 500,000 variants spread across the chromosome to be in perfect LD with the one SNP I am conditioning on.

Edit 2

I have noticed something strange, but am not sure what to make of it. Among variants on chr8 with a MAF below 0.0104, 75% have a conditional P-value. Among variants with a MAF above this cutoff, every single one returns NA for the P-value.
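
A breakdown like this can be reproduced with awk by splitting the rows at a MAF cutoff and counting NAs on each side. This is a sketch under assumptions: the function and file names are hypothetical, MAF is taken as min(freq, 1 - freq) from the freq column (it could equally be computed from freq_geno), and the demo file reuses the six rows quoted earlier in this post.

```shell
# Sketch only: names are hypothetical; MAF is assumed to come from "freq".
na_by_maf() {
    # $1 = result file, $2 = MAF cutoff.
    # Prints one line per side of the cutoff: "<side> <NA rows> <total rows>".
    awk -v cut="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "freq") f = i
                if ($i == "pC") p = i
            }
            next
        }
        {
            maf = ($f > 0.5) ? 1 - $f : $f
            side = (maf < cut) ? "below" : "above"
            total[side]++
            if ($p == "NA") na[side]++
        }
        END { for (s in total) printf "%s %d %d\n", s, na[s], total[s] }
    ' "$1" | sort
}

# Demo file built from the six example rows quoted in this post.
demo_dir=$(mktemp -d)
cat > "$demo_dir/chr8.cma.cojo" <<'EOF'
Chr SNP bp refA freq b se p n freq_geno bC bC_se pC
8 8:156716:G:C 156716 C 0.00241033 0.00206278 0.00061818 0.000847384 300437 0.00396991 0.00206278 0.00061819 0.000847478
8 8:156747:C:G 156747 G 0.00221684 0.00223315 0.000640693 0.000491219 304047 0.00396991 0.00223315 0.000640705 0.000491294
8 8:157714:C:G 157714 G 0.00315905 -0.000603486 0.000521356 0.247056 322534 0.00480568 -0.000603486 0.000521356 0.247056
8 8:156244:T:C 156244 C 0.036727 1.05558e-05 0.000164043 0.948693 289987 0.0385499 NA NA NA
8 8:156288:G:C 156288 C 0.253299 -8.04971e-07 6.76117e-05 0.990501 319304 0.25491 NA NA NA
8 8:156294:C:A 156294 A 0.0519816 -9.96523e-05 0.000136206 0.464395 301973 0.052967 NA NA NA
EOF

na_by_maf "$demo_dir/chr8.cma.cojo" 0.0104
```

On the six example rows this prints `above 3 3` and `below 0 3`, which matches the pattern described: all rows above the cutoff are NA, none below it are.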

gcta conditional analysis • 3.0k views

Is 8:156244:T:C in your list supplied to --cojo-cond ?
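
One quick way to check this, assuming the --cojo-cond list is a plain text file with one SNP ID per line (the file name and the ID written into it below are made up for illustration):

```shell
# Sketch only: the list file and its contents are hypothetical.
snp_in_list() {
    # -F: fixed string, -x: match the whole line, -q: exit status only.
    grep -Fxq -- "$1" "$2"
}

demo_dir=$(mktemp -d)
printf '8:100:A:G\n' > "$demo_dir/cond.snplist"

if snp_in_list "8:156244:T:C" "$demo_dir/cond.snplist"; then
    echo "in list"
else
    echo "not in list"
fi
```

Matching the whole line avoids false hits where one ID is a prefix of another.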
