Question: GCTA conditional analysis (COJO) returning NA for most SNPs
Posted 5 months ago by noahconnally (United States)

Hello,

I am using GCTA COJO to perform conditional analysis on summary statistics from a lung cancer GWAS. Specifically, I am trying to condition the P-values of non-coding SNPs on coding SNPs. However, I am getting an unusually high number of NAs in place of my corrected betas, standard errors, and P-values.

Because there are so many coding SNPs, I first select all conditionally significant coding SNPs, then condition all non-coding SNPs on these.

My commands look like this.

gcta64 --bfile [my file for LD calculations] \
--cojo-file [the summary statistics from the GWAS] \
--extract [a list of coding variants] \
--cojo-slct \
--cojo-p 5e-4 \
--out [conditioned coding SNP values]

I then use awk to pull out the conditionally significant coding SNPs.
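
For anyone reproducing this step: `--cojo-slct` writes the selected SNPs to `[out].jma.cojo`, a whitespace-delimited table with a header row. Below is a minimal Python stand-in for the awk step. It assumes that file layout and finds the `SNP` column by its header name rather than by position. The function names and file names are hypothetical and not part of GCTA.

```python
from pathlib import Path


def selected_snps(jma_text: str) -> list[str]:
    """Return the SNP IDs listed in the text of a .jma.cojo file.

    Assumes a whitespace-delimited table whose first non-empty line is a
    header containing a column named "SNP".
    """
    rows = [line.split() for line in jma_text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    snp_col = header.index("SNP")  # find the column by name, not position
    return [row[snp_col] for row in body]


def write_snplist(jma_path: str, out_path: str) -> None:
    """Write the selected SNP IDs one per line, for use with --cojo-cond."""
    snps = selected_snps(Path(jma_path).read_text())
    Path(out_path).write_text("\n".join(snps) + "\n")
```

Usage, with hypothetical file names: `write_snplist("coding_slct.jma.cojo", "coding_cond.snplist")`. Looking the column up by name guards against the script silently picking the wrong field if the column order differs between GCTA versions.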

gcta64 --bfile [same as above] \
--cojo-file [same as above] \
--cojo-cond [the conditionally significant coding SNPs from awk] \
--out [my final summary statistics]
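
The two commands can also be driven from a script, which keeps the file names for both steps in one place. Below is a minimal sketch. It assumes `gcta64` is on the `PATH` and uses only the flags from the commands above. The function names and any file names passed in are hypothetical.

```python
import subprocess


def slct_cmd(bfile: str, cojo_file: str, extract: str, out: str,
             p: str = "5e-4") -> list[str]:
    """Step 1: stepwise selection restricted to the coding variants."""
    return ["gcta64", "--bfile", bfile, "--cojo-file", cojo_file,
            "--extract", extract, "--cojo-slct", "--cojo-p", p,
            "--out", out]


def cond_cmd(bfile: str, cojo_file: str, cond: str, out: str) -> list[str]:
    """Step 2: condition every SNP in the summary statistics on the given set."""
    return ["gcta64", "--bfile", bfile, "--cojo-file", cojo_file,
            "--cojo-cond", cond, "--out", out]


def run(cmd: list[str]) -> None:
    # Raises subprocess.CalledProcessError if gcta64 exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

A typical sequence would be `run(slct_cmd(...))`, then extracting the selected SNP IDs into a list file, then `run(cond_cmd(...))` with that file as `cond`. Building the argument lists separately from running them makes it easy to print or log the exact commands before spending compute on them.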

Taking chr8 as an example, many of my results look like this:

Chr  SNP             bp      refA  freq        b             se          p            n       freq_geno   bC            bC_se        pC
8    8:156716:G:C    156716  C     0.00241033  0.00206278   0.00061818   0.000847384  300437  0.00396991  0.00206278    0.00061819   0.000847478
8    8:156747:C:G    156747  G     0.00221684  0.00223315   0.000640693  0.000491219  304047  0.00396991  0.00223315    0.000640705  0.000491294
8    8:157714:C:G    157714  G     0.00315905  -0.000603486 0.000521356  0.247056     322534  0.00480568  -0.000603486  0.000521356  0.247056

But many more look like this:

Chr  SNP           bp      refA  freq       b             se           p         n       freq_geno  bC  bC_se  pC
8    8:156244:T:C  156244  C     0.036727   1.05558e-05   0.000164043  0.948693  289987  0.0385499  NA  NA     NA
8    8:156288:G:C  156288  C     0.253299   -8.04971e-07  6.76117e-05  0.990501  319304  0.25491    NA  NA     NA
8    8:156294:C:A  156294  A     0.0519816  -9.96523e-05  0.000136206  0.464395  301973  0.052967   NA  NA     NA

Overall, for this chromosome, only ~140,000 of the ~655,000 SNPs I input received conditional estimates. The remaining ~515,000 are NA.
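
This tally can be recomputed directly from the `--cojo-cond` output (`[out].cma.cojo`). Below is a minimal sketch. It assumes the layout shown in the tables above: whitespace-delimited, one header row, and missing values written literally as `NA`. The function name is hypothetical.

```python
def count_conditioned(cma_text: str, col: str = "pC") -> tuple[int, int]:
    """Count rows with a numeric value vs. a literal "NA" in column `col`.

    Returns (n_with_value, n_na) for a whitespace-delimited table whose
    first non-empty line is the header.
    """
    rows = [line.split() for line in cma_text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    idx = header.index(col)
    n_na = sum(1 for row in body if row[idx] == "NA")
    return len(body) - n_na, n_na
```

Running it on `bC` or `bC_se` instead of `pC` (via `col=`) is a quick way to confirm that the three conditional columns go missing together, as they do in the rows shown above.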

I have run this analysis on different GWAS results (in the same format) and not had this problem. Does anyone know what might be going on?

Thank you very much!

Edit 1

I know that, as discussed here, if a SNP is completely predictable from a linear combination of the SNPs fixed in the model, its conditional P-value will be NA. However, I do not believe this is the explanation, for two reasons.

  1. Because I am working with GWAS summary statistics, collinearity is determined from the genotype file I use to calculate LD. I have used the same genotypes to condition variants in other GWAS without getting nearly as many NAs in my results.

  2. Admittedly, the set of coding variants I condition on differs between GWAS. However, in my chr8 example I am conditioning on only one coding SNP, so this explanation would require ~500,000 variants spread across the chromosome to be in perfect LD with that single SNP.
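
Point 2 can be checked directly in the LD reference. With a single conditioning SNP, "completely predictable from a linear combination" reduces to a squared correlation of 1 between the two SNPs' genotype dosages. Below is a minimal sketch. It assumes the genotypes have already been exported as 0/1/2 allele counts, for example with PLINK's `--recode A`. The function names and the 0.999 tolerance are hypothetical, illustrative choices and not the criterion GCTA applies internally.

```python
import math


def dosage_r2(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors.

    Returns NaN when either SNP is monomorphic in the sample (zero
    variance), since the correlation is undefined in that case.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with >= 2 samples")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy * sxy / (sxx * syy)


def perfectly_predictable(x: list[float], y: list[float],
                          tol: float = 0.999) -> bool:
    """Hypothetical check: treat r^2 >= tol as (near-)perfect collinearity."""
    r2 = dosage_r2(x, y)
    return not math.isnan(r2) and r2 >= tol
```

If only a handful of the NA SNPs come out as perfectly predictable from the conditioning SNP, that supports the argument that collinearity alone cannot explain ~500,000 missing results.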

Edit 2

I have noticed something strange, but am not sure what to make of it. Of variants on chr8 with a MAF below 0.0104, 75% have a corrected P-value. Of variants with a MAF above this cutoff, every single one returns an NA for the P-value.
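
This MAF split can be recomputed from the `.cma.cojo` output. Below is a minimal sketch. It assumes the column names shown in the tables above. Because `freq` is the frequency of `refA`, the sketch folds it to a minor allele frequency as min(f, 1 - f). The function name is hypothetical. Whether to split on `freq` (GWAS) or `freq_geno` (LD reference) is left as a parameter.

```python
def conditioned_fraction_by_maf(cma_text: str, cutoff: float,
                                freq_col: str = "freq") -> dict[str, float]:
    """Fraction of SNPs with a non-NA pC, split at a MAF cutoff.

    MAF is min(f, 1 - f) of the chosen frequency column. Pass
    freq_col="freq_geno" to split on the LD-reference frequency instead.
    A bucket with no SNPs gets NaN.
    """
    rows = [line.split() for line in cma_text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    f_idx, p_idx = header.index(freq_col), header.index("pC")
    counts = {"below": [0, 0], "at_or_above": [0, 0]}  # [non-NA, total]
    for row in body:
        f = float(row[f_idx])
        maf = min(f, 1.0 - f)
        bucket = counts["below" if maf < cutoff else "at_or_above"]
        bucket[1] += 1
        if row[p_idx] != "NA":
            bucket[0] += 1
    return {k: (ok / total if total else float("nan"))
            for k, (ok, total) in counts.items()}
```

Comparing the result for `freq_col="freq"` against `freq_col="freq_geno"` shows whether the cutoff tracks the GWAS frequencies or the LD-reference frequencies.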
