Conditional analysis with duplicate variant IDs in PLINK
7.8 years ago
alesssia ▴ 580

Hi everybody,

I have just run a GWAS (linear regression with covariates) in PLINK (v1.90b3.38). I found a locus with several significantly associated SNPs, and I am now running a conditional model (using PLINK's --condition-list option) to look for secondary association signals. Specifically, I am conditioning on my top-associated SNP (saved in a text file) and including only the variants in the identified locus (selected with PLINK's --extract option).

This is the command I am running:

plink --bfile myplink  --extract mysnsp.txt --pheno mypheno.txt --covar mycovar.txt 
--no-parents --allow-no-sex --missing-phenotype -9 --maf 0.05 --geno 0.05 
--hwe 1e-09 --linear --ci 0.95 --condition-list best.txt --out conditional_model

However, my dataset includes some duplicated variants such as:

19  19:12161188 0   12161188    TT  T
19  19:12161188 0   12161188    T   TTA

and the conditional analysis terminates with the following error:

...
Phenotype data is quantitative.
Error: Duplicate ID '19:12161188'.
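
To see which IDs would trigger this error, the .bim file can be scanned for repeated variant IDs. The following is only a minimal sketch, assuming the six-column .bim layout shown above (chromosome, variant ID, cM, position, allele 1, allele 2). The function name find_duplicate_ids and the third example variant are made up for illustration.

```python
from collections import Counter

def find_duplicate_ids(bim_lines):
    """Return {variant_id: count} for every ID that appears more than once."""
    counts = Counter()
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1  # column 2 of a .bim file holds the variant ID
    return {vid: n for vid, n in counts.items() if n > 1}

# The two duplicated variants from the post, plus one hypothetical unique variant.
example = [
    "19  19:12161188 0   12161188    TT  T",
    "19  19:12161188 0   12161188    T   TTA",
    "19  19:12161200 0   12161200    A   G",
]
print(find_duplicate_ids(example))
```

For a real dataset, the lines would come from open("myplink.bim") instead of the example list.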

Let me stress that a normal association study (that is, not including the SNP with the --condition-list option) works fine.

I am now manually adding the allelic dosage of the top-associated SNP to the set of covariates, but I wonder if there is a more elegant solution (and also why duplicate variants are a problem for conditional but not for association analyses).

Thank you very much!
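
For reference, the manual workaround mentioned above (adding the top SNP's allelic dosage to the covariates) could be scripted roughly as follows. This is a sketch under several assumptions: the dosages were exported with PLINK's --recode A, which writes a .raw file with columns FID IID PAT MAT SEX PHENOTYPE followed by one column per variant; the covariate file has a header line starting with FID IID; and "NA" is an acceptable placeholder for samples without a dosage. The function name and the column name rsTOP_A are hypothetical.

```python
def add_dosage_covariate(covar_text, raw_text, dosage_column):
    """Append one dosage column from a .raw table to a covariate table."""
    raw_rows = [line.split() for line in raw_text.strip().splitlines()]
    header, body = raw_rows[0], raw_rows[1:]
    col = header.index(dosage_column)
    # Key each dosage by (FID, IID) so row order does not have to match.
    dosage = {(r[0], r[1]): r[col] for r in body}

    covar_rows = [line.split() for line in covar_text.strip().splitlines()]
    out = [covar_rows[0] + [dosage_column]]
    for r in covar_rows[1:]:
        # "NA" is an assumed placeholder for samples missing from the .raw file.
        out.append(r + [dosage.get((r[0], r[1]), "NA")])
    return "\n".join("\t".join(r) for r in out)

# Tiny made-up example: two samples, one covariate, one dosage column.
covar = "FID IID AGE\nf1 i1 40\nf2 i2 55\n"
raw = ("FID IID PAT MAT SEX PHENOTYPE rsTOP_A\n"
       "f1 i1 0 0 1 -9 2\n"
       "f2 i2 0 0 2 -9 NA\n")
print(add_dosage_covariate(covar, raw, "rsTOP_A"))
```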

plink GWAS conditional analysis • 4.0k views
7.8 years ago
LauferVA 4.2k

Depending on what you think about the duplicated positions, I'd either remove them (based on quality metrics, etc.), recode them as multiallelic, or write a script to give every variant ID a unique name.

The first two options should be possible using published tools. For the last option, it would look something like:

Let's call it: give_snps_unique_names.py

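from collections import defaultdict

seen = defaultdict(int)  # ID -> number of times it has already appeared

# Open the output with 'w' rather than 'a', so re-running does not append to old output.
with open('my_vcf.vcf') as input_vcf, open('name_of_output_vcf.vcf', 'w') as outfile:
    for line in input_vcf:
        if line.startswith('#') or not line.strip():
            outfile.write(line)  # copy header and blank lines through unchanged
            continue
        fields = line.split()
        snp_id = fields[1]  # ID column in the sample data below; fields[2] in a standard VCF
        if seen[snp_id]:
            # Second copy gets _1, third gets _2, and so on, even if copies are not adjacent.
            fields[1] = snp_id + "_" + str(seen[snp_id])
        seen[snp_id] += 1
        print(fields)
        outfile.write("\t".join(fields) + "\n")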

for the sample data

1 rs4553536 (... and the other vcf columns ...)
2 rs4452526
2 rs5909240
3 rs2480408
4 rs22857424
5 rs47249422
6 chr6:57669955
6 chr6:57669955
6 chr6:57669955
6 chr6:57669955
6 chr6:57669955
7 rs42082904
7 rs82498424
7 rs20948494

python give_snps_unique_names.py

returns:

['1', 'rs4553536', ... (and other cols) ...]
['2', 'rs4452526']
['2', 'rs5909240']
['3', 'rs2480408']
['4', 'rs22857424']
['5', 'rs47249422']
['6', 'chr6:57669955']
['6', 'chr6:57669955_1']
['6', 'chr6:57669955_2']
['6', 'chr6:57669955_3']
['6', 'chr6:57669955_4']
['7', 'rs42082904']
['7', 'rs82498424']
['7', 'rs20948494']

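and will write a VCF that is identical to the input except that every ID is now unique.

For a real VCF, header lines (those starting with '#') must be copied through unchanged, and the ID sits in the third column rather than the second.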
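
Since the data in the question are in PLINK binary format, the same idea can be applied to the .bim file directly. Here is a rough sketch, assuming the six-column .bim layout from the question; the function name make_bim_ids_unique and the ID:A1:A2 naming scheme are my own choices, not anything PLINK requires. Only IDs that occur more than once are renamed, and two records with the same ID and the same alleles would still collide.

```python
def make_bim_ids_unique(bim_lines):
    """Rename duplicated IDs to ID:A1:A2; leave unique IDs untouched."""
    rows = [line.split() for line in bim_lines if line.strip()]
    counts = {}
    for row in rows:
        counts[row[1]] = counts.get(row[1], 0) + 1
    out = []
    for row in rows:
        if counts[row[1]] > 1:
            # Append both alleles (columns 5 and 6) to disambiguate the ID.
            new_id = f"{row[1]}:{row[4]}:{row[5]}"
            row = row[:1] + [new_id] + row[2:]
        out.append("\t".join(row))
    return out

# The two duplicated variants from the question.
example = [
    "19  19:12161188 0   12161188    TT  T",
    "19  19:12161188 0   12161188    T   TTA",
]
for line in make_bim_ids_unique(example):
    print(line)
```

The .bed and .fam files do not store variant IDs, so only the .bim needs rewriting, and the result can be saved as a copy so the original stays untouched. Any ID listed in the --extract or --condition-list files has to be updated to match.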

Thanks for sharing this other workaround, but my question was aimed at understanding why the problem appears only with conditional analyses (and not with a standard association analysis), and how to fix it without modifying the original data (or recoding the top-associated SNP and adding it to the covariate set).

Because the source code scans the input IDs for duplicates when running a conditional analysis, but not for a single-SNP analysis. If you think about how a conditional analysis has to be coded, it makes perfect sense: the conditioning SNPs are given by ID, so presumably each ID has to point to exactly one variant. The source is at https://github.com/chrchang/plink-ng. Chris Chang is very responsive; if you have questions about the source code itself, I'd try the PLINK 2 user help forum.
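
To illustrate the point (this is not PLINK's actual code, just a toy sketch with made-up names), resolving a list of conditioning IDs to variant indices needs an ID-to-index table, and building that table is where a duplicate naturally surfaces. A per-variant association scan never needs such a lookup.

```python
def resolve_condition_ids(variant_ids, condition_ids):
    """Map each conditioning ID to its index in the variant list."""
    index = {}
    for i, vid in enumerate(variant_ids):
        if vid in index:
            # With two variants sharing an ID, the lookup would be ambiguous.
            raise ValueError(f"Duplicate ID '{vid}'.")
        index[vid] = i
    return [index[c] for c in condition_ids]

# Unique IDs: the conditioning SNP resolves to exactly one variant.
print(resolve_condition_ids(["rs1", "rs2", "rs3"], ["rs2"]))

# Duplicated ID: building the table fails before any regression is run.
try:
    resolve_condition_ids(["19:12161188", "19:12161188"], ["19:12161188"])
except ValueError as err:
    print("Error:", err)
```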

Thanks Vincent, will do!
