Question: Conditional analysis with duplicate variant IDs in PLINK
2.9 years ago by
alesssia520
London, UK
alesssia520 wrote:

Hi everybody,

I have just run a GWAS (linear regression with covariates) in PLINK (v1.90b3.38). I found a locus with several significantly associated SNPs, and I am now running a conditional model (using PLINK's --condition-list option) to look for secondary association signals. Specifically, I am conditioning on my top-associated SNP (saved in a text file) and including only the variants in the identified locus (selected with PLINK's --extract option).

This is the command I am running:

plink --bfile myplink --extract mysnsp.txt --pheno mypheno.txt --covar mycovar.txt \
  --no-parents --allow-no-sex --missing-phenotype -9 --maf 0.05 --geno 0.05 \
  --hwe 1e-09 --linear --ci 0.95 --condition-list best.txt --out conditional_model

However, my dataset includes some duplicated variants such as:

19  19:12161188 0   12161188    TT  T
19  19:12161188 0   12161188    T   TTA

and the conditional analysis terminates with the following error:

...
Phenotype data is quantitative.
Error: Duplicate ID '19:12161188'.
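To see which IDs are affected before running the conditional model, the .bim file can be scanned for repeated values in its second column. The following is only a minimal sketch: the function name `duplicate_variant_ids` is made up for illustration, and the third example row is a hypothetical variant added so that the input contains a non-duplicated ID.

```python
from collections import Counter


def duplicate_variant_ids(bim_lines):
    """Return the variant IDs (second .bim column) that occur more than once."""
    counts = Counter()
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2:  # ignore blank or truncated lines
            counts[fields[1]] += 1
    return sorted(vid for vid, n in counts.items() if n > 1)


example = [
    "19  19:12161188 0   12161188    TT  T",
    "19  19:12161188 0   12161188    T   TTA",
    "19  19:12161200 0   12161200    A   G",  # hypothetical extra variant
]
print(duplicate_variant_ids(example))  # ['19:12161188']
```

For a real dataset the same function can be fed an open file handle, for example `duplicate_variant_ids(open("myplink.bim"))`.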

Let me stress that a normal association study (that is, not including the SNP with the --condition-list option) works fine.

I am now manually adding the allelic dosage of the top-associated SNP to the set of covariates, but I wonder if there is a more elegant solution (and also why duplicate variants are a problem for conditional but not for association analyses).
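For reference, the manual workaround can be scripted. This is a rough sketch under several assumptions that are not stated above: the dosage was exported with PLINK's --recode A (which writes a .raw file whose header starts with FID IID PAT MAT SEX PHENOTYPE, followed by one column per variant), the covariate file has a header row starting with FID and IID, and "NA" is an acceptable missing value. The function name, the column name `rsTOP_A`, and the sample rows are hypothetical.

```python
def add_dosage_covariate(covar_lines, raw_lines, dosage_col,
                         new_name="TOP_SNP", missing="NA"):
    """Append one dosage column from a .raw export to a covariate table.

    Rows are matched on (FID, IID); samples absent from the .raw data
    receive the `missing` placeholder.
    """
    raw_header = raw_lines[0].split()
    col = raw_header.index(dosage_col)  # raises ValueError if the column is absent
    dosage = {}
    for line in raw_lines[1:]:
        fields = line.split()
        if fields:
            dosage[(fields[0], fields[1])] = fields[col]

    merged = ["\t".join(covar_lines[0].split() + [new_name])]
    for line in covar_lines[1:]:
        fields = line.split()
        if fields:
            value = dosage.get((fields[0], fields[1]), missing)
            merged.append("\t".join(fields + [value]))
    return merged


# Hypothetical two-sample example
covar = ["FID IID AGE", "F1 I1 34", "F2 I2 51"]
raw = ["FID IID PAT MAT SEX PHENOTYPE rsTOP_A",
       "F1 I1 0 0 1 -9 2",
       "F2 I2 0 0 2 -9 0"]
for row in add_dosage_covariate(covar, raw, "rsTOP_A"):
    print(row)
```

The merged table can then be passed to --covar in an ordinary --linear run without --condition-list.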

Thank you very much!

2.9 years ago by
Vincent Laufer1.0k
United States
Vincent Laufer1.0k wrote:

Depending on what you think about the duplicated positions, I'd either remove them (based on quality metrics, etc.), recode them as multiallelic, or write a script to give all my RSIDs unique names.

The first two options should be possible using published tools. For the last option, it would look something like:

Let's call it: give_snps_unique_names.py

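# Appends _1, _2, ... to repeated IDs so that every variant ID is unique.
# Assumes whitespace-delimited input with the ID in the second column
# (index 1), as in the sample data below. In a standard VCF the ID is
# the third column (index 2), so adjust ID_COL accordingly.
ID_COL = 1
seen = {}  # ID -> number of times it has been seen so far

with open('my_vcf.vcf') as input_vcf, open('name_of_output_vcf.vcf', 'w') as outfile:
    for line in input_vcf:
        fields = line.strip().split()
        if not fields:
            continue  # skip blank lines
        variant_id = fields[ID_COL]
        count = seen.get(variant_id, 0)
        seen[variant_id] = count + 1
        if count:
            # second and later occurrences get a numeric suffix
            fields[ID_COL] = variant_id + "_" + str(count)
        print(fields)
        outfile.write("\t".join(fields) + "\n")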

For the sample data:

1 rs4553536 (... and the other vcf columns ...)
2 rs4452526
2 rs5909240
3 rs2480408
4 rs22857424
5 rs47249422
6 chr6:57669955
6 chr6:57669955
6 chr6:57669955
6 chr6:57669955
6 chr6:57669955
7 rs42082904
7 rs82498424
7 rs20948494

running

python give_snps_unique_names.py

prints:

['1', 'rs4553536', ... (and other cols) ...]
['2', 'rs4452526']
['2', 'rs5909240']
['3', 'rs2480408']
['4', 'rs22857424']
['5', 'rs47249422']
['6', 'chr6:57669955']
['6', 'chr6:57669955_1']
['6', 'chr6:57669955_2']
['6', 'chr6:57669955_3']
['6', 'chr6:57669955_4']
['7', 'rs42082904']
['7', 'rs82498424']
['7', 'rs20948494']

and writes a VCF that should be identical except that every ID is now unique.

You may need to add something to handle the VCF header lines.
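One way to deal with the header is to pass every line that starts with "#" through unchanged and rename IDs only on data lines. The sketch below assumes a tab-delimited VCF with the ID in the third column (index 2); the function name `make_ids_unique` and the example records are illustrative only. It does not check whether a generated name such as `chr6:57669955_1` already exists elsewhere in the file.

```python
def make_ids_unique(lines, id_col=2):
    """Return VCF lines with repeated IDs suffixed _1, _2, ...; headers untouched."""
    seen = {}  # ID -> number of earlier occurrences
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            out.append(line)  # meta-information, column header, or blank line
            continue
        fields = line.split("\t")
        variant_id = fields[id_col]
        count = seen.get(variant_id, 0)
        seen[variant_id] = count + 1
        if count:
            fields[id_col] = f"{variant_id}_{count}"
        out.append("\t".join(fields))
    return out


# Illustrative input: two header lines and three records sharing one ID
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "6\t57669955\tchr6:57669955\tA\tG\t.\t.\t.",
    "6\t57669955\tchr6:57669955\tA\tT\t.\t.\t.",
    "6\t57669955\tchr6:57669955\tA\tC\t.\t.\t.",
]
for row in make_ids_unique(vcf):
    print(row)
```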


Thanks for sharing this other workaround, but my question was aimed at understanding why the problem appears only with conditional analyses (and not with a classical association study), and how to fix it without modifying the original data (or recoding the top-associated SNP and adding it to the covariate set).

written 2.9 years ago by alesssia520

Because the source code scans the input IDs for conditional analysis, but not for single-SNP analysis. If you think about how a conditional analysis has to be coded, it makes sense: the conditioning SNPs are given by ID, so each ID has to identify exactly one variant. The source is at https://github.com/chrchang/plink-ng. Chris Chang is very responsive; if you have questions about the source code itself, I'd try Plink2's user help forums.
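To illustrate the reasoning (this is not PLINK's actual source code, only a toy model of an ID lookup), consider building a table from variant ID to position so that the SNPs named in a condition list can be found. A repeated ID makes the lookup ambiguous, so the simplest safe behaviour is to stop with an error. The function name and the first ID in the example are hypothetical.

```python
def build_id_index(variant_ids):
    """Map each variant ID to its position, refusing ambiguous (repeated) IDs."""
    index = {}
    for position, variant_id in enumerate(variant_ids):
        if variant_id in index:
            raise ValueError(f"Duplicate ID '{variant_id}'.")
        index[variant_id] = position
    return index


ids = ["19:12161100", "19:12161188", "19:12161188"]  # first ID is hypothetical
try:
    build_id_index(ids)
except ValueError as err:
    print("Error:", err)
```

A plain per-SNP scan never needs such a table, because it just walks through the variants in order, which would be consistent with the single-SNP analysis running without complaint.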

written 2.9 years ago by Vincent Laufer1.0k

Thanks Vincent, will do!

written 2.8 years ago by alesssia520