Question

Duplicate SNP IDs in TOPMed Imputation output for GWAS data

0

3.2 years ago

yy237 ▴ 30

Hi,

I used the TOPMed Imputation Server to impute SNPs for my genotype data. After converting the output VCF files to PLINK binary files, I ran --list-duplicate-vars in PLINK to check for duplicate SNP IDs in my .bim file. The flagged pairs of "duplicate SNP IDs" turned out to be the same SNP (same position) with the reference and alternate alleles swapped (the first few rows of my plink.dupvar file are shown below).

In that case, should I still exclude all of the duplicate IDs from the dataset going forward? Would that mean losing the information for both SNPs in each duplicate pair?

(So far I have only checked the .bim file for chromosome 1, where 1472 pairs of duplicate IDs were identified.)

CHR  POS      ALLELES    IDS
1    869598   T,TA       chr1:869598:T:TA chr1:869598:TA:T
1    1177060  C,CTG      chr1:1177060:C:CTG chr1:1177060:CTG:C
1    1293960  T,TCGGGG   chr1:1293960:T:TCGGGG chr1:1293960:TCGGGG:T
1    1693590  A,AC       chr1:1693590:A:AC chr1:1693590:AC:A
1    2299253  A,AG       chr1:2299253:A:AG chr1:2299253:AG:A
1    2423169  T,TTTTG    chr1:2423169:T:TTTTG chr1:2423169:TTTTG:T
1    2808716  G,GC       chr1:2808716:G:GC chr1:2808716:GC:G
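
The pattern in the rows above can be checked across the whole file before deciding what to exclude. The following is a minimal Python sketch, not part of PLINK: it assumes the whitespace-delimited CHR / POS / ALLELES / IDS layout shown above and IDs of the form chrom:pos:ref:alt, and the helper names parse_dupvar and is_swapped_pair are hypothetical. If the IDs are read as chrom:pos:ref:alt, the two members of such a pair would describe an insertion and a deletion at the same position rather than one variant listed twice.

```python
from typing import List, Tuple


def parse_dupvar(text: str) -> List[Tuple[str, int, List[str]]]:
    """Parse plink.dupvar-style text into (chrom, pos, ids) tuples."""
    groups = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "CHR":
            continue
        chrom, pos, _alleles, *ids = fields
        groups.append((chrom, int(pos), ids))
    return groups


def is_swapped_pair(ids: List[str]) -> bool:
    """True if exactly two chrom:pos:ref:alt IDs differ only in ref/alt order."""
    if len(ids) != 2:
        return False
    a, b = (variant_id.split(":") for variant_id in ids)
    if len(a) != 4 or len(b) != 4:
        return False
    return a[:2] == b[:2] and a[2] == b[3] and a[3] == b[2]


# Two rows copied from the plink.dupvar excerpt above.
example = """\
CHR     POS     ALLELES IDS
1       869598  T,TA    chr1:869598:T:TA chr1:869598:TA:T
1       1177060 C,CTG   chr1:1177060:C:CTG chr1:1177060:CTG:C
"""

groups = parse_dupvar(example)
swapped = [g for g in groups if is_swapped_pair(g[2])]
print(f"{len(swapped)} of {len(groups)} groups are REF/ALT-swapped pairs")
```

To run it on a real file, the example string could be replaced with the contents of plink.dupvar read from disk.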

Thank you!

topmed gwas imputation SNP snp • 1.1k views

updated 3.2 years ago by Kevin Blighe 87k • written 3.2 years ago by yy237 ▴ 30

score 0 · Answer 1 · 2021-02-08

0

3.2 years ago

Kevin Blighe 87k

These are multi-allelic sites, so have you tried splitting them into separate records via bcftools norm -m-any?

Separately, or in addition, you may wish to set a custom ID for each variant. Please see my bcftools annotate command here: A: Merging vcf files (intersection and union)
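
As a rough illustration of the two suggestions above, here is a minimal Python sketch of what splitting a multi-allelic site into one record per alternate allele, and then giving each record its own ID, amounts to at the site level. It is not how bcftools is implemented: bcftools norm also rewrites genotypes and INFO fields, which this sketch ignores. The Site type, the function names, the chrom:pos:ref:alt ID scheme, and the example site are all hypothetical.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    """Site-level VCF fields only; genotypes and INFO are deliberately omitted."""
    chrom: str
    pos: int
    ref: str
    alts: List[str]


def split_multiallelic(site: Site) -> List[Site]:
    """Return one biallelic record per ALT allele of the input site."""
    return [Site(site.chrom, site.pos, site.ref, [alt]) for alt in site.alts]


def make_id(site: Site) -> str:
    """Build a chrom:pos:ref:alt ID (assumed scheme) for a site."""
    return f"{site.chrom}:{site.pos}:{site.ref}:{','.join(site.alts)}"


# Hypothetical multi-allelic site, not taken from the thread's data.
multi = Site("chr1", 1000, "A", ["C", "T"])
records = split_multiallelic(multi)
ids = [make_id(record) for record in records]
print(ids)
```

Because each ID carries both alleles, records that share a position still end up with distinct IDs, which is what makes the custom-ID step useful after splitting.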

Kevin

1

Thank you very much Kevin!

3.2 years ago by yy237 ▴ 30