Duplicate SNP IDs in TOPMed Imputation output for GWAS data
3.2 years ago
yy237 ▴ 30

Hi,

I used the TOPMed Imputation Server to impute SNPs for my genotype data. After converting the output VCF files to plink binary files, I ran --list-duplicate-vars in plink to check for duplicate SNP IDs in my .bim file. The pairs it reported turned out to be the same SNP (same position) with the reference and alternate alleles in swapped order (see below for the first few rows of my plink.dupvar file).

In this case, should I still exclude all the duplicate IDs from the dataset going forward? Would that mean losing the information for both SNPs in each duplicate pair?

(So far I have only checked the .bim file for chromosome 1, where 1472 pairs of duplicate IDs were identified.)

CHR  POS      ALLELES   IDS
1    869598   T,TA      chr1:869598:T:TA chr1:869598:TA:T
1    1177060  C,CTG     chr1:1177060:C:CTG chr1:1177060:CTG:C
1    1293960  T,TCGGGG  chr1:1293960:T:TCGGGG chr1:1293960:TCGGGG:T
1    1693590  A,AC      chr1:1693590:A:AC chr1:1693590:AC:A
1    2299253  A,AG      chr1:2299253:A:AG chr1:2299253:AG:A
1    2423169  T,TTTTG   chr1:2423169:T:TTTTG chr1:2423169:TTTTG:T
1    2808716  G,GC      chr1:2808716:G:GC chr1:2808716:GC:G
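One way to confirm that every reported pair really is a REF/ALT swap at the same position, rather than a true ID collision, is to parse the plink.dupvar rows directly. The following is a minimal sketch, assuming the IDs follow the chrom:pos:ref:alt pattern seen in the rows above and that each row holds whitespace-separated CHR, POS, ALLELES and then the IDs. The function names parse_dupvar_line and is_ref_alt_swap are hypothetical, chosen only for illustration.

```python
def parse_dupvar_line(line):
    """Split one plink.dupvar row into (chrom, pos, alleles, ids)."""
    fields = line.split()
    # The first three columns are CHR, POS, ALLELES; the rest are variant IDs.
    return fields[0], int(fields[1]), fields[2], fields[3:]


def is_ref_alt_swap(id_a, id_b):
    """Return True if two chrom:pos:ref:alt IDs differ only by swapped alleles."""
    a = id_a.split(":")
    b = id_b.split(":")
    if len(a) != 4 or len(b) != 4:
        return False  # ID does not match the assumed chrom:pos:ref:alt layout
    same_site = a[0] == b[0] and a[1] == b[1]
    swapped = a[2] == b[3] and a[3] == b[2]
    return same_site and swapped


# A few rows copied from the plink.dupvar excerpt above (header omitted).
sample_rows = [
    "1 869598 T,TA chr1:869598:T:TA chr1:869598:TA:T",
    "1 1177060 C,CTG chr1:1177060:C:CTG chr1:1177060:CTG:C",
    "1 2808716 G,GC chr1:2808716:G:GC chr1:2808716:GC:G",
]

swap_flags = []
for row in sample_rows:
    chrom, pos, alleles, ids = parse_dupvar_line(row)
    # Only pairs are checked here; rows with more than two IDs count as non-swaps.
    swap_flags.append(len(ids) == 2 and is_ref_alt_swap(ids[0], ids[1]))

n_swapped = sum(swap_flags)
print(f"{n_swapped} of {len(sample_rows)} pairs are REF/ALT swaps")
```

To run this over a whole file, the same loop can read lines from plink.dupvar after skipping the header row.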

Thank you!

3.2 years ago

These are multi-allelic sites. Have you tried splitting them into separate records with bcftools norm -m-any?

Separately, or in addition, you may wish to set a custom ID for each variant; please see my bcftools annotate command here: A: Merging vcf files (intersection and union)
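The two steps suggested above could be scripted roughly as follows. This is only a sketch: the file names are hypothetical placeholders, and the ID format %CHROM:%POS:%REF:%ALT is an assumption chosen to match the IDs in the question, not necessarily the format used in the linked answer. The helper functions are likewise made up for illustration; they only assemble the bcftools command lines, so nothing is executed until the commands are passed to subprocess.run.

```python
import shlex


def build_split_command(in_vcf, out_vcf):
    """Command line that splits multi-allelic sites into separate records."""
    # -m-any splits both SNP and indel multi-allelic records;
    # -Oz writes bgzip-compressed VCF to the path given by -o.
    return ["bcftools", "norm", "-m-any", "-Oz", "-o", out_vcf, in_vcf]


def build_set_id_command(in_vcf, out_vcf, id_format="%CHROM:%POS:%REF:%ALT"):
    """Command line that overwrites the ID column with a custom format."""
    # The default id_format is an assumption for illustration only.
    return ["bcftools", "annotate", "--set-id", id_format,
            "-Oz", "-o", out_vcf, in_vcf]


# Hypothetical file names; replace them with the actual imputation output.
split_cmd = build_split_command("chr1.dose.vcf.gz", "chr1.split.vcf.gz")
set_id_cmd = build_set_id_command("chr1.split.vcf.gz", "chr1.split.ids.vcf.gz")

# Print the commands so they can be inspected before anything is run.
print(shlex.join(split_cmd))
print(shlex.join(set_id_cmd))

# To actually run them (requires bcftools on PATH):
#   import subprocess
#   subprocess.run(split_cmd, check=True)
#   subprocess.run(set_id_cmd, check=True)
```

Building the argument lists separately from running them makes it easy to review the exact commands first and to loop over all chromosomes later.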

Kevin


Thank you very much Kevin!
