Assigning dbSNP rsIDs to Plink .bim files using chr:pos data
28 days ago
Hamish ▴ 40


I have recently converted a VCF file containing 40 samples into Plink format using the Plink --make-bed flag. The file (name: input_data.bim) I'm left with is in the following format:

10      .       0       45265   A       C
10      .       0       45402   T       C
10      .       0       45781   C       CA
10      .       0       46126   G       A
10      .       0       46915   T       C
10      .       0       47001   CAGAACACAGTAA   C

My aim is to replace the . in the second column with a dbSNP rsID by cross-referencing the chromosome and position in columns 1 and 4. I found this previous post a good starting point and am trying to follow the same logic, but I must be missing something.
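To make the intended cross-reference concrete, here is a minimal Python sketch of the lookup I have in mind. It is not a tested pipeline. The function names (`load_rsid_lookup`, `annotate_bim`), the `offset` parameter and the demo rsIDs are all made up for illustration. It assumes the lookup file has three whitespace-separated columns (chrom, position, rsID) and that its positions are on the same 1-based coordinate as column 4 of the .bim; UCSC chromStart values are 0-based, so `offset=1` may be needed for unconverted SNV rows. It matches on position only and ignores alleles, so an indel and a SNP at the same site would get the same ID.

```python
def strip_chr(chrom):
    """Normalise 'chr10' and '10' to the same key."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def load_rsid_lookup(lines, offset=0):
    """Build {(chrom, pos): rsID} from 'chrom position rsID' rows.

    offset is a hypothetical knob: use 1 if positions are 0-based starts.
    """
    lookup = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header row
        chrom, start, name = line.split()[:3]
        # Assumption: keep only the first rsID seen at each position.
        lookup.setdefault((strip_chr(chrom), int(start) + offset), name)
    return lookup


def annotate_bim(bim_lines, lookup):
    """Yield .bim rows with '.' IDs replaced by an rsID where one is found."""
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            yield line.rstrip("\n")  # pass unexpected rows through untouched
            continue
        chrom, var_id, cm, pos, a1, a2 = fields
        if var_id == ".":
            # Fallback ID (assumption): chr:pos:a1:a2, so unmatched rows
            # still get an ID that is unlikely to repeat.
            fallback = f"{chrom}:{pos}:{a1}:{a2}"
            var_id = lookup.get((strip_chr(chrom), int(pos)), fallback)
        yield "\t".join([chrom, var_id, cm, pos, a1, a2])


if __name__ == "__main__":
    # In-memory demo; rsDEMO1 is a placeholder, not a real dbSNP ID.
    demo_lookup = load_rsid_lookup(["#chrom chromStart name", "10 45265 rsDEMO1"])
    demo_bim = ["10\t.\t0\t45265\tA\tC", "10\t.\t0\t45781\tC\tCA"]
    for row in annotate_bim(demo_bim, demo_lookup):
        print(row)
```

On real data the same functions could be fed open file handles for hg38_dbsnp153_final and input_data.bim, writing the result to a new .bim alongside copies of the unchanged .bed and .fam.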

I have my .bim, .bed and .fam Plink files, plus the dbsnp153.txt file downloaded from the UCSC Genome Browser. The download included all fields by default, but I've trimmed it to the format below (filename: hg38_dbsnp153_final):

#chrom chromStart name
1 10177 rs367896724
1 10352 rs555500075
1 11007 rs575272151
1 11011 rs544419019
1 13109 rs540538026
1 13115 rs62635286

I then run the following:

sudo plink1.9 --bfile input_data --update-name hg38_dbsnp153_final --make-bed --out mydata

This results in the following duplicate ID error:

PLINK v1.90b6.16 64-bit (17 Feb 2020)
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to mydata.log.
Options in effect:
  --bfile input_data
  --out mydata
  --update-name hg38_dbsnp153_final

128894 MB RAM detected; reserving 64447 MB for main workspace.
35624 variants loaded from .bim file.
58 people (0 males, 0 females, 58 ambiguous) loaded from .fam.
Ambiguous sex IDs written to mydata.nosex .
Error: Duplicate ID '.'.
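
The error looks consistent with how --update-name works: it matches on the existing variant ID (old ID to new ID), not on chromosome and position, and every ID in column 2 of my .bim is currently '.'. A small Python sketch to confirm which IDs repeat (the function name `duplicate_ids` is made up for illustration, and the sample rows are copied from the .bim excerpt above):

```python
from collections import Counter


def duplicate_ids(bim_lines):
    """Return {variant_id: count} for IDs that occur more than once in column 2."""
    counts = Counter(line.split()[1] for line in bim_lines if line.strip())
    return {vid: n for vid, n in counts.items() if n > 1}


sample = [
    "10\t.\t0\t45265\tA\tC",
    "10\t.\t0\t45402\tT\tC",
    "10\t.\t0\t45781\tC\tCA",
]
print(duplicate_ids(sample))  # {'.': 3}
```

If that is the cause, the IDs would need to be made unique (for example as chr:pos) before any ID-based renaming can work.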

Can anyone suggest a way in which I can resolve this and assign dbsnp rsids to the currently blank second column of my .bim file?


