SNP coordinates to rsID in .bim file
14 months ago
salman_96 ▴ 50

Hi, I have a .bim file (Plink) in this format:

1   1:727841:G:A    0   727841  A   G
1   1:730087:T:C    0   730087  C   T


I want to convert the coordinates to rsIDs, like this:

1   rs1048977   0   20945055    T   C
1   rs12128671  0   20945452    G   A


Is there a way to do that in Plink, or by any other means? (The data above is just an example.)

14 months ago

You need to obtain a file with both rsIDs and their coordinates, and then postprocess it to work with your data and e.g. plink's --update-name flag. As of this writing, raw rsID VCFs for reference builds 37 and 38 can be downloaded from https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/ .

When querying these tabix-indexed VCFs, note that you'll need to translate your chromosome numbering scheme to accession identifiers from the NCBI Reference Sequence Database (RefSeq), e.g. on build 38 chromosome 1 becomes NC_000001.11 (on build 37 it is NC_000001.10):

% tabix https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/GCF_000001405.38.gz NC_000001.11:727841-727841
NC_000001.11    727841  rs1339480271    G   A   .   .   RS=1339480271;dbSNPBuildID=151;SSR=0;GENEINFO=LOC100133331:100133331;VC=SNV
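
As a rough sketch of that chromosome translation, the Python below builds the tabix region string for a .bim line. The function names (refseq_accession, tabix_region) are made up for illustration, and the version table assumes the build 38 primary assembly; it is worth checking against the assembly report for whichever build your data uses.

```python
# RefSeq accession version suffixes for the build 38 (GRCh38) primary assembly.
# Assumption: your coordinates are on build 38; build 37 uses other suffixes
# (e.g. NC_000001.10 for chromosome 1), so this table would need replacing.
GRCH38_VERSIONS = {
    1: 11, 2: 12, 3: 12, 4: 12, 5: 10, 6: 12, 7: 14, 8: 11,
    9: 12, 10: 11, 11: 10, 12: 12, 13: 11, 14: 9, 15: 10, 16: 10,
    17: 11, 18: 10, 19: 10, 20: 11, 21: 9, 22: 11,
    23: 11,  # X
    24: 10,  # Y
}
LETTER_CODES = {"X": 23, "Y": 24}


def refseq_accession(chrom):
    """Translate a label such as '1', 'chr1', 'X' or '23' to a RefSeq accession."""
    label = str(chrom).upper()
    if label.startswith("CHR"):
        label = label[3:]
    if label in ("MT", "M", "26"):  # 26 is plink's numeric code for MT
        return "NC_012920.1"
    number = LETTER_CODES.get(label) or int(label)
    return f"NC_{number:06d}.{GRCH38_VERSIONS[number]}"


def tabix_region(bim_line):
    """Build a single-position tabix region from one .bim line."""
    fields = bim_line.split()
    chrom, pos = fields[0], fields[3]
    return f"{refseq_accession(chrom)}:{pos}-{pos}"


print(tabix_region("1   1:727841:G:A    0   727841  A   G"))
```

The printed string can be passed to tabix as the region argument, as in the command above.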


You could download the gzip file and its associated tbi index file to run tabix locally, and then use awk or other scripted approaches to read through your list of SNPs, writing out the rsID.
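
If you prefer Python to awk, a minimal sketch of that matching step might look like the following. The helper names are hypothetical. It assumes the VCF lines have already been narrowed down to your positions (e.g. collected tabix output, since the full dbSNP file is far too large to index in memory) and that the .bim uses chromosome labels like 1..22, X, Y, MT.

```python
def accession_to_chrom(accession):
    """Turn e.g. 'NC_000001.11' into '1' (23 -> 'X', 24 -> 'Y', NC_012920 -> 'MT')."""
    number = int(accession.split(".")[0].split("_")[1])
    return {23: "X", 24: "Y", 12920: "MT"}.get(number, str(number))


def index_vcf(vcf_lines):
    """Index VCF records by (chrom, pos, ref, alt), one entry per ALT allele."""
    index = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        acc, pos, rsid, ref, alts = line.split()[:5]
        for alt in alts.split(","):
            index[(accession_to_chrom(acc), pos, ref, alt)] = rsid
    return index


def update_name_rows(bim_lines, index):
    """Yield (old_id, rsid) pairs for .bim variants that have a match."""
    for line in bim_lines:
        if not line.strip():
            continue
        chrom, old_id, _cm, pos, a1, a2 = line.split()[:6]
        # The .bim file does not say which allele is the reference,
        # so try both orientations.
        rsid = index.get((chrom, pos, a2, a1)) or index.get((chrom, pos, a1, a2))
        if rsid:
            yield old_id, rsid


vcf = ["NC_000001.11\t727841\trs1339480271\tG\tA\t.\t.\tRS=1339480271;VC=SNV"]
bim = [
    "1   1:727841:G:A    0   727841  A   G",
    "1   1:730087:T:C    0   730087  C   T",
]
rows = list(update_name_rows(bim, index_vcf(vcf)))
for old_id, rsid in rows:
    print(f"{old_id}\t{rsid}")
```

Written to a file, those two tab-separated columns (old ID, then new ID) are the layout plink's --update-name expects by default. Variants with no dbSNP match are simply left out, so they keep their original IDs.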