Renaming SNPs or SNP matching
8.3 years ago
Ryan D ★ 3.4k

This should be easy to do by now, but... we have SNP data from an Illumina exome array given to us in PLINK format. The BIM file looks like this:

1       exm2253575      0       881627  G       A
1       exm269  0       881918  A       G
1       exm340  0       888659  T       C
1       exm348  0       889238  A       G
1       exm2264981      0       894573  G       A
1       exm773  0       909238  G       C
1       exm782  0       909309  C       T
1       exm912  0       949608  A       G
1       exm991  0       977028  T       G
1       exm1024 0       978762  A       G


And I have all of the SNPs in dbSNP 138 downloaded as a large VCF file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       10019   rs376643643     TA      T       .       .       RS=376643643;RSPOS=10020;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000200;WGT=1;VC=DIV;R5;OTHERKG
1       10054   rs373328635     CAA     C,CA    .       .       RS=373328635;RSPOS=10055;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000210;WGT=1;VC=DIV;R5;OTHERKG;NOC
1       10109   rs376007522     A       T       .       .       RS=376007522;RSPOS=10109;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000100;WGT=1;VC=SNV;R5;OTHERKG
1       10139   rs368469931     A       T       .       .       RS=368469931;RSPOS=10139;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000100;WGT=1;VC=SNV;R5;OTHERKG
1       10144   rs144773400     TA      T       .       .       RS=144773400;RSPOS=10145;dbSNPBuildID=134;SSR=0;SAO=0;VP=0x050000020001000002000200;WGT=1;VC=DIV;R5;OTHERKG
1       10146   rs375931351     AC      A       .       .       RS=375931351;RSPOS=10147;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000200;WGT=1;VC=DIV;R5;OTHERKG


I want to match them up so that each SNP in the BIM is identified from the VCF file, mostly so I can rename them with proper dbSNP names. I have been trying to match them by formatting both as BED files and using BEDTOOLS, restricting the VCF to variants that are SNVs. The problem is that some SNPs share the same chr/start position, so position alone does not identify them uniquely. Is there an easy way to rename or identify the SNPs that also takes allele information into account, with VCFTOOLS, BEDTOOLS, PLINK, or another common tool? I get matches for about 99% using BEDTOOLS and command-line options, but there must be an easier or more standard way to get this right.

Thanks,
Ryan

bedtools vcf exome-chip bim SNP
Did you figure this out?

6.3 years ago
vakul.mohanty ▴ 260

It would be easier to use an annotator like ANNOVAR.
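
If you would rather script it, here is a minimal sketch in Python of allele-aware matching on chr/pos plus alleles. It is not a standard tool, just an illustration, and it makes several assumptions: both files are on the same genome build, the BIM has the usual six whitespace-separated columns, chromosome names agree apart from an optional "chr" prefix, and array alleles may be reported on either strand (so the complemented pair is also accepted). The function names (`match_rsids` and so on) are made up for this example.

```python
# Sketch: map BIM variant IDs to dbSNP rsIDs by chromosome, position and alleles.
# Assumptions: same genome build in both files; 6-column BIM; single-base alleles.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def norm_chrom(chrom):
    """Drop an optional 'chr' prefix so '1' and 'chr1' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def allele_keys(a1, a2):
    """Return the allele pair and its strand-flipped pair as order-free sets."""
    pair = frozenset((a1.upper(), a2.upper()))
    flipped = frozenset(COMPLEMENT.get(a, a) for a in pair)
    return pair, flipped


def load_bim(bim_lines):
    """Index BIM records by (chrom, pos); several IDs may share one position."""
    wanted = {}
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        chrom, snp_id, _cm, pos, a1, a2 = fields[:6]
        wanted.setdefault((norm_chrom(chrom), int(pos)), []).append((snp_id, a1, a2))
    return wanted


def match_rsids(bim_lines, vcf_lines):
    """Return {bim_id: [rsid, ...]} for VCF SNVs whose alleles fit the BIM record."""
    wanted = load_bim(bim_lines)
    hits = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, rsid, ref, alt = line.split(None, 5)[:5]
        key = (norm_chrom(chrom), int(pos))
        if key not in wanted:
            continue
        # Keep only single-base REF/ALT combinations (skips indels such as TA>T).
        vcf_pairs = {
            frozenset((ref.upper(), a.upper()))
            for a in alt.split(",")
            if len(ref) == 1 and len(a) == 1
        }
        for snp_id, a1, a2 in wanted[key]:
            pair, flipped = allele_keys(a1, a2)
            if pair in vcf_pairs or flipped in vcf_pairs:
                hits.setdefault(snp_id, []).append(rsid)
    return hits


# Example usage (file names are placeholders):
#   import gzip
#   with open("exome.bim") as bim, gzip.open("dbsnp138.vcf.gz", "rt") as vcf:
#       hits = match_rsids(bim, vcf)
#   with open("rename.txt", "w") as out:
#       for old_id, rsids in hits.items():
#           if len(rsids) == 1:          # write only unambiguous matches
#               out.write(f"{old_id}\t{rsids[0]}\n")
```

The resulting two-column file (old ID, new ID) is the layout that PLINK's `--update-name` expects by default, so it should be usable for the actual renaming. IDs that end up with more than one rsID, or with none (for example monomorphic markers coded with a `0` allele), are left out above and are worth checking by hand. Note that A/T and C/G SNPs look identical on both strands, so this approach cannot detect a strand problem for those.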