Hi everyone,
I recently downloaded the latest dbSNP VCF, and when I opened the file I noticed that the #CHROM column contains RefSeq IDs instead of chr1, chr2 and so on. Here is what the VCF looks like:
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
NC_000001.11    10019   rs775809821     TA      T       .       .       RS=775809821;dbSNPBuildID=144;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=INDEL
NC_000001.11    10039   rs978760828     A       C       .       .       RS=978760828;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
NC_000001.11    10043   rs1008829651    T       A       .       .       RS=1008829651;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
I would like to convert each RefSeq ID to its corresponding chromosome name, like this:
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10019   rs775809821     TA      T       .       .       RS=775809821;dbSNPBuildID=144;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=INDEL
chr1   10039   rs978760828     A       C       .       .       RS=978760828;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
chr1    10043   rs1008829651    T       A       .       .       RS=1008829651;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
Is there a tool or script that can convert the RefSeq IDs? Thank you in advance.
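For reference, here is a minimal Python sketch of the conversion. It assumes a human assembly, where the chromosome accessions NC_000001 to NC_000024 correspond to chr1 to chr22, chrX and chrY, and NC_012920 is the mitochondrial genome. It accepts any version suffix (e.g. `.11`), which is worth checking against your assembly. Any other sequence name (for example NT_ or NW_ scaffolds) is left unchanged. It also rewrites the `##contig` header lines so the header stays consistent with the records. The script name `rename_chroms.py` and the function names are hypothetical.

```python
import gzip
import re
import sys
from typing import Iterable, Iterator

# Assumption: human RefSeq chromosome accessions NC_000001..NC_000024 map to
# chr1..chr22, chrX, chrY, and NC_012920 is the mitochondrion (chrM).
_NC_CHROM = re.compile(r"^NC_0000(\d{2})\.\d+$")
_CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def refseq_to_ucsc(name: str) -> str:
    """Return a chrN-style name; non-matching names are returned unchanged."""
    if name.startswith("NC_012920."):
        return "chrM"
    m = _NC_CHROM.match(name)
    if not m:
        return name  # e.g. NT_/NW_ scaffolds: left as-is
    n = int(m.group(1))
    if 1 <= n <= 22:
        return f"chr{n}"
    if n == 23:
        return "chrX"
    if n == 24:
        return "chrY"
    return name


def rename_vcf_lines(lines: Iterable[str]) -> Iterator[str]:
    """Rename the CHROM field of records and the ID of ##contig header lines."""
    for line in lines:
        if line.startswith("##contig=<ID="):
            line = _CONTIG_ID.sub(
                lambda m: m.group(1) + refseq_to_ucsc(m.group(2)), line, count=1
            )
        elif not line.startswith("#"):
            # Data record: CHROM is everything before the first tab.
            chrom, sep, rest = line.partition("\t")
            line = refseq_to_ucsc(chrom) + sep + rest
        yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python rename_chroms.py dbsnp.vcf.gz > renamed.vcf
    with gzip.open(sys.argv[1], "rt") as fh:
        sys.stdout.writelines(rename_vcf_lines(fh))
```

The output is uncompressed, so you would `bgzip` and `tabix` it again if you need an indexed file. Streaming the whole dbSNP VCF through Python is slow. For a file this size, `bcftools annotate --rename-chrs` with a two-column old/new mapping file is usually much faster.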
Hi rrbutleriii, brilliant answer. 4.4 years later, it is still an issue (and the documentation is still unhelpful). Here is an update for the most recent version, to get chromosome names as chrN rather than N:
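To build a mapping that also covers scaffolds and patches, one option is NCBI's assembly report (`*_assembly_report.txt`), published with each RefSeq assembly. For each sequence it lists the RefSeq accession together with its NCBI name (N) and its UCSC-style name (chrN). The Python sketch below assumes that file's tab-separated layout, with Sequence-Name in column 1, RefSeq-Accn in column 7 and UCSC-style-name in column 10. Check the `#` header line of your copy before relying on those indices. Rows with `na` in either column are skipped. The script and function names are hypothetical. The output is one `old<TAB>new` pair per line, which is the kind of file `bcftools annotate --rename-chrs` reads.

```python
import sys
from typing import Dict, Iterable, Iterator

# Assumed 0-based column indices in an NCBI *_assembly_report.txt
# (tab-separated, comment/header lines start with '#').
SEQNAME_COL = 0  # Sequence-Name, N-style (e.g. "1", "X", "MT")
REFSEQ_COL = 6   # RefSeq-Accn (e.g. "NC_000001.11")
UCSC_COL = 9     # UCSC-style-name, chrN-style (e.g. "chr1")


def mapping_from_assembly_report(
    lines: Iterable[str], name_col: int = UCSC_COL
) -> Dict[str, str]:
    """Map RefSeq accessions to the names in `name_col` (chrN by default)."""
    mapping: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, the column header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= max(REFSEQ_COL, name_col):
            continue  # malformed or truncated row
        refseq, new_name = fields[REFSEQ_COL], fields[name_col]
        if refseq == "na" or new_name == "na":
            continue  # no RefSeq accession, or no name of the requested style
        mapping[refseq] = new_name
    return mapping


def rename_file_lines(mapping: Dict[str, str]) -> Iterator[str]:
    """Yield 'old<TAB>new' lines, one per sequence to rename."""
    for old, new in mapping.items():
        yield f"{old}\t{new}\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python make_rename_map.py <assembly_report.txt> <rename.txt>
    with open(sys.argv[1]) as report, open(sys.argv[2], "w") as out:
        out.writelines(rename_file_lines(mapping_from_assembly_report(report)))
```

Passing `name_col=SEQNAME_COL` instead gives N-style names (including `MT` for the mitochondrion). The chrN default matches the request above.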
Huge thanks to you! This is essential for anyone who wants to use the dbSNP VCF, yet the NCBI documentation doesn't mention it at all!
Hi @rrbutleriii, how do you manage to use multiple threads (in bedtools) to process that single task?