I have a large list of rsIDs from dbSNP. I would like to get the REF/ALT alleles for each of the SNPs. What would be the best way of handling this? The input is a VCF containing the position and dbSNP ID, but no REF/ALT columns.
How large is large? Give us a number.
Also, I assume single species?
It's roughly 600k IDs. And yes, human.
My first thought would be to download the latest version of dbSNP in VCF format and then use Python, grep, or BCFtools to pull out the information that you need.
dbSNP in VCF can be downloaded from here: https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/
I reckon that it would take a good few hours to pull out all info.
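For the Python route, a minimal sketch could look like the one below. It assumes both files are tab-separated with the standard VCF column order (CHROM, POS, ID, REF, ALT, ...); the function names and the file names in the usage note are made up for illustration. It loads the rsIDs of interest into a set, then streams through dbSNP once, so memory use is driven by the 600k IDs rather than by the size of dbSNP.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def load_rsids(vcf_lines):
    """Collect the ID column (3rd field) from VCF lines, skipping headers."""
    ids = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] != ".":
            ids.add(fields[2])
    return ids


def lookup_ref_alt(dbsnp_lines, wanted):
    """Stream dbSNP VCF lines and return {rsID: (chrom, pos, ref, alt)}
    for every record whose ID is in `wanted`."""
    found = {}
    for line in dbsnp_lines:
        if line.startswith("#"):
            continue
        # Only the first five columns are needed; leave the rest unsplit.
        fields = line.split("\t", 5)
        if len(fields) < 5:
            continue
        chrom, pos, rsid, ref, alt = fields[:5]
        if rsid in wanted:
            found[rsid] = (chrom, pos, ref, alt.rstrip("\n"))
    return found
```

To use it, you would do something like `with open_text("input.vcf.gz") as f: wanted = load_rsids(f)` and then `with open_text("dbsnp.vcf.gz") as f: hits = lookup_ref_alt(f, wanted)`. Check that the genome build of the dbSNP file matches your positions, and be aware that chromosome names in the dbSNP VCF may not match the ones in your file.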
Something like the following should work (the annotation file passed to -a needs to be bgzip-compressed and indexed):
bcftools annotate -a dbsnp.vcf.gz -c ID -o output.vcf input.vcf.compressed.indexed.vcf.gz