I have chromosome numbers and SNP positions (about a million).
How can I convert this information to SNP IDs?
Grab the SNPs and convert them to a sorted BED file. Then convert your positions to BED as well, and use a BEDOPS bedmap operation to map SNP IDs to the positions they overlap.
For example, here is a way to download dbSNP v150 for hg19 and convert it to BED with BEDOPS vcf2bed:
$ wget -qO- ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/VCF/All_20170710.vcf.gz | gunzip -c - | vcf2bed --sort-tmpdir=${PWD} --max-mem=2G - > hg19.dbSNP150.bed
You'd modify this for your reference genome, if you're not working with hg19.
Then convert your positions to a sorted BED file, using awk and BEDOPS sort-bed:
$ awk -vOFS="\t" '{ print "chr"$1, ($2 - 1), $2; }' positions.txt | sort-bed - > positions.bed
This assumes that the chromosome number is strictly numerical (i.e., Ensembl format, and not UCSC format), so we add a chr prefix to it. The goal is for the chromosome names in positions.bed to match the chromosome names in hg19.dbSNP150.bed, so look at the first column of hg19.dbSNP150.bed and keep or drop the prefix to match. Modify this depending on the format of chromosome names in your original positions.txt file.
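If positions.txt mixes both naming styles, one option is to strip any existing chr prefix before adding it back. The following is a minimal sketch, assuming tab- or space-separated input with the chromosome in column 1 and a 1-based position in column 2; the function name to_bed and the sample rows are only illustrative. In the real pipeline you would pipe the result into sort-bed as above.

```shell
# to_bed: read "chrom pos" (1-based) on stdin, write 0-based BED on stdout.
# Any existing "chr" prefix is stripped first, then added back, so that
# "1" and "chr1" both come out as "chr1".
to_bed() {
  awk -v OFS='\t' '{ sub(/^chr/, "", $1); print "chr"$1, $2 - 1, $2 }'
}

# Illustrative rows only: two Ensembl-style names and one UCSC-style name.
printf '1\t17571\nchr1\t17594\nX\t100\n' | to_bed
```

If your dbSNP BED file turns out to use bare chromosome names instead, print $1 without the "chr" string in the awk statement.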
Finally, you can map positions to SNP IDs:
$ bedmap --echo --echo-map-id --delim '\t' positions.bed hg19.dbSNP150.bed > answer.bed
The file answer.bed will have the positions in the first three columns, and the SNP rs-ID in the fourth and last column.
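When several dbSNP records overlap one position, bedmap joins their IDs in the fourth column with a semicolon by default. A minimal sketch for expanding answer.bed into one line per position and ID, assuming the tab-delimited layout produced above; the function name split_ids is hypothetical, and rsA and rsB are placeholders, not real IDs:

```shell
# split_ids: expand "chrom start end id1;id2" into one line per (position, id),
# printing chrom, 1-based position, and ID.
# Positions with no overlapping SNP (empty fourth column) are reported as ".".
split_ids() {
  awk -F'\t' -v OFS='\t' '{
    n = split($4, ids, ";")
    if (n == 0) { print $1, $3, "."; next }
    for (i = 1; i <= n; i++) print $1, $3, ids[i]
  }'
}

# Placeholder input: one position with two IDs, one with none.
printf 'chr1\t9\t10\trsA;rsB\nchr1\t19\t20\t\n' | split_ids
```

On your real output this would be run as: split_ids < answer.bed > answer.long.txt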
Dear Alex Reynolds,
Thanks for such an efficient method! But I still have some doubts. This method matches only on the chr:start:end information, so multiple rsIDs can be assigned to the same variant. A more accurate match would also use the ref:alt information. Is there a way to take the ref:alt information into account as well?
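One possible approach, offered as a sketch and not as part of the bedmap method above, is to match on chrom:pos:ref:alt directly against the uncompressed dbSNP VCF with awk. It assumes your variants are in a tab-separated file with columns chrom, pos (1-based), ref, alt, that chromosome names and allele representation (especially for indels) match the VCF, and that all your variants fit in memory. The function name match_alleles and the file names are hypothetical.

```shell
# match_alleles VARIANTS VCF
# VARIANTS: tab-separated chrom, pos (1-based), ref, alt (assumed layout).
# VCF: uncompressed dbSNP-style VCF. A multi-allelic ALT such as "A,T" is
# split so that each alternate allele is compared separately.
match_alleles() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { want[$1 ":" $2 ":" $3 ":" $4] = 1; next }  # first file: your variants
    /^#/ { next }                                           # skip VCF header lines
    {
      n = split($5, alts, ",")
      for (i = 1; i <= n; i++) {
        key = $1 ":" $2 ":" $4 ":" alts[i]
        if (key in want) print $1, $2, $4, alts[i], $3     # chrom pos ref alt rsID
      }
    }' "$1" "$2"
}

# Illustrative data: the second query allele (C>G) is made up to show a non-match.
printf '1\t17571\tC\tT\n1\t17594\tC\tG\n' > my_variants.tsv
printf '##fileformat=VCFv4.0\n1\t17571\trs557947346\tC\tT\t.\t.\t.\n1\t17594\trs377698370\tC\tT\t.\t.\t.\n' > dbsnp_example.vcf
match_alleles my_variants.tsv dbsnp_example.vcf
```

Only variants whose position and both alleles agree with a dbSNP record are printed, so a position shared by several rsIDs resolves to the one with the matching alleles.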
In addition to describing the data, you may want to post example data to get better suggestions.
$ bedtools intersect -a test.txt -b dbsnp_mini.vcf -wa -wb
example records:
$ cat test.txt
chrom from to
1 17571 17571
1 17594 17594
output:
1 17571 17571 1 17571 rs557947346 C T . . RS=557947346;RSPOS=17571;dbSNPBuildID=142;SSR=0;SAO=0;VP=0x0500000a0005000000000100;WGT=1;VC=SNV;INT;R5;ASP
1 17594 17594 1 17594 rs377698370 C T . . RS=377698370;RSPOS=17594;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x0500000a0005000002000100;WGT=1;VC=SNV;INT;R5;ASP;OTHERKG
1 17614 17614 1 17614 rs201057270 G A . . RS=201057270;RSPOS=17614;dbSNPBuildID=137;SSR=0;SAO=0;VP=0x050000020005000002000100;WGT=1;VC=SNV;R5;ASP;OTHERKG
Sorry @Emily_Ensembl, I was trying to annotate somatic copy number variation in VCF format with VEP, but I got this error.
Could you please help me with that?
http://grch37.ensembl.org/Multi/Tools/VEP/Ticket?tl=DqLvWXeQg18fDsnn
Thank you
Error:
-------------------- EXCEPTION --------------------
MSG:
ERROR: Forked process(es) died: read-through of cross-process communication detected
STACK Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:554
STACK Bio::EnsEMBL::VEP::Runner::next_output_line /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:361
STACK Bio::EnsEMBL::VEP::Runner::run /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:202
STACK EnsEMBL::Web::RunnableDB::VEP::run /nfs/public/release/ensweb/latest/live/grch37/www_95/public-plugins/tools_hive/modules/EnsEMBL/Web/RunnableDB/VEP.pm:87
STACK (eval) /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//modules/Bio/EnsEMBL/Hive/Process.pm:140
STACK Bio::EnsEMBL::Hive::Process::life_cycle /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//modules/Bio/EnsEMBL/Hive/Process.pm:127
STACK (eval) /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//modules/Bio/EnsEMBL/Hive/Worker.pm:681
STACK Bio::EnsEMBL::Hive::Worker::run_one_batch /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//modules/Bio/EnsEMBL/Hive/Worker.pm:652
STACK Bio::EnsEMBL::Hive::Worker::run /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//modules/Bio/EnsEMBL/Hive/Worker.pm:500
STACK main::main /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//scripts/runWorker.pl:141
STACK toplevel /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//scripts/runWorker.pl:22
Date (localtime) = Thu Mar 28 15:54:32 2019
Ensembl API version = 95