Question: Query a dbSNP VCF file
6.0 years ago by
win810
India
win810 wrote:

I downloaded the dbSNP VCF file and I have a separate list of rs#. What I would like to do is get the chromosome and coordinate for each of my SNPs from the dbSNP VCF.

Ideally, I would like to create another VCF file containing only my SNPs, plus another file listing each of my SNPs with its chromosome number and coordinate.

Any help much appreciated.

vcf dbsnp • 3.6k views
modified 5.6 years ago by pd3340 • written 6.0 years ago by win810
5.6 years ago by
pd3340
pd3340 wrote:

Try `bcftools view -i'%ID=@path/to/ids.txt' file.vcf`

http://samtools.github.io/bcftools/bcftools.html

written 5.6 years ago by pd3340
6.0 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:
  • sort your rs list
  • extract the VCF header
  • sort the VCF lines on rs
  • join the files
  • restore the original column order after join

If your list of SNPs is not too large, you can insert an `fgrep -f rs.txt` before the sort to speed things up. But at that point there is no guarantee that the match is in the ID column rather than in the INFO column.

    # build a small example list of rs IDs
    echo "rs114420996" > rs.txt
    echo "rs187434873" >> rs.txt
    echo "rs183189405" >> rs.txt

    # join needs its inputs sorted on the join field
    sort rs.txt > sorted.rs.txt

    # print the VCF header
    curl -s 'https://raw.github.com/arq5x/gemini/master/test/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.snippet.vcf' |
    grep -E '^#'

    # VCF body: sort on ID (column 3), join with the rs list, restore the column order
    # ($'\t' is bash syntax for a literal tab character)
    curl -s 'https://raw.github.com/arq5x/gemini/master/test/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.snippet.vcf' |
    grep -v -E '^#' |
    sort -t $'\t' -k3,3 |
    join -t $'\t' -1 3 -2 1 - sorted.rs.txt |
    awk -F '\t' '{OFS="\t"; tmp=$1;$1=$2;$2=$3;$3=tmp; print $0;}'
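
If sorting the whole dbSNP file is too slow, the same idea (match only on the ID column, keep the header) can be sketched in a single awk pass. This is a minimal sketch, not a tested recipe for the full dbSNP file: the file names and the tiny VCF below are made up for illustration, and it assumes each record carries exactly one ID in column 3 and that the ID list is not empty.

```shell
# Hypothetical demo inputs: two wanted IDs and a tiny made-up VCF.
printf 'rs2\nrs3\n' > demo_ids.txt
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\trs1\tA\tG\t.\t.\tNOTE=rs2\n'   # rs2 appears only in INFO
  printf '1\t200\trs2\tC\tT\t.\t.\t.\n'
  printf '2\t300\trs3\tG\tA\t.\t.\t.\n'
  printf '2\t400\trs30\tT\tC\t.\t.\t.\n'         # rs3 is only a prefix here
} > demo.vcf

# First file (NR==FNR): remember every wanted ID.
# Second file: print header lines and records whose column 3 is an exact match.
awk -F '\t' 'NR==FNR { want[$1]; next } /^#/ || ($3 in want)' \
    demo_ids.txt demo.vcf > demo_subset.vcf

# Second output: one line per SNP with ID, chromosome and position.
awk -F '\t' -v OFS='\t' '!/^#/ { print $3, $1, $2 }' demo_subset.vcf > demo_positions.tsv
```

Because the comparison is made against column 3 only, an rs number that merely appears in INFO, or that is a prefix of a longer rs number, is not selected.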

modified 6.0 years ago • written 6.0 years ago by Pierre Lindenbaum124k
6.0 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

"What I would like to do is get the chromosome and coordinate for each of my SNPs from the dbSNP VCF."

$ vcf2bed < mySnps.vcf | grep -w <rsNumber> | cut -f1-3 > <rsNumber>.bed

Replace <rsNumber> with your SNP ID of interest. The result is a BED file, with the first column containing the chromosome and the second and third columns the reference base position of the SNP.

If you're going to do this for a lot of SNPs, save the converted results and grep on that:

$ vcf2bed < mySnps.vcf > mySnps.bed

Then:

$ grep -w <rsNumber1> mySnps.bed | cut -f1-3 > <rsNumber1>.bed
$ grep -w <rsNumber2> mySnps.bed | cut -f1-3 > <rsNumber2>.bed
...
$ grep -w <rsNumberN> mySnps.bed | cut -f1-3 > <rsNumberN>.bed

Or put into a shell script, iterating over a list of rs* IDs.
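
Such a loop might look like the sketch below. The list file name `rs_list.txt` and the demo BED lines are made up for illustration; in practice `mySnps.bed` would be the vcf2bed output from the previous step.

```shell
# Hypothetical demo inputs standing in for the real ID list and vcf2bed output.
printf 'rs2\nrs3\n' > rs_list.txt
{
  printf 'chr1\t99\t100\trs1\t.\tA\tG\n'
  printf 'chr1\t199\t200\trs2\t.\tC\tT\n'
  printf 'chr2\t299\t300\trs3\t.\tG\tA\n'
  printf 'chr2\t399\t400\trs30\t.\tT\tC\n'
} > mySnps.bed

# One output file per rs ID, holding the first three BED columns.
while IFS= read -r rs; do
  [ -n "$rs" ] || continue                       # skip blank lines in the list
  grep -wF -- "$rs" mySnps.bed | cut -f1-3 > "$rs.bed"
done < rs_list.txt
```

An ID that is not found leaves an empty `.bed` file behind. `grep -w` keeps rs3 from matching rs30, but it still searches the whole line rather than the ID column alone.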

More information on vcf2bed is available in the BEDOPS documentation.

modified 6.0 years ago • written 6.0 years ago by Alex Reynolds29k
6.0 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

Well, this should be pretty simple to do. Have you even tried it? You can try:

1) grep -Ff rs#_file vcf_file

2) Search online posts like http://stackoverflow.com/questions/18345067/how-to-use-strings-from-one-text-file-to-search-another-and-create-a-new-text-f
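
One caveat with option 1, shown here on made-up data: `grep -F` matches substrings, so rs3 also pulls in rs30. Adding `-w` restricts matches to whole words. The file names below are hypothetical.

```shell
# Hypothetical demo inputs: one wanted ID and two VCF-like body lines.
printf 'rs3\n' > ids_demo.txt
printf '2\t300\trs3\tG\tA\n2\t400\trs30\tT\tC\n' > body_demo.txt

# Substring match: counts both lines.
grep -c -Ff ids_demo.txt body_demo.txt

# Whole-word match: counts only the rs3 line.
grep -c -wFf ids_demo.txt body_demo.txt
```

Either way, grep drops the VCF header lines (unless they happen to match) and can still match an rs number that appears in the INFO column.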

modified 6.0 years ago • written 6.0 years ago by Ashutosh Pandey11k