Query a dbSNP VCF file
10.4 years ago
win ▴ 970

I downloaded the dbSNP VCF file and I have a separate list of rs numbers. What I would like to do is get the chromosome and coordinate for each of my SNPs from the dbSNP VCF.

Ideally, I would like to create another VCF file containing only my SNPs, plus a second file listing each SNP with its chromosome number and coordinate.

Any help much appreciated.

dbsnp vcf • 5.9k views
10.0 years ago
pd3 ▴ 350

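Try `bcftools view -i'%ID=@path/to/ids.txt' file.vcf` (recent bcftools releases write the column name without the `%` prefix, i.e. `ID=@path/to/ids.txt`).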

http://samtools.github.io/bcftools/bcftools.html

10.4 years ago
  • sort your rs list
  • extract the VCF header
  • sort the VCF lines on rs
  • join the files
  • restore the original column order after join

If your list of SNPs is not too large, you can insert an `fgrep -f rs.txt` before the sort to speed things up. At that stage, though, there is no guarantee that a match is in the ID column rather than in the INFO column; the later join on the ID column is what enforces that.

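    # build a small example list of rs IDs
    echo "rs114420996" > rs.txt
    echo "rs187434873" >> rs.txt
    echo "rs183189405" >> rs.txt

    # sort the list so that join can use it
    sort rs.txt > sorted.rs.txt

    # 1. print the VCF header
    curl -s 'https://raw.github.com/arq5x/gemini/master/test/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.snippet.vcf' |
    grep -E '^#'

    # 2. sort the records on the ID column (3), join them against the list,
    #    then move the ID back to column 3 (join prints the join field first).
    #    The delimiter must be a real tab: $'\t' is bash syntax for one.
    curl -s 'https://raw.github.com/arq5x/gemini/master/test/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.snippet.vcf' |
    grep -v -E '^#' |
    sort -t $'\t' -k3,3 |
    join -t $'\t' -1 3 -2 1 - sorted.rs.txt |
    awk -F '\t' '{OFS="\t"; tmp=$1;$1=$2;$2=$3;$3=tmp; print $0;}'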
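As an alternative to the sort/join steps above, here is a minimal single-pass awk sketch that compares only the ID column, so an rs number mentioned in INFO cannot cause a false match. It writes both outputs asked for in the question: a subset VCF and an rs/chromosome/position table. The file names (`demo.vcf`, `subset.vcf`, `positions.tsv`) and the demo records are made up for illustration; swap in your own dbSNP VCF and rs list.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo data standing in for the dbSNP VCF and the rs list.
work=$(mktemp -d)
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '1\t10583\trs58108140\tG\tA\t.\tPASS\t.\n'
    # ID is NOT in the list, but INFO mentions a listed rs number
    printf '1\t10611\trs189107123\tC\tG\t.\tPASS\tNOTE=rs58108140\n'
    printf '2\t13302\trs180734498\tC\tT\t.\tPASS\t.\n'
} > "$work/demo.vcf"
printf 'rs58108140\nrs180734498\n' > "$work/rs.txt"

# First file: load the rs IDs into an array.
# Second file: keep header lines, and keep records whose ID (column 3) is listed.
# Each kept record also goes to a small table: rs ID, chromosome, position.
awk -F'\t' -v OFS='\t' -v table="$work/positions.tsv" '
    NR == FNR    { wanted[$1]; next }
    /^#/         { print; next }
    $3 in wanted { print; print $3, $1, $2 > table }
' "$work/rs.txt" "$work/demo.vcf" > "$work/subset.vcf"

cat "$work/positions.tsv"
```

This assumes the rs list has one ID per line with no trailing whitespace and is not empty. Records come out in VCF order, so no re-sorting is needed.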

10.4 years ago

> What I would like to do is get the chromosome and coordinate for each of my SNPs from the dbSNP VCF.

    $ vcf2bed < mySnps.vcf | grep -w <rsNumber> | cut -f1-3 > <rsNumber>.bed

Replace <rsNumber> with your SNP ID of interest. The result is a BED file: the first column is the chromosome, and the second and third columns are the start and end of the SNP's reference position (BED starts are zero-based, so the start is the VCF position minus one).

If you're going to do this for a lot of SNPs, save the converted results and grep on that:

    $ vcf2bed < mySnps.vcf > mySnps.bed

Then:

    $ grep -w <rsNumber1> mySnps.bed | cut -f1-3 > <rsNumber1>.bed
    $ grep -w <rsNumber2> mySnps.bed | cut -f1-3 > <rsNumber2>.bed
    ...
    $ grep -w <rsNumberN> mySnps.bed | cut -f1-3 > <rsNumberN>.bed

Or put this into a shell script that iterates over a list of rs* IDs.

More information on vcf2bed is available in the BEDOPS documentation.
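A rough sketch of such a loop, assuming the converted file is called `mySnps.bed` and the IDs sit one per line in a file called `rs.txt` (both names, and the demo records written here, are placeholders rather than real vcf2bed output):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the vcf2bed output and the rs list.
work=$(mktemp -d)
{
    printf '1\t10582\t10583\trs58108140\n'
    printf '1\t10610\t10611\trs189107123\n'
    printf '2\t13301\t13302\trs180734498\n'
} > "$work/mySnps.bed"
printf 'rs58108140\nrs180734498\n' > "$work/rs.txt"

# Write one three-column BED file per rs ID.
while IFS= read -r rs; do
    [ -n "$rs" ] || continue    # skip blank lines in the list
    # -w: whole-word match, -F: treat the ID as a fixed string.
    # "|| true" keeps the loop going when an ID is not found.
    grep -wF -- "$rs" "$work/mySnps.bed" | cut -f1-3 > "$work/$rs.bed" || true
done < "$work/rs.txt"

cat "$work/rs58108140.bed"
```

Each grep rereads the whole BED file, so for thousands of IDs a single pass that loads the list once will be much faster than this loop.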

10.4 years ago

Well, this should be pretty simple to do. Have you even tried it? You can try:

1) grep -Ff rs#_file vcf_file

2) Search online posts like http://stackoverflow.com/questions/18345067/how-to-use-strings-from-one-text-file-to-search-another-and-create-a-new-text-f
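One thing to watch with option 1: a plain `grep -F` matches substrings, so `rs123` in the list also pulls in the record for `rs1234`. Adding `-w` restricts matches to whole words. A small sketch on made-up data (the file names and records are hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo VCF with two IDs where one is a prefix of the other.
work=$(mktemp -d)
{
    printf '#CHROM\tPOS\tID\tREF\tALT\n'
    printf '1\t100\trs123\tA\tG\n'
    printf '1\t200\trs1234\tC\tT\n'
} > "$work/demo.vcf"
printf 'rs123\n' > "$work/rs.txt"

# Count matching lines with and without whole-word matching.
loose=$(grep -cF -f "$work/rs.txt" "$work/demo.vcf")
strict=$(grep -cwF -f "$work/rs.txt" "$work/demo.vcf")
echo "grep -F: $loose line(s); grep -wF: $strict line(s)"
```

Even with `-w`, grep looks at the whole line, so an rs number that appears in the INFO column will still match; comparing the ID column explicitly avoids that.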
