How to query rsIDs using dbSNP in R or Linux
2.2 years ago
Nance • 0

Hi,

I have a dataset of summary statistics, but lots (millions) of the SNP names/rsIDs are missing and I want to add them, using the chr / pos / A1 / A0 information that I have.

I know I need to use an existing database to do this, much like in the post "how to query ensembl sql database - to check if a snp (name = rs...) is in an intron ?". However, I do not have the SNP names, which seems to be the obstacle: most posts are about filling in SNP information when you already have the name.

I downloaded the latest build from dbSNP (https://ftp.ncbi.nih.gov/snp/latest_release/), but this is the whole database and I think it is too big for me to work with.

I also looked at the R package rsnps, but it needs the SNP names first; this has already been raised as a bug: https://github.com/ropensci/rsnps/issues/122

Is there an easier way to do this? Can biomaRt do it? I have seen the post "Dbsnp : Best Way To Obtain Data On Snps" and started running it, but again I do not have the SNP names.

Thanks!

snp impute dbsnp rsid biomaRt • 1.9k views

If you want to annotate your data with dbSNP, I would suggest using tools such as bedtools or bcftools to annotate it against the dbSNP VCF.


Hi, thank you. However, my other file (not the dbSNP one) is not a VCF, so I cannot use these tools yet. Do you know of a way to convert it, on the command line or otherwise? I keep seeing the use of:

mv file.txt file.vcf

or

cp file.txt file.vcf

as a conversion, but this does not work for me! Thanks!
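Renaming or copying a file only changes its extension, not its contents, so the result is still a plain table rather than a VCF. A minimal Python sketch of an actual conversion follows. It assumes the column names Chr, Pos, rsID, A0, A1, Beta-A1 and P, and it assumes that A0 is the reference allele and A1 the alternate allele, which may not hold for every row of a summary-statistics file. The function name `sumstats_to_vcf` and the INFO keys `BETA` and `P` are made up for illustration.

```python
def sumstats_to_vcf(lines):
    """Yield minimal VCF lines from whitespace-delimited summary statistics.

    Assumed columns: Chr Pos rsID A0 A1 Beta-A1 P.
    Assumption: A0 is the reference allele and A1 the alternate allele.
    """
    yield "##fileformat=VCFv4.2"
    yield '##INFO=<ID=BETA,Number=1,Type=Float,Description="Effect size for A1">'
    yield '##INFO=<ID=P,Number=1,Type=Float,Description="P-value">'
    yield "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"

    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-blank line holds the column names
            continue
        rec = dict(zip(header, fields))
        # VCF uses "." for a missing ID
        variant_id = "." if rec["rsID"] in ("NA", ".") else rec["rsID"]
        info = f"BETA={rec['Beta-A1']};P={rec['P']}"
        yield "\t".join(
            [rec["Chr"], rec["Pos"], variant_id, rec["A0"], rec["A1"], ".", ".", info]
        )


example = """Chr Pos rsID A0 A1 Beta-A1 P
chr1 11888 NA T C -0.109 0.81
"""
vcf_lines = list(sumstats_to_vcf(example.splitlines()))
print("\n".join(vcf_lines))
```

Tools such as bcftools generally expect the result to be sorted by position, compressed with bgzip and indexed before it can be annotated, so this sketch covers only the format change itself.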


If you could post some entries from the text file, that would help in understanding the issue.


The .txt file is in a very standard format:

Chr     Pos     rsID    A0      A1      Beta-A1 P            
chr1    11888   NA      T       C       -0.109  0.81     
chr1    11213   NA      T       C       0.215   0.46    
chr1    11234   NA      T       C       -0.135  0.8      
chr1    12567   NA      C       T       0.177   0.77   
chr1    13333   NA      G       A       -0.165  0.81

And the .vcf is from the latest dbSNP release, https://ftp.ncbi.nih.gov/snp/latest_release/VCF/:

##fileformat=VCFv4.2
##fileDate=20210513
##source=dbSNP
##dbSNP_BUILD_ID=155
##reference=GRCh38.p13
##phasing=partial

followed by more VCF header lines, until:

#CHROM          POS     ID              REF     ALT
NC_000001.11    10001   rs1570391677    T       A,C
NC_000001.11    10002   rs1570391692    A       C
NC_000001.11    10003   rs1570391694    A       C
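One thing that stands out when comparing the two files is that the dbSNP VCF names chromosomes by RefSeq accession (NC_000001.11) while the .txt file uses chr1, so the names have to be reconciled before any position-based match. A small Python sketch of that mapping is below. The function name `refseq_to_chr` is hypothetical, and the sketch assumes human chromosomes only, with NC_000023 as X, NC_000024 as Y and NC_012920 as the mitochondrial genome.

```python
import re


def refseq_to_chr(accession):
    """Map a human RefSeq chromosome accession to a 'chrN' style name.

    Returns None for anything that is not a recognised chromosome
    accession (for example unplaced scaffolds).
    """
    match = re.fullmatch(r"NC_(\d{6})\.\d+", accession)
    if match is None:
        return None
    number = int(match.group(1))
    if 1 <= number <= 22:
        return f"chr{number}"
    if number == 23:
        return "chrX"
    if number == 24:
        return "chrY"
    if number == 12920:
        return "chrM"
    return None


print(refseq_to_chr("NC_000001.11"))
```

The version suffix after the dot differs between genome builds, so the sketch ignores it. That is convenient for renaming, but it also means the function cannot tell whether the VCF and the .txt file are on the same build.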

So I would like to use the chr / pos / ref allele / alt allele columns to fill in ('impute') the rsIDs in the .txt file above. I have been told dbSNP is the best resource for this, but I am really struggling to see how to combine these data.
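A minimal Python sketch of the join described above, offered as one possible approach rather than an established recipe. It first collects the positions present in the summary statistics, then streams the dbSNP VCF once and keeps only records at those positions, so the whole database never has to sit in memory. The names `build_lookup`, `fill_rsids` and `contig_map` are hypothetical. It assumes both files are on the same genome build (the VCF header above says GRCh38.p13), that the real VCF is tab-delimited, and that A0/A1 may be in either order relative to REF/ALT. The summary-statistics rows in the example are invented to line up with the VCF records shown above.

```python
def build_lookup(vcf_lines, wanted, contig_map):
    """Return {(chrom, pos, ref, alt): rsid} for positions in `wanted`."""
    lookup = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta and header lines
        chrom, pos, rsid, ref, alts = line.rstrip("\n").split("\t")[:5]
        chrom = contig_map.get(chrom, chrom)  # e.g. NC_000001.11 -> chr1
        if (chrom, pos) not in wanted:
            continue
        for alt in alts.split(","):  # one key per ALT allele
            lookup.setdefault((chrom, pos, ref, alt), rsid)
    return lookup


def fill_rsids(sumstat_lines, vcf_lines, contig_map):
    """Fill rsID values that are 'NA' by matching chr/pos/alleles."""
    rows = [line.split() for line in sumstat_lines if line.strip()]
    header, body = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}
    wanted = {(r[col["Chr"]], r[col["Pos"]]) for r in body}
    lookup = build_lookup(vcf_lines, wanted, contig_map)
    for r in body:
        if r[col["rsID"]] != "NA":
            continue  # keep IDs that are already present
        chrom, pos = r[col["Chr"]], r[col["Pos"]]
        a0, a1 = r[col["A0"]], r[col["A1"]]
        # Assumption: try A0 as REF first, then the swapped orientation.
        rsid = lookup.get((chrom, pos, a0, a1)) or lookup.get((chrom, pos, a1, a0))
        if rsid:
            r[col["rsID"]] = rsid
    return [header] + body


# Toy summary-statistics rows (invented for illustration).
sumstats = [
    "Chr Pos rsID A0 A1 Beta-A1 P",
    "chr1 10001 NA T C -0.1 0.8",
    "chr1 10002 NA C A 0.2 0.4",
    "chr1 11888 NA T C -0.109 0.81",
]
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_000001.11\t10001\trs1570391677\tT\tA,C\t.\t.\t.",
    "NC_000001.11\t10002\trs1570391692\tA\tC\t.\t.\t.",
]
filled = fill_rsids(sumstats, vcf, {"NC_000001.11": "chr1"})
for row in filled:
    print("\t".join(row))
```

For the real files, the two line arguments can be open file handles, for example one from `gzip.open(path, "rt")` for the compressed dbSNP VCF. Rows with no match stay as NA. The sketch compares alleles as plain strings and does not handle strand flips or indel normalisation.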

Thanks!
