3
3
4.1 years ago
gaelgarcia ▴ 240

What is the best / standard place to get a full list of SNPs and their coordinates in hg38?

I downloaded the SNPsnap database, but just realized that those coordinates are in hg19.

I'm trying to figure out how many SNP sites exist in my targeted genome sequencing data.

Many thanks.

SNP hg38 SNPsnap DBsnp population genetics • 13k views
4
4.1 years ago

One can download it in many formats by first going here and then choosing the dbSNP build version and the human genome reference build:

For example, human_9606_b151_GRCh38p7 is dbSNP release version 151 with co-ordinates for GRCh38.p7.

The VCF format is a common download. In the VCF directory, the 00-All.vcf.gz file is the one that contains all records. Take a look at the READMEs in order to see what's in all of the other files.

Kevin

0

Thank you for your clear response, Kevin. I see that I can download the list in BED format, but there doesn't appear to be a file with all chromosomes; instead, there is one file per chromosome. Is there a reason why one can't download the full list in BED format?

3

@OP: Download 00-All.vcf.gz (which contains all the variants), then convert the VCF to BED using vcf2bed.
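If BEDOPS (which provides vcf2bed) is not installed, the same idea can be sketched with awk. This is only a minimal illustration, not a replacement for vcf2bed: `vcf_to_bed` is a hypothetical helper name, and the choice to set the BED end to start plus the REF allele length is an assumption that may differ from vcf2bed's defaults.

```shell
# Hypothetical helper: minimal VCF -> BED conversion (not the BEDOPS vcf2bed tool).
# Reads VCF on stdin, writes 4-column BED (chrom, start, end, ID) on stdout.
vcf_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                      # skip meta and header lines
         {
             start = $2 - 1                 # VCF POS is 1-based, BED start is 0-based
             print $1, start, start + length($4), $3
         }'
}

# Usage (assumes 00-All.vcf.gz is in the current directory):
#   gzip -dc 00-All.vcf.gz | vcf_to_bed > dbsnp_all.bed
```

Streaming from the compressed file avoids writing the very large uncompressed VCF to disk.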

0

you too @kevin

1

Hello gaelgarcia,

Having one file per chromosome has the advantage that you only need to download one smaller file if you are investigating only a specific region.

If you need all the information in one file, you can concatenate the files into one after downloading.
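As a rough sketch of that concatenation, assuming the per-chromosome files are gzip-compressed BED files and that any header lines begin with `track` (both are assumptions to check against the actual downloads; `concat_bed` is a hypothetical helper name):

```shell
# Hypothetical helper: merge per-chromosome BED files into one stream.
# Arguments are the gzip-compressed files, in the order they should appear.
concat_bed() {
    for f in "$@"; do
        gzip -dc "$f"
    done | grep -v '^track'    # drop per-file track header lines, if present
}

# Usage (file names are placeholders):
#   concat_bed bed_chr_1.bed.gz bed_chr_2.bed.gz > all_chromosomes.bed
```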

fin swimmer

1

Yes, as per fin swimmer. Large datasets are typically made available on a per chromosome basis. The VCF version of dbSNP should contain all variants across all chromosomes, though (but it's a very large file > 10GB).

0

Great - thanks again.

3
4 months ago

The accepted answer links to a URL that has not been updated since build 151. The current build (155 as of the day of writing this answer) is available at https://ftp.ncbi.nlm.nih.gov/snp/latest_release/ (in VCF and compressed JSON formats). For VCF you want to use GCF_000001405.25.gz for GRCh37 and GCF_000001405.39.gz for GRCh38. Builds after 151 are archived in https://ftp.ncbi.nlm.nih.gov/snp/archive/

0

Thank you for posting

2
2.9 years ago
Shicheng Guo ★ 9.0k

dbSNP153.hg19.vcf

# Download the GRCh37 (hg19) VCF, index it, then write an uncompressed copy
cd ~/hpc/db/hg19
wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.25.bgz -O dbSNP153.hg19.vcf.bgz
tabix -p vcf dbSNP153.hg19.vcf.bgz
zcat dbSNP153.hg19.vcf.bgz > dbSNP153.hg19.vcf


dbSNP153.hg38.vcf

# Download the GRCh38 (hg38) VCF, index it, then write an uncompressed copy
cd ~/hpc/db/hg38
wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.bgz -O dbSNP153.hg38.vcf.bgz
tabix -p vcf dbSNP153.hg38.vcf.bgz
zcat dbSNP153.hg38.vcf.bgz > dbSNP153.hg38.vcf

1

Hi! May I know whether build 152 is the latest dbSNP release? Also, I noticed that in the build 152 file you shared, the first column contains RefSeq accession numbers (NC/NT) instead of chromosome numbers as in build 151. Do you know how I can convert the file?

2

I don't think dbSNP152 is the latest version; I think dbSNP153 is. However, the above commands will give you the latest dbSNP release, whether it is 152 or 153. Maybe I should change 152 to 153 in my answer above.

Check these two files; they will help you convert the NC/NT accessions to chr1, chr2, etc.

https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.25_GRCh37.p13_assembly_report.txt

https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.38_GRCh38.p12_assembly_report.txt
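One way the renaming could be done with these reports is sketched below. It assumes the usual NCBI assembly report layout (tab-delimited, `#` comment lines, RefSeq accession in column 7, UCSC-style name in column 10, possibly with Windows line endings); `make_chr_map` and `rename_chroms` are hypothetical helper names. Note that this only rewrites the first column of the data lines and leaves any `##contig` header lines untouched.

```shell
# Hypothetical helper: build a two-column map (RefSeq accession -> UCSC name)
# from an NCBI assembly report given as $1.
make_chr_map() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                          # skip comment/header lines
         {
             sub(/\r$/, "")                     # strip a trailing carriage return
             if ($7 != "na" && $10 != "na") print $7, $10
         }' "$1"
}

# Hypothetical helper: rewrite column 1 of a VCF (stdin) using the map file $1.
# Records whose accession is not in the map are passed through unchanged.
rename_chroms() {
    awk -v map="$1" 'BEGIN {
             FS = OFS = "\t"
             while ((getline line < map) > 0) {
                 split(line, f, "\t")
                 name[f[1]] = f[2]
             }
         }
         /^#/ { print; next }                   # keep header lines as they are
         ($1 in name) { $1 = name[$1] }
         { print }'
}

# Usage (file names are placeholders):
#   make_chr_map GCF_000001405.38_GRCh38.p12_assembly_report.txt > chr_map.txt
#   gzip -dc dbSNP153.hg38.vcf.bgz | rename_chroms chr_map.txt > dbSNP153.hg38.chr.vcf
```

The same two-column map should also be usable with `bcftools annotate --rename-chrs`, which takes a file of old-name/new-name pairs and updates the header as well.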

0

Hi Shicheng Guo

Do you know whether simply changing the chromosome names to their equivalents (using the table you shared above) can be done without major problems?