Download of dbSNP VCF file
4.4 years ago

Hi everyone, can someone tell me where to find the known indels.vcf and dbsnp.vcf files for the GRCh38 reference genome build? Thanks.

snp • 12k views
4.4 years ago
ATpoint 66k

dbSNP is the name of the entire database. The VCF files it provides include both SNPs and indels. For quick retrieval of variants in particular genomic regions, also download the .tbi (tabix index) file and familiarize yourself with tabix. I edited the title of your question to make it clearer. Please try to choose more appropriate titles in the future. Cheers!
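
As a rough sketch of the region lookup mentioned above: once the bgzipped VCF and its .tbi index sit side by side, tabix can pull out a single region without reading the whole file. The file name and the region below are placeholders chosen for illustration, not values confirmed in this thread, and the chromosome name has to match the naming used inside the file (tabix -l lists the names it contains).

```shell
# Placeholder inputs (assumptions): adjust to the file you actually downloaded.
VCF="00-All.vcf.gz"            # bgzipped dbSNP VCF; expects 00-All.vcf.gz.tbi next to it
REGION="1:1000000-1010000"     # contig:start-end, using the contig names from the VCF

if command -v tabix >/dev/null 2>&1 && [ -f "$VCF" ] && [ -f "$VCF.tbi" ]; then
    # -h also prints the VCF header, so region.vcf is a valid VCF on its own
    tabix -h "$VCF" "$REGION" > region.vcf
    status="extracted"
else
    echo "tabix, $VCF or $VCF.tbi not found; nothing extracted" >&2
    status="skipped"
fi
echo "region query: $status"
```

The guard only exists so the snippet fails politely when tabix or the files are missing; on a working setup the single tabix line is all that is needed.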


So, from the FTP link you provided, which VCF file should be used with GATK's BaseRecalibrator in order to skip over known polymorphic sites? It looks like 00-All.vcf.gz would be the most thorough, but it is 15 GB. Thanks!


To add to your and Agata's answers (+1): indels can be extracted with bcftools view -v indels mysnps.vcf.gz, see the bcftools documentation. (I would resist the temptation to parse VCF as text with perl/python/awk scripts.)
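
A slightly fuller sketch of that bcftools call, writing the indels to a compressed file and indexing it so it can be used like any other known-sites VCF. The input name mysnps.vcf.gz is taken from the comment above; the output name known_indels.vcf.gz is a hypothetical choice.

```shell
IN_VCF="mysnps.vcf.gz"          # input name as used in the comment above
OUT_VCF="known_indels.vcf.gz"   # hypothetical output name

if command -v bcftools >/dev/null 2>&1 && [ -f "$IN_VCF" ]; then
    # -v indels keeps only indel records; -Oz writes bgzip-compressed VCF
    if bcftools view -v indels -Oz -o "$OUT_VCF" "$IN_VCF" \
        && bcftools index -t "$OUT_VCF"; then   # -t builds a .tbi (tabix) index
        indel_status="written"
    else
        indel_status="failed"
    fi
else
    echo "bcftools or $IN_VCF not found; nothing written" >&2
    indel_status="skipped"
fi
echo "indel extraction: $indel_status"
```

Swapping -v indels for -v snps would give the SNP-only counterpart in the same way.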
