Hello Bioinformagicians,
I’m somewhat rusty on the command line, with some variant-calling experience from my previous workplace. I am not used to working on a PC, so I installed the Ubuntu terminal to get a command prompt.
In my current position I am pretty much limited to samtools, but if there is a way to do this using GATK/PLINK I’m all ears - I just might need some assistance with downloading/installing. I’ve been tasked with annotating a 30x WGS human .bam with all dbSNP calls (including non-variants). I have generated an uncompressed .bcf with bcftools mpileup, using the assembly I believe it was aligned to (GRCh38.p14 (hg38)). I then used bcftools call:
bcftools call -c -Oz -o <called_file.vcf.gz> <inputfile.bcf>
I am having an issue annotating/adding the dbSNP rsID column. I have tried several bcftools annotate invocations, but in each case the IDs turn into dots near the end of chr1. Both files have been indexed. The command I'm using is:
bcftools annotate -a <reference .vcf.gz file> -c ID output <called_file.vcf.gz> -o <output_withrsIDs.vcf.gz>
I assume that the downloaded .vcf file (+index) doesn’t match my assembly. I am looking for a dbSNP VCF compatible with GRCh38.p14 (hg38). I searched for a recent version (dbSNP155) but can only find bigBed files.
Does anyone have a link to / alternative name for a dbSNP dataset in VCF format that is compatible with GRCh38.p14, or can anyone point me in the right direction to convert the bigBed? My main field of research before was variant calling only, with in-house bioinformatics support, so calling all SNPs has me a bit at sea!
Thanks so much for any help :)
dbSNP for GRCh38 is here:
https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz
You will also need the index:
https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.tbi
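One thing to watch for: as far as I know, the NCBI dbSNP VCF names its contigs by RefSeq accession (NC_000001.11 and so on) rather than chr1, chr2, ..., so bcftools annotate will not find matching records unless the names agree with your called VCF. Below is a minimal sketch of how the renaming and annotation could be done. It assumes your called file uses UCSC-style names (chr1 ... chrM); the helper name annotate_with_dbsnp and the file names refseq_to_chr.txt and dbsnp.chr.vcf.gz are made up for illustration. Check the contig names in both files first with `bcftools index -s <file>`, which also shows the record count per contig.

```shell
# Map RefSeq accessions (assumed naming in the dbSNP VCF) to UCSC-style
# names (assumed naming in the called VCF): "old_name new_name" per line.
cat > refseq_to_chr.txt <<'EOF'
NC_000001.11 chr1
NC_000002.12 chr2
NC_000003.12 chr3
NC_000004.12 chr4
NC_000005.10 chr5
NC_000006.12 chr6
NC_000007.14 chr7
NC_000008.11 chr8
NC_000009.12 chr9
NC_000010.11 chr10
NC_000011.10 chr11
NC_000012.12 chr12
NC_000013.11 chr13
NC_000014.9 chr14
NC_000015.10 chr15
NC_000016.10 chr16
NC_000017.11 chr17
NC_000018.10 chr18
NC_000019.10 chr19
NC_000020.11 chr20
NC_000021.9 chr21
NC_000022.11 chr22
NC_000023.11 chrX
NC_000024.10 chrY
NC_012920.1 chrM
EOF

# Hypothetical helper: annotate_with_dbsnp <dbsnp.vcf.gz> <calls.vcf.gz> <out.vcf.gz>
annotate_with_dbsnp() {
    dbsnp=$1
    calls=$2
    out=$3

    # Rewrite the dbSNP contig names so they match the called VCF, then index.
    bcftools annotate --rename-chrs refseq_to_chr.txt -Oz -o dbsnp.chr.vcf.gz "$dbsnp"
    bcftools index -t dbsnp.chr.vcf.gz

    # Copy the ID column (rsIDs) from dbSNP into the called VCF.
    bcftools annotate -a dbsnp.chr.vcf.gz -c ID -Oz -o "$out" "$calls"
    bcftools index -t "$out"
}
```

After downloading both files from the links above, it could be run as `annotate_with_dbsnp GCF_000001405.40.gz called_file.vcf.gz output_withrsIDs.vcf.gz`. Renaming the full dbSNP file takes a while and needs a fair amount of disk space. It is also worth spot-checking a few known positions afterwards, in particular non-variant sites, since whether those pick up an rsID depends on how bcftools matches alleles between the two files.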