How to add rsIDs to VCF?
22 months ago
markgodek ▴ 40

I'm still a beginner and using Illumina Platinum Genomes as a toy dataset. Google supplies the files in VCF, but I need them annotated with rsIDs.

I've tried GATK VariantAnnotator and bcftools annotate as suggested by this post I found via Google and this Biostars post, but I'm not having any luck.

When the command finishes running, the output doesn't have the rsIDs filled in, and no obvious errors appear. I'm kind of stumped.

Any help is appreciated.


> I've tried GATK VariantAnnotator and bcftools annotate as suggested by this post I found via Google, but I'm not having any luck.

Show us your command lines.

gatk VariantAnnotator -R '/home/mark/Desktop/Reference_Genome_hg19/hg19.fa' -I '/home/mark/Desktop/Google_CEPH_Genomes_hg19/platinum-genomes_bam_NA12877_S1.bam' -V '/home/mark/Desktop/CEPH VCFs/platinum-genomes_vcf_NA12877_S1.genome.vcf.gz' -O '/home/mark/Desktop/CEPH VCFs/GATK_NA12877_S1_rsids.genome.vcf' -L '/home/mark/Desktop/CEPH VCFs/platinum-genomes_vcf_NA12877_S1.genome.vcf.gz' --dbsnp '/home/mark/Desktop/Google_CEPH_Genomes_hg19/common_all_20180418.vcf.gz' 


bcftools annotate -a '/home/mark/dbsnp_138.b37.vcf.gz' -c ID '/home/mark/Desktop/CEPH VCFs/platinum-genomes_vcf_NA12877_S1.genome.vcf.gz' -o '/home/mark/Desktop/CEPH VCFs/NA12877_with_rsIDs.vcf.gz'
20 months ago
erkinacar5 ▴ 90

Hey, this was a while ago, but in case anyone else runs into the same problem, the following command worked for me:

bcftools annotate -a /data/references/hg19/pipe/dbsnp138/00-All.vcf.gz -c ID -o samtools_annotated.vcf.gz samtools.vcf.gz
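For reference, here is a slightly fuller version of the same workflow as a hedged sketch, assuming bcftools, bgzip and tabix (htslib) are on your PATH; the function name annotate_rsids is hypothetical. bcftools annotate -a needs the annotation VCF to be bgzip-compressed and indexed, and -O z requests bgzipped output explicitly (older bcftools versions default to plain VCF even when the output name ends in .gz).

```shell
#!/usr/bin/env bash
# Sketch: fill the ID column of a VCF with rsIDs taken from a dbSNP VCF.
# Assumes bcftools/tabix are installed and that both files use the same
# reference build AND the same contig naming ("1" vs "chr1").

annotate_rsids() {
    local input_vcf=$1    # VCF to annotate (plain or bgzipped)
    local dbsnp_vcf=$2    # dbSNP VCF, bgzip-compressed (.vcf.gz)
    local output_vcf=$3   # output path, ideally ending in .vcf.gz

    # bcftools annotate -a needs an indexed annotation file; build a
    # tabix index only if neither a .tbi nor a .csi index exists yet.
    if [[ ! -e "${dbsnp_vcf}.tbi" && ! -e "${dbsnp_vcf}.csi" ]]; then
        tabix -p vcf "$dbsnp_vcf"
    fi

    # -c ID copies only the ID column; -O z writes bgzipped output.
    bcftools annotate -a "$dbsnp_vcf" -c ID -O z -o "$output_vcf" "$input_vcf"

    # Index the result so it can be queried by region afterwards.
    tabix -p vcf "$output_vcf"
}
```

Usage would look like `annotate_rsids samtools.vcf.gz 00-All.vcf.gz samtools_annotated.vcf.gz`, with the paths adjusted to your own files.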

One thing to watch out for: I think it doesn't work if your dbSNP (annotation) file uses numeric chromosome names (e.g. "1") while your VCF uses names with a 'chr' prefix (e.g. "chr1").
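A hedged sketch of checking for and fixing that mismatch, assuming bcftools and tabix are installed and the files are bgzipped and indexed. tabix -l prints the contig names stored in an index, so comparing its output for the two files shows whether one says "1" and the other "chr1". bcftools annotate --rename-chrs takes a file of whitespace-separated "old new" name pairs, one per line. The helper names and map path below are hypothetical. Renaming only fixes the names; both files still need to be on the same build (hg19 and GRCh37 share coordinates for chromosomes 1-22, X and Y).

```shell
#!/usr/bin/env bash
# Sketch: compare contig naming, then add a "chr" prefix to a numerically
# named VCF (e.g. a b37 dbSNP file) so it matches an hg19-style VCF.

# Print the contig names recorded in a tabix-indexed VCF.
list_contigs() {
    tabix -l "$1"
}

# Write an "old<TAB>new" map for bcftools annotate --rename-chrs.
# MT is deliberately left out: hg19 chrM and GRCh37 MT are different
# mitochondrial sequences, so a rename alone would not make them match.
write_chr_map() {
    local map_file=$1
    local c
    : > "$map_file"
    for c in $(seq 1 22) X Y; do
        printf '%s\tchr%s\n' "$c" "$c" >> "$map_file"
    done
}

# Rename contigs in a VCF and index the bgzipped result.
add_chr_prefix() {
    local in_vcf=$1 out_vcf=$2 map_file=$3
    write_chr_map "$map_file"
    bcftools annotate --rename-chrs "$map_file" -O z -o "$out_vcf" "$in_vcf"
    tabix -p vcf "$out_vcf"
}
```

For example, `list_contigs dbsnp_138.b37.vcf.gz | head` shows the naming used by the dbSNP file, and `add_chr_prefix dbsnp_138.b37.vcf.gz dbsnp_138.chr.vcf.gz chr_map.txt` writes a renamed copy that can then be passed to `bcftools annotate -a`.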


Thanks for the follow-up. I'm working on a different project now, but I was able to add the rsIDs with bcftools.

I think the issue before was that I expected more positions to get an rsID, but the NA12877 genome VCF is so large and contains so many non-variant positions that it was hard to tell whether it had worked.
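To check whether the annotation worked without the non-variant records getting in the way, one option is to count rsIDs among variant records only. A minimal sketch, assuming bcftools is installed; count_rsids is a hypothetical helper name. bcftools view -v snps,indels keeps only SNP and indel records, which drops the reference-only records that a genome VCF contains.

```shell
#!/usr/bin/env bash
# Sketch: report how many variant records (SNPs/indels) carry an rsID.
# Non-variant reference records in a genome VCF are excluded first.

count_rsids() {
    local vcf=$1
    local total with_id
    # -H drops the header, so wc -l counts variant records only.
    total=$(bcftools view -H -v snps,indels "$vcf" | wc -l)
    # Extract the ID column of variant records and count "rs..." IDs;
    # grep -c prints 0 but exits non-zero on no match, hence "|| true".
    with_id=$(bcftools view -v snps,indels "$vcf" \
        | bcftools query -f '%ID\n' - \
        | grep -c '^rs' || true)
    echo "variant records: $total, with rsID: $with_id"
}
```

Running `count_rsids NA12877_with_rsIDs.vcf.gz` before and after annotation should make it obvious whether IDs were filled in; a count of zero after annotation usually points back to the contig-naming or build mismatch mentioned above.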
