How to annotate SNP variant records in a vcf file with IDs from a bed file?
1
1
Entering edit mode
8.5 years ago
William ★ 5.3k

How can I annotate a vcf file with IDs from a bed file?

I have a bgzip compressed bed file with the required three columns (chrom, chromStart and chromEnd) and a fourth column that contains the IDs. The bed file does not have a header record.

https://genome.ucsc.edu/FAQ/FAQformat.html

I indexed the file with

tabix -p bed myBedFile.gz

Now I would like to annotate SNP variant records in a vcf file with the IDs from the bed file, based on exact matching of the vcf CHROM / POS fields to the chrom, chromStart and chromEnd columns of the bed file.

I believe this should be possible with bcftools annotate

https://samtools.github.io/bcftools/bcftools.html#annotate

but my

bcftools annotate -c CHROM,FROM,TO,ID -a myBedFile.bed.gz myVcfFile.vcf.gz

returns a vcf file with unchanged SNP variant records, no ID is set in the ID column or the INFO column.

How can I annotate the vcf ID column with IDs from matching positions a bed file? Preferable with a tool like bcftools?

vcf bcftools bed • 5.1k views
ADD COMMENT
1
Entering edit mode
8.5 years ago

I tried running this code (copy/pasted from above except for input files):

bcftools annotate -c CHROM,FROM,TO,ID -a hg19.chr20.refseq.bed.gz na12878.cutadapt.q20.bwa.rmdup.sorted.vcf.gz

Output I got, for a variant with match was:

chr20    62397026    NM_025224    T    C    6.22876e-06    .    AB=0;ABP=0;AC=0;AF=0;AN=2;AO=2;CIGAR=1X;DP=7;DPB=7;DPRA=0;EPP=7.35324;EPPR=13.8677;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=13.4549;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=9;QR=202;RO=5;RPL=0;RPP=7.35324;RPPR=6.91895;RPR=2;RUN=1;SAF=0;SAP=7.35324;SAR=2;SRF=4;SRP=6.91895;SRR=1;TYPE=snp    GT:DP:RO:QR:AO:QA:GL    0/0:7:5:202:2:9:0,-1.25221,-17.7041

Output I got, for a variant without match was:

chr20    62962891    .    C    T    64.6144    .    AB=0.666667;ABP=4.45795;AC=1;AF=0.5;AN=2;AO=4;CIGAR=1X;DP=6;DPB=6;DPRA=0;EPP=11.6962;EPPR=3.0103;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=4.49327;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=157;QR=80;RO=2;RPL=0;RPP=11.6962;RPPR=3.0103;RPR=4;RUN=1;SAF=0;SAP=11.6962;SAR=4;SRF=2;SRP=7.35324;SRR=0;TYPE=snp    GT:DP:RO:QR:AO:QA:GL    0/1:6:2:80:4:157:-12.6991,0,-5.78561

I do not see any problem with your code. Check if

  • chromosome numbers are same (chrom in VCF and 1st column in bed file).
  • VCF and Bed files are indexed with tabix
  • bed file corresponds with same genome build and genomic coordinates that span VCF
  • indeed there are no matches manually (can happen some times)
ADD COMMENT
0
Entering edit mode

Ok good to known that this should work / works for someone else. I might very well have made a mistake my self. I'll double check what I did and the points you mentioned. Thanks.

ADD REPLY

Login before adding your answer.

Traffic: 2797 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6