Question: How to annotate SNP variant records in a vcf file with IDs from a bed file?
0
gravatar for William
3.5 years ago by
William4.4k
Europe
William4.4k wrote:

How can I annotate a vcf file with IDs from a bed file?

I have a bgzip compressed bed file with the required three columns (chrom, chromStart and chromEnd) and a fourth column that contains the IDs. The bed file does not have a header record.

https://genome.ucsc.edu/FAQ/FAQformat.html

I indexed the file with

tabix -p bed myBedFile.gz

Now I would like to annotate SNP variant records in a vcf file with the IDs from the bed file, based on exact matching of the vcf CHROM / POS fields to the chrom, chromStart and chromEnd columns of the bed file.

I believe this should be possible with bcftools annotate

https://samtools.github.io/bcftools/bcftools.html#annotate

but my

bcftools annotate -c CHROM,FROM,TO,ID -a myBedFile.bed.gz myVcfFile.vcf.gz

returns a vcf file with unchanged SNP variant records, no ID is set in the ID column or the INFO column.

How can I annotate the vcf ID column with IDs from matching positions a bed file? Preferable with a tool like bcftools?

bcftools bed vcf • 2.2k views
ADD COMMENTlink modified 3.5 years ago by cpad011211k • written 3.5 years ago by William4.4k
1
gravatar for cpad0112
3.5 years ago by
cpad011211k
India
cpad011211k wrote:

I tried running this code (copy/pasted from above except for input files):

bcftools annotate -c CHROM,FROM,TO,ID -a hg19.chr20.refseq.bed.gz na12878.cutadapt.q20.bwa.rmdup.sorted.vcf.gz

Output I got, for a variant with match was:

chr20    62397026    NM_025224    T    C    6.22876e-06    .    AB=0;ABP=0;AC=0;AF=0;AN=2;AO=2;CIGAR=1X;DP=7;DPB=7;DPRA=0;EPP=7.35324;EPPR=13.8677;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=13.4549;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=9;QR=202;RO=5;RPL=0;RPP=7.35324;RPPR=6.91895;RPR=2;RUN=1;SAF=0;SAP=7.35324;SAR=2;SRF=4;SRP=6.91895;SRR=1;TYPE=snp    GT:DP:RO:QR:AO:QA:GL    0/0:7:5:202:2:9:0,-1.25221,-17.7041

Output I got, for a variant without match was:

chr20    62962891    .    C    T    64.6144    .    AB=0.666667;ABP=4.45795;AC=1;AF=0.5;AN=2;AO=4;CIGAR=1X;DP=6;DPB=6;DPRA=0;EPP=11.6962;EPPR=3.0103;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=4.49327;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=157;QR=80;RO=2;RPL=0;RPP=11.6962;RPPR=3.0103;RPR=4;RUN=1;SAF=0;SAP=11.6962;SAR=4;SRF=2;SRP=7.35324;SRR=0;TYPE=snp    GT:DP:RO:QR:AO:QA:GL    0/1:6:2:80:4:157:-12.6991,0,-5.78561

I do not see any problem with your code. Check if

  • chromosome numbers are same (chrom in VCF and 1st column in bed file).
  • VCF and Bed files are indexed with tabix
  • bed file corresponds with same genome build and genomic coordinates that span VCF
  • indeed there are no matches manually (can happen some times)

 

ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by cpad011211k

Ok good to known that this should work / works for someone else. I might very well have made a mistake my self. I'll double check what I did and the points you mentioned. Thanks.

ADD REPLYlink written 3.5 years ago by William4.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 883 users visited in the last hour