how to get iDs in VCF file?
0
0
Entering edit mode
2.9 years ago
storm1907 ▴ 30

Hello, I have to work with vcf files, were no IDs are present, such as:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ivf-15.F
chr1    69270   .       A       G       55.4    PASS    CSQ=|FAIL|0.01|0.00|0.00|0.00|-10|26|-28|-25|||LOW|OR4F5|ENSG00000186092|ENST00000335137|protein_coding|1/1||216|60|S|tcA/tcG|,|FAIL|0.01|0.00|0.00|0.00|-10|26|-28|-25|||LOW|OR4F5|ENSG00000186092|ENST00000641515|protein_coding|3/3||303|81|S|tcA/tcG|    GT:GQ:DP:AD:VAF:PL      1/1:55:18:0,18:1:55,65,0
chr1    69511   .       A       G       61.4    PASS    CSQ=|FAIL|0.00|0.00|0.02|0.00|26|32|26|34|||MODERATE|OR4F5|ENSG00000186092|ENST00000335137|protein_coding|1/1||457|141|T/A|Aca/Gca|,|FAIL|0.00|0.00|0.02|0.00|26|32|26|34|||MODERATE|OR4F5|ENSG00000186092|ENST00000641515|protein_coding|3/3||544|162|T/A|Aca/Gca|  GT:GQ:DP:AD:VAF:PL      1/1:59:69:0,69:1:61,62,0 

Is there a way to correct such files without making variant calling step again?

Thank you!

VCF Deepvariant • 2.7k views
ADD COMMENT
0
Entering edit mode

If you're looking to get dbSNP rsIDs in the ID field, you could use any of a bunch of variant annotation tools, including bcftools annotate + a dbSNP VCF file to do this.

ADD REPLY
0
Entering edit mode

OK, I found this https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/ , but got confused, which is the right vcf file?

ADD REPLY
0
Entering edit mode

There is no "right" file. Which one do you need? If you don't care about CLINVAR, go for the file without clinical assertions. If you care about CLINVAR annotations, go for the other VCF.

I recommend you get 00-All.vcf.gz, as it seems to be the most comprehensive file at first glance.

ADD REPLY
0
Entering edit mode

ok, I ran

#!/bin/bash

module load bio/bcftools/1.10.2

dir=/mnt/home/dir
for file in $dir/file ;
do
        bcftools annotate -a $dir/00-All.vcf.gz -c ID  $dir/file -o $dir/outfile
done

and still ID field is empty

vcf file looks like that:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ivf-32.F
chr1    69063   .       T       C       40.7    PASS    CSQ=||||||||||||MODIFIER|OR4F5|ENSG00000186092|ENST00000335137|protein_coding|1/1||9||||,||||||||||||LOW|OR4F5|ENSG00000186092|ENST00000641515|protein_coding|3/3||96|12|N|aaT/aaC| GT:GQ:DP:AD:VAF:PL       1/1:40:2:0,2:1:40,49,0
chr1    183937  .       G       A       44.7    PASS    CSQ=||||||||||||MODIFIER|FO538757.1|ENSG00000279928|ENST00000624431|unprocessed_pseudogene||4/4|||||    GT:GQ:DP:AD:VAF:PL      0/1:44:70:57,13:0.185714:44,0,58
chr1    601436  .       C       T       49.4    PASS    CSQ=||||||||||||MODIFIER|AL669831.3|ENSG00000230021|ENST00000634337|processed_transcript|4/5||404||||,||||||||||||MODIFIER|AL669831.3|ENSG00000230021|ENST00000634833|processed_transcript|3/6||317||||      GT:GQ:DP:AD:VAF:PL      0/1:48:26:12,14:0.538462:49,0,53
chr1    601667  .       C       T       28.8    PASS    CSQ=||||||||||||MODIFIER|AL669831.3|ENSG00000230021|ENST00000634337|processed_transcript||3/4|||||,||||||||||||MODIFIER|AL669831.3|ENSG00000230021|ENST00000634833|processed_transcript||2/5|||||    GT:GQ:DP:AD:VAF:PL      0/1:29:29:18,11:0.37931:28,0,46 

what could be the problem?

ADD REPLY
1
Entering edit mode

Are the reference genomes version same for your VCF file and 00-All.vcf.gz? It is possible that due to reference genome version differences, coordinates are not matching.

You can also try Snpsift annotate: http://pcingola.github.io/SnpEff/ss_annotate/

It has a good example which can be helpful if you can't make bcftools annotate to work for you.

ADD REPLY
0
Entering edit mode

Thank you, snpEff seems to be working!

ADD REPLY
0
Entering edit mode

Make sure that the tool that you end up using (bcftools/snpEff) compares loci in the way you want it to. Some tools default to comparing only CHROM and POS while some compare all of CHROM POS REF and ALT. You can customize what bcftools compares, so make sure your tool and command reflect your requirement.

ADD REPLY
0
Entering edit mode

In addition to what prasundutta says, ensure the contig names match - chr1, chr2, ... are not compatible with 1,2,....

A good way to test would be to manually find a site that is in both VCFs and run annotate just for that using

bcftools view -Ou in.vcf chr2:1234567 | bcftools annotate -a 00-All.vcf.gz -c ID -Ov - | less -S 

Replace the chr2:1234567 with the known site and ensure the output you see has the right rsID from the dbSNP file.

ADD REPLY

Login before adding your answer.

Traffic: 1850 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6