I had two VCF files and used isec from bcftools to find the mutations unique to each sample and those shared between them. isec produced four vcf.gz files, described below:
isec_output/0000.vcf.gz: variants unique to 1.vcf.gz
isec_output/0001.vcf.gz: variants unique to 2.vcf.gz
isec_output/0002.vcf.gz: variants shared by 1.vcf.gz and 2.vcf.gz, as represented in 1.vcf.gz
isec_output/0003.vcf.gz: variants shared by 1.vcf.gz and 2.vcf.gz, as represented in 2.vcf.gz
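To make the four file descriptions concrete, here is a simplified Python model of how records get split, assuming two records count as the same variant when CHROM, POS, REF and ALT all match. This is not bcftools' implementation: the real matching rules are controlled by isec's -c/--collapse option, and files 0002 and 0003 differ in the non-key columns (QUAL, INFO, sample fields) copied from each input. VariantKey, parse_keys and isec_like are hypothetical names.

```python
from typing import Iterable, NamedTuple


class VariantKey(NamedTuple):
    """Hypothetical key deciding whether two records are 'the same' variant."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_keys(lines: Iterable[str]) -> set[VariantKey]:
    """Collect (CHROM, POS, REF, ALT) keys from tab-separated VCF lines, skipping headers."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add(VariantKey(chrom, int(pos), ref, alt))
    return keys


def isec_like(a: set[VariantKey], b: set[VariantKey]) -> dict[str, set[VariantKey]]:
    """Split two key sets the way the four isec output files are described."""
    return {
        "0000": a - b,  # only in the first file
        "0001": b - a,  # only in the second file
        "0002": a & b,  # shared; isec writes these records as they appear in file 1
        "0003": a & b,  # shared; isec writes these records as they appear in file 2
    }
```

In this key-only model 0002 and 0003 hold the same keys; in the real output they are the same sites carrying each input file's own annotations.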
The output files look like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chrM 5 . A . . PASS . GT:GQX:DP:DPF 0/0:358:120:0
chrM 7 . A . . PASS END=9;BLOCKAVG_min30p3a GT:GQX:DP:DPF 0/0:560:187:3
chrM 10 . T . . PASS END=13;BLOCKAVG_min30p3a GT:GQX:DP:DPF 0/0:782:261:5
chrM 14 . T . . PASS END=17;BLOCKAVG_min30p3a GT:GQX:DP:DPF 0/0:1092:364:5
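Every row shown has ALT "." and genotype 0/0, meaning no alternate allele was called there, and the END= key in INFO makes a row cover a range (for example POS 7 to END 9). These look like gVCF-style reference blocks rather than mutations, which matters before annotating. A minimal Python sketch of reading such a line, assuming tab-separated columns and a single sample column; VcfRecord, parse_info and parse_record are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    ref: str
    alt: str
    filter_status: str
    info: dict[str, str]
    sample: dict[str, str]  # FORMAT keys zipped with the single sample's values

    @property
    def end(self) -> int:
        """Last position covered; END= in INFO marks a multi-base block."""
        return int(self.info.get("END", self.pos))

    @property
    def is_reference_block(self) -> bool:
        """ALT '.' means no alternate allele, i.e. not a mutation."""
        return self.alt == "."


def parse_info(field: str) -> dict[str, str]:
    """Split INFO into key/value pairs; flags get an empty value."""
    if field == ".":
        return {}
    info = {}
    for item in field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def parse_record(line: str) -> VcfRecord:
    """Parse one tab-separated VCF data line with exactly one sample column."""
    cols = line.rstrip("\n").split("\t")
    fmt_keys = cols[8].split(":")
    sample_vals = cols[9].split(":")
    return VcfRecord(
        chrom=cols[0], pos=int(cols[1]), ref=cols[3], alt=cols[4],
        filter_status=cols[6], info=parse_info(cols[7]),
        sample=dict(zip(fmt_keys, sample_vals)),
    )
```

Keeping only records where `is_reference_block` is false leaves the sites that actually carry an alternate allele, which are the ones worth attaching gene names to.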
How can I add gene names to these files? I am new to this field and don't know how to identify mutations by gene name from these files. Do I need further annotation steps?
You can intersect with the dbSNP VCF file; the dbSNP VCF has a GENEINFO tag holding the gene symbol and ID. But that would be very resource-intensive. Try a BED file instead, as Pierre suggested below: download the human GTF file, filter it for gene records, and convert the filtered GTF to a BED file.
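The GTF-to-BED route can be sketched in Python as below, assuming an Ensembl/GENCODE-style GTF where gene records have `gene` in the third column and a `gene_name "..."` attribute. GTF coordinates are 1-based and inclusive while BED is 0-based and half-open, so only the start shifts by one. gtf_genes_to_bed and genes_at are hypothetical helpers, and the linear scan in genes_at is for illustration only.

```python
import re
from typing import Iterable, Iterator

# Assumes attributes like: gene_id "ENSG..."; gene_name "SYMBOL";
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')

BedRow = tuple[str, int, int, str]  # (chrom, start0, end, name)


def gtf_genes_to_bed(gtf_lines: Iterable[str]) -> Iterator[BedRow]:
    """Keep 'gene' features and yield BED rows.

    GTF start/end are 1-based inclusive; BED start is 0-based, end is exclusive,
    so start becomes start - 1 and end stays the same.
    """
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        match = GENE_NAME_RE.search(cols[8])
        name = match.group(1) if match else "."  # no gene_name attribute
        yield cols[0], int(cols[3]) - 1, int(cols[4]), name


def genes_at(bed_rows: Iterable[BedRow], chrom: str, pos: int) -> list[str]:
    """Names of BED intervals on chrom that contain the 1-based VCF position pos."""
    return [name for c, start, end, name in bed_rows
            if c == chrom and start < pos <= end]
```

For whole files, an interval tool such as bedtools intersect or bcftools annotate will be far faster than this scan. Also check that chromosome names agree between the annotation and the VCF: the VCF above uses chrM, while Ensembl GTFs name the mitochondrial chromosome MT.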