I mapped my reads to an annotated reference, exported to bam, than used freebayes to call variants. I'm now trying to use vcf-annotated to see how many variants fall within coding regions and what their effect is on the protein.
I ran vcf-annotate as follows:
bgzip Sample1_Genome_noHET_noCo.sort.bed tabix -s 1 -b 2 -e 3 Sample1_Genome_noHET_noCo.sort.bed.gz cat GDR-18.sort.grp.bam.vcf | vcf-annotate -a Sample1_Genome_noHET_noCo.sort.bed.gz -d key=INFO,ID=ANN,Number=1,Type=Integer,Description='My custom annotation' -c CHROM,FROM,TO,INFO/ANN > out.vcf
my input vcf looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT whatever Sample1_Genome_noHET 989297 . ATAAATTATAAATAAC GTAAATTATAAATAAT,TTAATCTATAAATAAT,ATAAACTATAAATAAT 1906.94 . AB=0.253165,0.316456,0.21519;ABP=44.818,26.1269,58.6715;AC=1,1,1;AF=0.25,0.25,0.25;AN=4;AO=20,25,17;CIGAR=1X14M1X,1X3M2X9M1X,5M1X9M1X;DP=79;DPB=97.9375;DPRA=0,0,0;EPP=6.91895,3.09716,9.26925;EPPR=5.18177;GTI=0;HWE=-0;LEN=16,16,16;MEANALT=4,4,4;MQM=255,255,255;MQMR=255;NS=1;NUMALT=3;ODDS=120.598;PAIRED=1,1,1;PAIREDR=1;PAO=16.5833,3.58333,5.08333;PQA=632.083,138.083,195.083;PQR=297.75;PRO=7.75;QA=768,947,658;QR=631;RO=16;RPP=6.91895,57.2971,9.26925;RPPR=22.5536;RUN=1,1,1;SAP=4.74748,3.09716,6.20364;SRP=11.6962;TYPE=complex,complex,complex;XAI=0.00326384,0.0015,0.00163399;XAM=0.0370321,0.0015,0.0319516;XAS=0.0337682,0,0.0303176;XRI=0;XRM=0.000726744;XRS=0.000726744;technology.illumina=1,1,1;BVAR GT:DP:RO:QR:AO:QA 0/1/2/3:79:16:631:20,25,17:768,947,658
and my .bed looks like this:
#CHR FROM TO ANNOTATION Sample1_Genome_noHET 101 563 hypothetical protein NCER_102224 CDS 0 + Sample1_Genome_noHET 1293 1728 hypothetical protein NCER_102355 CDS 0 -
The problem is that the out.vcf file is only 300 bites bigger than the original GDR-18.sort.grp.bam.vcf. And I cannot find any difference between the two. What am I doing wrong and what do I expect to see if it is done right?