I created a VCF file from FASTQ files using a recent GATK pipeline.
After finishing the variant discovery procedure (including variant recalibration), I have a VCF file that is ready to annotate with other tools such as snpEff.
My question is this.
Our MiSeq machine from Illumina comes with a built-in program that generates a VCF file from the FASTQ files automatically.
(In this case I don't need to run GATK myself; the machine's built-in program does everything. I checked that it also uses a GATK pipeline.)
However, the VCF file I created myself with the GATK pipeline and the VCF file generated automatically by the Illumina machine differ greatly in the number of variants.
I know that different programs report different variant calls. But the VCF file generated automatically by the Illumina machine has about 9300 variants called, whereas the VCF file I generated with GATK has 55000 variants, which is far more.
I know I need to filter out some variants based on criteria such as read depth and quality score. But I would expect the number of called variants to be comparable from the very beginning. Am I missing something?
Could someone please help me with this?
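
For reference, here is a minimal sketch of how the two call sets could be compared beyond the raw totals, by counting the variants that are shared and the ones unique to each file. The file names `my_gatk.vcf` and `miseq.vcf` and the helper names are hypothetical, and the matching is by exact CHROM/POS/REF/ALT only, so indels that the two pipelines represent differently would show up as unique to each file.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def variant_keys(lines):
    """Return the set of (CHROM, POS, REF, ALT) keys found in VCF lines.

    Header lines (starting with '#') and blank lines are skipped.
    Multi-allelic records are split into one key per ALT allele, so the
    number of keys can be larger than the number of records.
    """
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare(keys_a, keys_b):
    """Count the variants unique to each set and the ones shared by both."""
    return {
        "only_a": len(keys_a - keys_b),
        "only_b": len(keys_b - keys_a),
        "shared": len(keys_a & keys_b),
    }


# Hypothetical usage (file names are placeholders):
# with open_vcf("my_gatk.vcf") as a, open_vcf("miseq.vcf") as b:
#     print(compare(variant_keys(a), variant_keys(b)))
```

If most of the extra calls in one file fall in `only_a`, looking at where those positions lie and what their depth and quality values are would be a natural next step.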