Dear scientific community,
I have to call variants from WGS samples of citrus. I am using the GATK pipeline for post-processing of the aligned reads, but no dbSNP reference file is available for Citrus sinensis, so I am using the bootstrapping method instead. I removed duplicates and called variants with FreeBayes to build a reference VCF file to be used as the known-sites input for the BQSR step later. The merged VCF file I got from FreeBayes is 84.4 GB. Is it normal to have such a large file? For comparison, a VCF file from dbSNP is normally around 1 GB. How can I reduce its size, or is it okay to go ahead with it as it is?
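To make the question concrete, here is a rough sketch of the kind of reduction I have in mind. It assumes that BQSR only needs the variant positions from the known-sites file, so the per-sample genotype columns can be dropped, and that only higher-confidence calls should be kept. The function names, the file paths and the QUAL cutoff of 30 are placeholders I made up for illustration, not recommended values.

```python
import gzip


def to_sites_only(lines, min_qual=30.0):
    """Yield sites-only VCF lines, keeping records with QUAL >= min_qual.

    min_qual=30.0 is an arbitrary placeholder, not a recommended cutoff.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Keep only the 8 fixed columns; drop FORMAT and sample names.
            yield "\t".join(line.split("\t")[:8])
        else:
            # Split off the 8 fixed columns; the remainder (FORMAT and
            # genotypes) stays unsplit in the last piece and is discarded.
            fields = line.split("\t", 8)
            qual = fields[5]
            if qual == "." or float(qual) < min_qual:
                continue
            yield "\t".join(fields[:8])


def open_text(path, mode):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def filter_file(in_path, out_path, min_qual=30.0):
    """Stream in_path to out_path line by line, so memory use stays small."""
    with open_text(in_path, "rt") as src, open_text(out_path, "wt") as dst:
        for out_line in to_sites_only(src, min_qual):
            dst.write(out_line + "\n")
```

I would call it as something like `filter_file("merged.vcf", "known_sites.vcf")`, where both file names are hypothetical. Most of the 84.4 GB is presumably in the genotype columns for all samples, so dropping them should shrink the file considerably, but I have not verified this.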
Your suggestions will be highly appreciated.