I am using vcftools to break down a large VCF file into smaller per-chromosome files with:
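```shell
# A distinct --out prefix per chromosome keeps each run from overwriting the previous one's output
for i in $(seq 1 22); do vcftools --gzvcf ~/path_to_large.vcf.gz --chr "$i" --out ~/path_to_small_vcf_chr"$i" --recode; done
```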
This is the output I got from the command (chr22 shown as an example):
```
VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--gzvcf /path_to_large_vaf/large.vcf.gz
	--chr 22
	--out /path_to_small_vcr/
	--recode

Using zlib version: 1.2.8
After filtering, kept 1000 out of 1000 Individuals
Outputting VCF file...
After filtering, kept 72353 out of a possible 2825214 Sites
Run Time = 987.00 seconds
```
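vcftools also writes each run's messages to `<prefix>.log`, so the "kept N out of a possible M Sites" lines from all 22 runs can be totalled and compared with the 2825214 sites in the input. Below is a minimal sketch, assuming every run wrote its own log file and that the message format matches the output above. `sum_kept_sites` is a hypothetical helper, not part of vcftools.

```shell
#!/usr/bin/env bash
# sum_kept_sites (hypothetical helper): adds up the site counts from lines such as
# "After filtering, kept 72353 out of a possible 2825214 Sites" in one or more vcftools .log files.
# The kept count is the 4th space-separated field; the "Individuals" line lacks "a possible" and is skipped.
sum_kept_sites() {
  grep -h 'out of a possible' "$@" | awk '{ s += $4 } END { print s + 0 }'
}
```

For example, with hypothetical per-chromosome prefixes such as `~/path_to_small_vcf_chr1`, running `sum_kept_sites ~/path_to_small_vcf_chr*.log` prints the total across chromosomes 1-22. Sites on any contigs outside 1-22 (e.g. X, Y, MT) would not be counted by this loop.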
The output was split by chromosome as expected, but I noticed that a large number of variants appear to have been filtered out of the original 2825214 sites (only 72353 remained for chr22). I did not specify any filtering criteria in the command, so what could be causing this filtering?
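One way to investigate is to count how many records in the input VCF fall on each chromosome and compare the count for `22` with the 72353 sites vcftools reported as kept. Below is a minimal sketch, assuming the input is gzip- or bgzip-compressed (bgzip output can be read with `gzip -dc`). `count_sites_per_chrom` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# count_sites_per_chrom (hypothetical helper): prints "<CHROM><TAB><record count>" for a
# compressed VCF. Lines starting with '#' are header lines and are skipped;
# CHROM is the first tab-separated column of every data line.
count_sites_per_chrom() {
  gzip -dc "$1" | awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' | sort -k1,1
}
```

Usage: `count_sites_per_chrom ~/path_to_large.vcf.gz`. This streams the whole file once. The output also shows how contigs are named (e.g. `22` vs `chr22`), which is the name `--chr` has to match.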
A little more about the VCF file used: