I have a very large VCF file (>400 GB) that I want to split up for use with VEP. VEP recommends splitting the VCF, so I generated a list of regions from the contig lengths in the header, using windows of 3e7 (30,000,000) bases within each chromosome. This gave me a list like this:
All the alt/small contigs are excluded because there are no variants within them.
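For reference, a list like this can be built roughly as in the sketch below. It assumes the header has standard ##contig=<ID=...,length=...> lines with no commas inside quoted fields. The names make_windows and regions.txt are made up for illustration, and the sketch does not drop the alt/small contigs.

```shell
#!/bin/sh
# Hypothetical helper: read VCF header lines on stdin and print
# fixed-size windows as CHROM:START-END (1-based, inclusive).
# Usage: make_windows WINDOW_SIZE < header.txt
make_windows() {
    awk -v win="$1" '
        /^##contig=</ {
            line = $0
            sub(/^##contig=</, "", line)   # strip the leading tag
            sub(/>$/, "", line)            # strip the closing bracket
            n = split(line, kv, ",")
            id = ""; len = 0
            for (i = 1; i <= n; i++) {
                if (kv[i] ~ /^ID=/)          id  = substr(kv[i], 4)
                else if (kv[i] ~ /^length=/) len = substr(kv[i], 8) + 0
            }
            if (id == "" || len == 0) next   # skip contigs without a length
            for (start = 1; start <= len; start += win) {
                end = start + win - 1
                if (end > len) end = len     # clip the last window
                printf "%s:%d-%d\n", id, start, end
            }
        }'
}

# Small demo on a made-up header line (not real data).
printf '##contig=<ID=chrDemo,length=70000000>\n' | make_windows 30000000
```

On the real file this would be something like: bcftools view -h bigvcffile.vcf | make_windows 30000000 > regions.txt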
I also already have each chromosome in its own VCF file from a preprocessing step:
chr1.vcf.gz chr2.vcf.gz etc
But splitting out the chromosomes like this:
bcftools view -r "$CHR" bigvcffile.vcf > "$CHR".vcf
seems very inefficient, because bcftools scans the whole big file (which takes 5-6 hours) every time I extract a chromosome. I split by chromosome first because the problem would be even worse if I extracted all 100 windows directly, each one filtering the original VCF.
How can I use bcftools to split these VCF files according to my region list in a more efficient way? I am not sure it is even possible.