break_blocks conversion of gVCF to VCF
dec986, 21 months ago

I have a large gVCF that I'm trying to convert to a VCF, restricted to the positions in a BED file that looks like this:

1   58813   58814   rs114420996 .   G   A   PASS    .   GT:GQ   ./.:0.0
1   565507  565508  rs9283150   .   G   A   PASS    .   GT:GQ   ./.:0.0
1   567091  567092  rs9326622   .   T   C   PASS    .   GT:GQ   ./.:0.0
1   726911  726912  1:726912    .   A   G   PASS    .   GT:GQ   0/0:0.27129138

I extract the required positions with:

break_blocks --region-file $bed --ref human_g1k_v37.fasta --exclude-off-target

which produces a gVCF with the correct regions.

However, this has to be a VCF, not a gVCF.

So I converted it following the advice in "Converting Gvcf Files Into Vcf" (extracting variants), but the result is missing about 75% of the positions, which isn't acceptable. I get similar results when using

gatk SelectVariants -R $fasta -V $vcf -O $outfile --exclude-non-variants

How can I extract all of the roughly 661,000 positions from this gVCF?
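One quick way to see where the missing records go is to tally the genotypes in the converted file. As far as I understand, GATK's --exclude-non-variants drops sites where no sample carries a non-reference allele, so records whose genotype is ./. or 0/0 (both kinds appear in the region file above) would be removed. The sketch below is a hypothetical helper, not part of any tool: the function name tally_genotypes is illustrative, and it assumes an uncompressed, tab-separated, single-sample VCF with GT as the first FORMAT key.

```shell
# Hypothetical diagnostic sketch: count records by the genotype class of the
# first sample (column 10). Assumes GT is the first FORMAT key.
# Reads the file given as $1, or stdin if no argument is given.
tally_genotypes() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      split($10, fields, ":")              # first sample column, split on ":"
      gt = fields[1]                       # GT value, e.g. "0/1" or "./."
      gsub(/[|]/, "/", gt)                 # treat phased and unphased alike
      if (gt ~ "^[./]+$")       class = "no-call"   # only "." and "/"
      else if (gt ~ "^[0./]+$") class = "hom-ref"   # only ref alleles (plus possible ".")
      else                      class = "non-ref"   # at least one ALT allele in GT
      count[class]++
    }
    END { for (c in count) print c "\t" count[c] }
  ' "${1:--}" | sort
}
```

For example, zcat regions.g.vcf.gz | tally_genotypes (hypothetical file name). If the no-call and hom-ref counts together make up about 75% of the records, that would account for what SelectVariants removes.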

Unless I've missed an important point, you can use bcftools to extract variant sites from a gVCF:

$ bcftools view -m2 input.vcf

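The -m (--min-alleles) parameter keeps only sites with at least the given number of alleles listed in REF and ALT combined, so -m2 keeps sites with at least one ALT allele and drops records whose ALT is empty (".").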
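To make the counting rule concrete, here is a rough awk imitation of the -m filter. It is a hypothetical sketch, not bcftools itself: the function name min_alleles_filter is illustrative, it only handles uncompressed tab-separated VCF, and it counts symbolic alleles such as <NON_REF> like any other ALT entry. Whether bcftools treats them the same way on your file is worth checking against the real command.

```shell
# Hypothetical sketch (not part of bcftools): approximate `bcftools view -m N`
# on an uncompressed VCF. REF counts as one allele; each comma-separated ALT
# entry adds one; an ALT of "." adds none.
# Usage: min_alleles_filter N [FILE]   (reads stdin if FILE is omitted)
min_alleles_filter() {
  local n="$1" file="${2:--}"
  awk -F'\t' -v n="$n" '
    /^#/ { print; next }                              # pass header lines through
    {
      alleles = 1                                     # REF (column 4)
      if ($5 != ".") alleles += split($5, alts, ",")  # ALT (column 5)
      if (alleles >= n) print                         # keep sites with >= n alleles
    }
  ' "$file"
}
```

With N=2, a record such as "1 100 . G . ..." is dropped while "1 200 . G A ..." is kept, which mirrors the description of -m2 above.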


