Break_blocks conversion of gvcf -> vcf
21 months ago
dec986 ▴ 300

I have a large gVCF that I'm trying to convert to a VCF, using a BED file that looks like this:

1   58813   58814   rs114420996 .   G   A   PASS    .   GT:GQ   ./.:0.0
1   565507  565508  rs9283150   .   G   A   PASS    .   GT:GQ   ./.:0.0
1   567091  567092  rs9326622   .   T   C   PASS    .   GT:GQ   ./.:0.0
1   726911  726912  1:726912    .   A   G   PASS    .   GT:GQ   0/0:0.27129138

and getting the necessary positions thus:

break_blocks --region-file $bed --ref human_g1k_v37.fasta --exclude-off-target

which produces a gVCF with the correct regions.

However, this has to be a VCF, not a gVCF.

So I converted it following the advice in "Converting Gvcf Files Into Vcf" to extract variants, but this produces a file with about 75% of the data missing, which isn't acceptable. I get similar results when using:

gatk SelectVariants -R $fasta -V $vcf -O $outfile --exclude-non-variants

How can I get all of the 661,000 or so positions extracted from this gVCF?

genome vcf • 639 views

Unless I've missed an important point, you can use bcftools to extract variant sites from a gVCF:

$ bcftools view -m2 input.vcf

The -m (--min-alleles) option keeps only sites with at least the given number of alleles listed across REF and ALT, so -m2 requires at least one ALT allele in addition to REF.
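To get a feel for how many records an allele-count filter like this would drop, it may help to tally the records by ALT content first. The following is only a rough sketch, not bcftools' own logic: count_site_types is a hypothetical helper, it assumes an uncompressed tab-delimited file, and it assumes that an ALT of "." or "<NON_REF>" marks a reference-only record (that convention depends on the tool that wrote the gVCF).

```shell
# Hypothetical helper: tally records by whether ALT holds a real alternate allele.
# Assumptions: uncompressed, tab-delimited VCF/gVCF; ALT is column 5;
# "." or "<NON_REF>" alone in ALT means a reference-only record.
count_site_types() {
    awk -F'\t' '
        /^#/ { next }                                        # skip header lines
        $5 == "." || $5 == "<NON_REF>" { nonvar++; next }    # reference-only record
        { var++ }                                            # at least one real ALT allele
        END { printf "variant=%d non_variant=%d\n", var, nonvar }
    ' "$1"
}
```

Running count_site_types on the break_blocks output (the file name is up to you) prints the two counts on one line. If the non-variant count is close to the share of positions that go missing, the loss probably comes from filtering on variant status rather than from the region selection.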
