Entering edit mode
4.2 years ago
dec986
▴
370
I have a list of VCFs that I'm converting from gVCF to VCF thus:
zcat HG00096.final_combined.g.vcf.gz | gvcftools-0.17.0/bin/break_blocks --region-file gsa_positions.chromEnd_plus1.bed --ref g1k_v37/human_g1k_v37.fasta --exclude-off-target > vcf/HG00096.bb.vcf 2> break_blocks_err/HG00096.bb.err
the bed file looks like
1 58813 58814 GSA-rs114420996 . G A PASS . GT:GQ ./.:0.0
1 565507 565508 GSA-rs9283150 . G A PASS . GT:GQ ./.:0.0
1 567091 567092 GSA-rs9326622 . T C PASS . GT:GQ ./.:0.0
1 726911 726912 GSA-1:726912 . A G PASS . GT:GQ 0/0:0.27129138
1 727840 727841 GSA-rs116587930 . G A PASS . GT:GQ 0/0:0.6141555
but the problem is that the output files are missing alternate alleles, e.g.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00126
1 58814 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL 0/0:31:84
:29:0,84,1192
1 565508 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL 0/0:27:50
:27:0,50,857
1 567092 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL 0/0:16:45
:16:0,45,675
I've also tried piping to extract_variants
from Converting Gvcf Files Into Vcf but this isn't working either.
how can I get the output to not have the NON_REF
so that it looks like a correct VCF?
they don't have identical sort order :(
then sort both files...
the files are of different lengths. I just ended up using a Perl hash, which was much easier.