I have some BAM files where the reads are phased, such that each read belongs to haplotype 1 or 2 (or NA if the read isn't phased). Next I call variants in these BAMs, and obtain an unphased VCF. The question is: is there a method, algorithm, software, anything, that allows me to phase the variants in the VCF based on the phasing information in the BAM?
My explanation above probably wasn't clear enough. I don't want to phase the BAM; it is already phased. I want to use the phasing in the BAM to infer the phasing of the variants in the VCF.
The BAM is phased using a linked-read technology from 10x Genomics. Basically, DNA fragments are barcoded such that two reads with the same barcode are likely to originate close together on the genome, and this provides the information needed to, among other things, phase the data.
@Vitis's answer actually solves my problem, although it isn't exactly what I was looking for initially.
With WhatsHap, you can phase a VCF by providing a BAM file and, optionally, a phased VCF file. WhatsHap phases the input VCF with read-backed phasing and also uses the haplotypes defined in the phased VCF. So what I will actually do is phase my VCF based on a different, already-phased VCF of the same sample.
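For anyone who wants to script this, here is a minimal sketch of how the call might be assembled in Python. The helper name `whatshap_phase_command` and all file names are hypothetical. The argument layout (`whatshap phase -o OUT --reference REF VCF BAM [PHASED_VCF]`) is my understanding of the WhatsHap command line, so please check it against `whatshap phase --help` for your installed version before relying on it.

```python
import subprocess


def whatshap_phase_command(unphased_vcf, bam, output_vcf, reference, phased_vcf=None):
    """Build (but do not run) a `whatshap phase` command line.

    All paths are placeholders chosen by the caller. The optional phased VCF
    is passed as an extra input after the BAM, on the assumption that
    WhatsHap treats the haplotypes in it as additional phasing evidence.
    """
    cmd = ["whatshap", "phase", "-o", output_vcf, "--reference", reference]
    cmd.append(unphased_vcf)  # the VCF whose variants should be phased
    cmd.append(bam)           # reads used for read-backed phasing
    if phased_vcf is not None:
        cmd.append(phased_vcf)  # already-phased VCF of the same sample
    return cmd


if __name__ == "__main__":
    cmd = whatshap_phase_command(
        "calls.vcf", "sample.bam", "calls.phased.vcf", "ref.fasta",
        phased_vcf="other.phased.vcf",
    )
    print(" ".join(cmd))
    # Uncomment to actually run it (requires WhatsHap to be installed):
    # subprocess.run(cmd, check=True)
```

As far as I understand, the sample name in the phased VCF needs to match the one in the VCF being phased, so it is worth checking that first.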
If you do have the problem I originally posed and don't have a phased VCF, I think there is a potential workaround. If you can extract haplotypes, or phase blocks, from your phased BAM, you could write them to a BAM as fictitious reads and have WhatsHap phase your input VCF based on those.
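To make the "extract haplotypes from the phased BAM" step more concrete, here is a rough sketch of one way it could be done for a single heterozygous site: a simple majority vote over the haplotype-tagged reads covering that site. This is not what WhatsHap does internally, just an illustration. It assumes you have already collected, for each read overlapping the site, its haplotype (1, 2 or `None`) and the allele it carries (0 = ref, 1 = alt), for example from a per-read haplotype tag in the BAM. The function name and the thresholds are hypothetical.

```python
from collections import Counter


def phase_het_site(observations, min_reads=2, min_fraction=0.8):
    """Return '0|1', '1|0', or '0/1' (left unphased) for one heterozygous site.

    observations: iterable of (haplotype, allele) pairs, one per read covering
    the site; haplotype is 1, 2 or None, allele is 0 (ref) or 1 (alt).
    """
    counts = {1: Counter(), 2: Counter()}
    for hap, allele in observations:
        # Ignore unphased reads and anything that is not a ref/alt call.
        if hap in counts and allele in (0, 1):
            counts[hap][allele] += 1

    calls = {}
    for hap, c in counts.items():
        total = c[0] + c[1]
        if total == 0 or total < min_reads:
            return "0/1"  # too little evidence on this haplotype
        allele, support = c.most_common(1)[0]
        if support / total < min_fraction:
            return "0/1"  # reads on this haplotype disagree too much
        calls[hap] = allele

    if calls[1] == calls[2]:
        return "0/1"  # both haplotypes support the same allele
    return f"{calls[1]}|{calls[2]}"


if __name__ == "__main__":
    reads = [(1, 1), (1, 1), (1, 1), (2, 0), (2, 0), (None, 1)]
    print(phase_het_site(reads))
```

Note that a call like `1|0` is only meaningful relative to other sites in the same phase block, so the block identifier of the reads would have to be carried along into the output as well.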