I obtained trio-based exome data (VCF files) from a vendor. The VCF files show PGT and PID values for many rows. The GATK documentation describes them as "physical phasing haplotype information" and "physical phasing ID information", respectively.

I need to detect de novo, autosomal recessive, and compound heterozygous variants. For the last two, I would like to phase the data using the tool Beagle. My plan is to take each trio (unaffected father, unaffected mother, and affected child), remove PGT and PID, and then run Beagle to phase the data. After that, I would like to use Gemini to call autosomal recessive and compound heterozygous variants. Since I have already spent a lot of time learning Gemini, I would prefer to keep using it.

Is this the right approach? Also, can I skip the phasing step entirely if I keep the PGT and PID values in the VCF file?
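To make the "remove PGT and PID" step described above concrete, here is a minimal sketch of what I have in mind, written in plain Python. It assumes an uncompressed, tab-delimited VCF in which PGT and PID are per-sample FORMAT keys; the function name `strip_format_keys` is my own and does not come from any tool. Header lines are passed through unchanged, so the `##FORMAT` declarations for PGT and PID stay in the header. An established tool such as bcftools is probably the safer choice for real data.

```python
def strip_format_keys(vcf_line, drop=("PGT", "PID")):
    """Remove the given FORMAT keys, and the matching per-sample values,
    from a single VCF line. Hypothetical helper for illustration only."""
    # Header lines (## meta lines and the #CHROM line) are returned unchanged.
    if vcf_line.startswith("#"):
        return vcf_line

    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")

    # Columns 1-8 are fixed; column 9 is FORMAT; columns 10+ are samples.
    if len(fields) < 10:
        return vcf_line

    keys = fields[8].split(":")
    keep = [i for i, key in enumerate(keys) if key not in drop]
    fields[8] = ":".join(keys[i] for i in keep)

    for col in range(9, len(fields)):
        values = fields[col].split(":")
        # VCF allows trailing sample values to be omitted, so guard the index.
        fields[col] = ":".join(values[i] for i in keep if i < len(values))

    return "\t".join(fields) + newline


if __name__ == "__main__":
    example = (
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP:PGT:PID"
        "\t0/1:10,5:15:0|1:100_A_G\t0/0:20,0:20\n"
    )
    print(strip_format_keys(example), end="")
```

The sketch only edits the FORMAT and sample columns; the unphased GT values are left as they are, which is what I would then pass to Beagle.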
I am new to the field and have very limited knowledge, so it would be a great help if you could also point out any additional issues that I have not thought of.
Thanks a lot.