Combine VCFs from a diploid reference (hap1/hap2) for the same sample

1 day ago

Matteo Ungaro ▴ 130

Hi there,

I have diploid calls from HiFi long reads for a sample against both of its haplotype assemblies (I'm working with humans, so I have only hap1 and hap2). Is there a way to correctly merge these two files based on a single reference?

The idea is to test and benchmark the effect of mapping to the sample's own genome of origin versus any linear reference, which in theory should perform better. I'm open to other suggestions if this isn't possible; for instance, mapping to the "more complete" of the two haplotypes and then combining the two VCF files to avoid issues with mismatched reference calls.

This would still be better than mapping to a different reference, but it won't capture variants inherent to the other haplotype. Let me know what you think, thanks in advance!
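As an illustration of what "merging based on a single reference" involves, here is a minimal sketch that combines two haploid call sets into phased diploid genotypes. It assumes both call sets are already expressed in the SAME reference coordinates and normalized; calls made against hap1 and hap2 as separate references would first need coordinate translation, which this sketch does not do. It also assumes that a site absent from one haplotype means that haplotype carries the reference allele, not that it lacked coverage. The function names are hypothetical, not part of bcftools or any other tool.

```python
from typing import Dict, List, Optional, Tuple

# One haploid call: (chrom, pos, ref, alt), pos 1-based as in VCF.
Call = Tuple[str, int, str, str]
Record = Tuple[str, int, str, str, str]  # (chrom, pos, ref, alt_list, GT)


def merge_haplotype_calls(hap1: List[Call], hap2: List[Call]) -> List[Record]:
    """Combine two haploid call sets into phased diploid records (hap1|hap2).

    Assumption: both inputs share reference coordinates, and a missing site
    on one haplotype means that haplotype carries the REF allele.
    """
    # Group by (chrom, pos, ref); slot 0 holds the hap1 allele, slot 1 the hap2 allele.
    sites: Dict[Tuple[str, int, str], List[Optional[str]]] = {}
    for hap_index, calls in enumerate((hap1, hap2)):
        for chrom, pos, ref, alt in calls:
            sites.setdefault((chrom, pos, ref), [None, None])[hap_index] = alt

    records: List[Record] = []
    for (chrom, pos, ref), hap_alleles in sorted(sites.items()):
        # Distinct ALT alleles in first-seen order; different ALTs per haplotype
        # produce a multi-allelic site such as ALT=A,C with GT 1|2.
        alts: List[str] = []
        for allele in hap_alleles:
            if allele is not None and allele not in alts:
                alts.append(allele)
        # GT index: 0 = REF, 1.. = position in the ALT list; "|" marks phased genotypes.
        gt = "|".join("0" if a is None else str(alts.index(a) + 1) for a in hap_alleles)
        records.append((chrom, pos, ref, ",".join(alts), gt))
    return records


def to_vcf_lines(records: List[Record]) -> List[str]:
    """Render records as minimal VCF body lines (no header; QUAL/FILTER/INFO left as '.')."""
    return [
        "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT", gt])
        for chrom, pos, ref, alt, gt in records
    ]
```

For example, a site called as `A>G` on both haplotypes becomes `GT=1|1`, one called only on hap2 becomes `0|1`, and `G>A` on hap1 with `G>C` on hap2 becomes `ALT=A,C`, `GT=1|2`. Note that chromosomes are sorted lexicographically here, not in reference order.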

Tags: bcftools, combine, VCF

This question is a little similar to your previous question (best practice for diploid variant calling). Just as added info, there is a tool called "phased assembly variant caller" (PAV): https://github.com/EichlerLab/pav

It is one tool for "assembly-based variant calling"; there may be other tools that are more focused on smaller or larger variants.
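To make the idea behind assembly-based calling concrete: once a haplotype contig is aligned to the reference, variants can be read directly off the alignment columns instead of being inferred from read pileups. The sketch below is NOT PAV's implementation; it only assumes a gapped pairwise alignment given as two equal-length strings with `-` for gaps, and the function name is hypothetical. Indels are anchored on the preceding reference base, as VCF requires.

```python
from typing import List, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), VCF-style, 1-based


def variants_from_alignment(chrom: str, ref_start: int, ref_aln: str, hap_aln: str) -> List[Variant]:
    """Extract SNVs and indels from a gapped reference/haplotype alignment.

    ref_start is the 1-based reference position of the first reference base
    in ref_aln. Assumes no all-gap columns and that the alignment does not
    start with an indel (there would be no preceding anchor base).
    """
    if len(ref_aln) != len(hap_aln):
        raise ValueError("aligned strings must have equal length")
    if any(r == "-" and h == "-" for r, h in zip(ref_aln, hap_aln)):
        raise ValueError("alignment contains an all-gap column")

    variants: List[Variant] = []
    ref_pos = ref_start - 1  # 1-based position of the last reference base consumed
    anchor = None            # last reference base consumed, used as the VCF padding base
    i, n = 0, len(ref_aln)
    while i < n:
        r, h = ref_aln[i], hap_aln[i]
        if r != "-" and h != "-":
            # Aligned column: either a match or an SNV.
            ref_pos += 1
            if r != h:
                variants.append((chrom, ref_pos, r, h))
            anchor = r
            i += 1
            continue
        if anchor is None:
            raise ValueError("alignment starts with a gap; no anchor base")
        j = i
        if r == "-":
            # Insertion: run of columns with a gap in the reference.
            while j < n and ref_aln[j] == "-":
                j += 1
            variants.append((chrom, ref_pos, anchor, anchor + hap_aln[i:j]))
        else:
            # Deletion: run of columns with a gap in the haplotype.
            while j < n and hap_aln[j] == "-":
                j += 1
            deleted = ref_aln[i:j]
            variants.append((chrom, ref_pos, anchor + deleted, anchor))
            ref_pos += len(deleted)
            anchor = deleted[-1]
        i = j
    return variants
```

Running this once per haplotype gives two haploid call sets in reference coordinates, which is the input shape the merging step in the question needs.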

The idea of aligning the reads used to create an assembly back against that same assembly is a little tricky. It can be useful, but with perfect mapping and a perfect assembly there would be "no variants", so any work in that direction basically uncovers misassemblies, misalignments, or bias from aligning reads to the wrong haplotype. The reads could be separated by haplotype using tools like WhatsHap, or maybe some other technique.
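As a rough sketch of the separation step: `whatshap haplotag` writes an `HP` tag (1 or 2) onto reads it can assign to a haplotype, so reads can then be partitioned by that tag. The pure-Python function below is hypothetical and works on SAM text lines (for example, as printed by `samtools view` on a haplotagged BAM); for real data a BAM-aware library or tool would be the more practical route.

```python
from typing import Dict, Iterable, List


def split_sam_by_haplotype(sam_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group read names by the HP (haplotype) tag on each SAM record.

    Reads without an HP tag, or with a value other than 1 or 2, go to "untagged".
    """
    groups: Dict[str, List[str]] = {"1": [], "2": [], "untagged": []}
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        hp = None
        # Optional TAG:TYPE:VALUE fields start after the 11 mandatory SAM columns.
        for tag in fields[11:]:
            if tag.startswith("HP:i:"):
                hp = tag[len("HP:i:"):]
                break
        groups[hp if hp in ("1", "2") else "untagged"].append(fields[0])
    return groups
```

Each group could then be aligned or called separately, which reduces the bias from reads of one haplotype being forced onto the other haplotype's assembly.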

3 hours ago by cmdcolin ★ 4.3k