Question

Which software can be used to extend partially phased VCFs from several different individuals (samples) to obtain extended-length haplotypes?

0


8.9 years ago

kirannbishwa01 ★ 1.6k

I have VCF files that contain partially phased genotypes as separate blocks. These variants and phased genotypes are from several different individuals of the same population, but they do not form a father-mother-child trio. The variant data come from a plant genome/transcriptome alignment, and no phased haplotype data are available yet for this model organism.
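To make "partially phased genotypes as separate blocks" concrete, here is a minimal sketch of how one sample's phased heterozygous sites could be grouped into blocks. It assumes block membership is stored in the standard VCF `PS` (phase set) FORMAT field and that phased genotypes use the `|` separator; other phasers may record block membership in a different field, so the key name is an assumption. The function name `read_phase_blocks` and the example records are hypothetical.

```python
from collections import defaultdict


def read_phase_blocks(vcf_lines, sample_index=0):
    """Group one sample's phased heterozygous genotypes by phase set.

    Returns {(chrom, ps): [(pos, left_allele, right_allele), ...]}.
    Assumes the block ID is in the PS FORMAT field (an assumption; adjust
    the key if your phaser writes a different tag).
    """
    blocks = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        keys = fields[8].split(":")
        values = fields[9 + sample_index].split(":")
        sample = dict(zip(keys, values))
        gt = sample.get("GT", ".")
        ps = sample.get("PS", ".")
        if "|" not in gt or ps == ".":
            continue  # unphased genotype or no block ID
        parts = gt.split("|")
        if len(parts) != 2 or "." in parts or parts[0] == parts[1]:
            continue  # keep only diploid, called, heterozygous sites
        alleles = [ref] + alt.split(",")
        left, right = alleles[int(parts[0])], alleles[int(parts[1])]
        blocks[(chrom, ps)].append((pos, left, right))
    return dict(blocks)


# Hypothetical records: two phased blocks separated by an unphased site.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:PS\t0|1:100",
    "chr1\t150\t.\tC\tT\t50\tPASS\t.\tGT:PS\t1|0:100",
    "chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t0/1",
    "chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT:PS\t0|1:400",
]
blocks = read_phase_blocks(example)
```

Each block lists positions with the allele on the "left" and "right" haplotype; which side is which is only meaningful within a block, which is why separate blocks cannot simply be concatenated.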

What I want to do is take these partially phased haplotypes (blocks) from several different individuals as input and extend them as far as possible, using matches and mismatches in shared haplotype regions and heterozygous alleles that overlap between samples.

I have come across several tools such as phASER, HapCut2, WhatsHap, and impute, and have done some reading on all of them, but I am undecided about which one to try first. I have already used phASER to get the haplotypes for each individual, but extending the haplotype length doesn't work properly when I supply haplotypes from several different individuals. So I would like some suggestions on which tool from the list (or any other) best fits my purpose.

Resources I have: data from several different individuals (6 individuals per population, from two different populations). Additionally, I have transcriptome data from F1 hybrids of these populations (but the F1s are not related to the sampled individuals as parent-child). I have read-back phased all the individuals in the populations and the F1 hybrids. I think the phased data from the F1 hybrids will be quite helpful for connecting haplotypes across several genic regions and getting a more accurate phase for the individuals from the populations.

The experimental design I have in mind: call variants from the F1 hybrids and phase them. The F1 hybrids have a lot of heterozygous sites, so phasing covers more than half of the gene length for most genes. These haplotypes look to be of good quality (no ambiguity, and verified by inspection in a GUI viewer). First, I want to extend the haplotypes across the F1 hybrid samples, using the other F1 samples as a backbone for the one being extended.

Next, I want to run read-back phasing on each individual's population genome sequence data, and then supply the haplotypes from the F1 hybrids (as well as the partially phased population data) as a backbone to extend the haplotype phase in each individual's population genome data.

Thanks very much in advance!

vcf SNP genome haplotype phasing genotype phasing • 3.4k views

8.3 years ago by kirannbishwa01 ★ 1.6k

score 3 · Accepted Answer · 2017-04-04

After a while I was able to write a Python script that takes the partially phased (read-back phased) alleles and extends the phase states in the F1 hybrids using the allele frequencies of the population or the partially phased data of individuals from different populations.

See the link: https://github.com/everestial/pHASE-Stitcher
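For readers who want the general idea without reading the repository, here is a minimal sketch of joining two phase blocks in one sample using haplotypes observed in other samples. It is not the code of the linked tool: it uses a simple majority vote over matching haplotypes rather than allele-frequency likelihoods, and the names `side_support`, `stitch`, and `min_votes` as well as the data layout are hypothetical.

```python
def side_support(block, hap):
    """Positive if `hap` agrees more with the block's left haplotype,
    negative if it agrees more with the right one, 0 if uninformative.

    block: {pos: (left_allele, right_allele)}
    hap:   {pos: allele} for one haplotype from another sample
    """
    left = sum(1 for pos, (l, r) in block.items() if hap.get(pos) == l)
    right = sum(1 for pos, (l, r) in block.items() if hap.get(pos) == r)
    return left - right


def stitch(block_a, block_b, panel, min_votes=1):
    """Join two blocks of one sample using haplotypes from other samples.

    Each panel haplotype that is informative for both blocks votes for
    keeping block_b as is ("parallel") or swapping its sides ("cross").
    Returns the merged block, or None if the vote margin is below min_votes.
    """
    parallel = cross = 0
    for hap in panel:
        a = side_support(block_a, hap)
        b = side_support(block_b, hap)
        if a == 0 or b == 0:
            continue  # haplotype does not span both blocks informatively
        if (a > 0) == (b > 0):
            parallel += 1
        else:
            cross += 1
    if abs(parallel - cross) < min_votes:
        return None  # not enough evidence; leave the blocks separate
    if cross > parallel:
        block_b = {pos: (r, l) for pos, (l, r) in block_b.items()}
    merged = dict(block_a)
    merged.update(block_b)
    return merged


# Hypothetical data: two blocks from one F1 sample and three haplotypes
# from other samples that overlap both blocks.
block_a = {100: ("A", "G"), 150: ("T", "C")}
block_b = {400: ("T", "C"), 450: ("G", "A")}
panel = [
    {100: "A", 150: "T", 400: "C", 450: "A"},
    {100: "G", 150: "C", 400: "T", 450: "G"},
    {100: "A", 400: "T"},
]
merged = stitch(block_a, block_b, panel)
```

In this example two of the three panel haplotypes support swapping the sides of the second block, so the merged block carries A-T-C-A on one haplotype and G-C-T-G on the other. Raising `min_votes` makes the join more conservative when the panel is small, as with 6 individuals per population.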