Hi,
I recently used SHAPEIT3 to phase some data. It outputs the phased haplotypes in haps/sample format. I want to convert this to VCF format, which can be done with SHAPEIT2, but I am curious how the conversion distinguishes the reference allele from the alternate allele given only the haps/sample files. The haps file after phasing has the following layout:
7 SNP1 123 A G 0 0 1 0 0 0 1 1
7 SNP2 456 T C 0 1 1 0 0 1 0 1
7 SNP3 789 A T 0 1 1 0 1 1 1 1
The SHAPEIT2 documentation mentions the following:
This file is SPACE delimited. Each line corresponds to a single SNP. The first five columns are:
1) Chromosome number [integer]
2) SNP ID [string]
3) SNP Position [integer]
4) First allele [string]
5) Second allele [string]
Then the successive column pair (6, 7), (8, 9), (10, 11) and (12, 13) corresponds to the alleles carried at the 4 SNPs by each haplotype of a single individual. For example a pair "1 0" means that the first haplotype carries the B allele while the second carries the A allele. The haplotypes are given in the same order than in the SAMPLE file. This file should have L lines and 2N+5 columns, where L and N are the numbers of SNPs and individuals respectively.
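To make sure I am reading that layout correctly, here is a minimal Python sketch of how I understand the encoding. It assumes only what the documentation above states: 0 means the first allele (column 4) and 1 means the second allele (column 5). The function name `parse_haps_line` is my own and not part of SHAPEIT, and the sketch deliberately does not decide which allele is REF or ALT.

```python
def parse_haps_line(line):
    """Split one haps line into its five fixed fields and the
    per-individual pairs of haplotype alleles."""
    fields = line.split()
    chrom, snp_id, pos, first_allele, second_allele = fields[:5]
    bits = fields[5:]
    if len(bits) % 2 != 0:
        raise ValueError("expected two haplotype columns per individual")

    # Assumption taken from the documentation: 0 -> first allele, 1 -> second allele.
    alleles = (first_allele, second_allele)
    pairs = [
        (alleles[int(bits[i])], alleles[int(bits[i + 1])])
        for i in range(0, len(bits), 2)
    ]
    return chrom, snp_id, int(pos), first_allele, second_allele, pairs


# First line of my haps file: 4 individuals, so 2*4 + 5 = 13 columns.
record = parse_haps_line("7 SNP1 123 A G 0 0 1 0 0 0 1 1")
print(record[5])
```

Under that reading, the four individuals at SNP1 carry A|A, G|A, A|A and G|G, which says nothing yet about which of A or G would end up as REF in the VCF.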
There is no information here on which allele is REF and which is ALT. (I am relying on the SHAPEIT2 documentation because SHAPEIT3 is very poorly documented and its authors claim it is highly similar to SHAPEIT2.) Since I didn't use a reference panel during phasing, does it matter which of the two alleles in the haps file is assigned as REF and which as ALT during the conversion?