Question: ShapeIT output File Allele Info
8 weeks ago by
prmshakya0 wrote:


I recently used SHAPEIT3 to phase some data. It writes the phased output in haps/sample format. I want to convert this to VCF, which can be done with SHAPEIT2, but I am curious how the conversion distinguishes the reference allele from the alternate allele when it only has the haps/sample files to work from. The haps file after phasing has the following layout:

7 SNP1 123 A G 0 0 1 0 0 0 1 1
7 SNP2 456 T C 0 1 1 0 0 1 0 1
7 SNP3 789 A T 0 1 1 0 1 1 1 1

The SHAPEIT2 documentation mentions the following:

This file is SPACE delimited. Each line corresponds to a single SNP. The first five columns are:

1) Chromosome number [integer]
2) SNP ID [string]
3) SNP Position [integer]
4) First allele [string]
5) Second allele [string]
Then the successive column pair (6, 7), (8, 9), (10, 11) and (12, 13) corresponds to the alleles carried at the 4 SNPs by each haplotype of a single individual. For example a pair "1 0" means that the first haplotype carries the B allele while the second carries the A allele. The haplotypes are given in the same order than in the SAMPLE file. This file should have L lines and 2N+5 columns, where L and N are the numbers of SNPs and individuals respectively.
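To make the layout concrete, here is a minimal Python sketch of how I read the description above. It is not SHAPEIT code: the function names are hypothetical, and writing the first allele as REF and the second as ALT is purely my assumption, which is exactly the point I am unsure about.

```python
def split_haps_line(line):
    """Split one haps line into its 5 leading fields and the 0/1 haplotype codes."""
    fields = line.split()
    # The documentation describes 2N + 5 columns, so the part after the
    # first five columns must be a non-empty, even-length list.
    if len(fields) < 7 or (len(fields) - 5) % 2 != 0:
        raise ValueError("expected 2N + 5 space-delimited columns")
    codes = fields[5:]
    if any(code not in ("0", "1") for code in codes):
        raise ValueError("haplotype columns must be 0 or 1")
    return fields[:5], codes


def decode_alleles(line):
    """Return one (haplotype 1 allele, haplotype 2 allele) pair per individual."""
    (_chrom, _snp_id, _pos, first, second), codes = split_haps_line(line)
    # Per the quoted documentation: 0 = first allele (A), 1 = second allele (B).
    alleles = [first if code == "0" else second for code in codes]
    return [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]


def haps_line_to_vcf_record(line):
    """Build a tab-delimited VCF-style record from one haps line.

    ASSUMPTION: first allele is written as REF, second as ALT.
    """
    (chrom, snp_id, pos, first, second), codes = split_haps_line(line)
    genotypes = [f"{codes[i]}|{codes[i + 1]}" for i in range(0, len(codes), 2)]
    fixed = [chrom, pos, snp_id, first, second, ".", ".", ".", "GT"]
    return "\t".join(fixed + genotypes)


if __name__ == "__main__":
    example = "7 SNP1 123 A G 0 0 1 0 0 0 1 1"
    print(decode_alleles(example))
    print(haps_line_to_vcf_record(example))
```

With the first line of my file, the second individual's pair "1 0" decodes to G on the first haplotype and A on the second, and would become the genotype 1|0 under the assumed REF/ALT assignment.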

There is no information here on which allele is REF and which is ALT. (I am relying on the SHAPEIT2 documentation because SHAPEIT3 is very poorly documented and its authors claim it is highly similar to SHAPEIT2.) Since I did not use a reference panel in the phasing process, does it matter which of the two alleles in the haps file is assigned as REF and which as ALT during conversion?
