Explanation of SHAPEIT result
3.6 years ago

Can anyone explain this .hap file for me? I did the phasing with SHAPEIT.

5 rs79182581 521049 G A 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 rs2672030 526078 A G 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0
5 rs6880820 528609 T C 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
5 rs74375013 529404 C T 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 rs75152572 540635 A G 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 rs60956992 542880 A G 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 rs13175182 543666 T C 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0

phasing genome • 2.4k views

Please explain your problem in sufficient detail. And what do you mean by 'explain' the data: the format, the results, or what is wrong here? Note that it is virtually impossible to interpret somebody else's data from a fragment seen out of context.

Some to-dos for you:

1. Write at least one paragraph to put your experiment into context.
2. State the experimental setup and the analysis pipeline your data comes from.
3. State what you have tried in order to understand your data.
4. State exactly what you do not understand about the data, or the discrepancy between the expected and the actual outcome.

By 'explain' I mean how to interpret this data, i.e. the format. What do the 0s and 1s mean for each ID?


Have you tried to search for documentation on that format? I assume SHAPEIT comes with some sort of documentation?


Thanks a lot. Now I get it. By the way, 5 means chromosome 5.

3.6 years ago

See the HAPS file section at the bottom of this page:

http://www.shapeit.fr/pages/m02_formats/hapssample.html

See also this earlier Biostars post: SHAPEIT output file confusion

3.6 years ago

HAP file

The HAP file contains the haplotypes. The HAP file corresponding to the example dataset is:

0 0 1 0 0 0 1 1
0 1 1 0 0 1 0 1
0 1 1 0 1 1 1 1


This file is SPACE delimited. Each line corresponds to a single SNP. Each successive column pair (0, 1), (2, 3), (4, 5) and (6, 7) corresponds to the alleles carried at the 4 SNPs by each haplotype of a single individual. For example a pair "1 0" means that the first haplotype carries the B allele while the second carries the A allele as specified in the LEGEND file. The haplotypes are given in the same order than in the SAMPLE file. This file should have L lines and 2N columns, where L and N are the numbers of SNPs and individuals respectively.
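The column layout described in the quoted documentation can be sketched in a few lines of Python. This is only an illustration: `parse_hap` is a hypothetical helper written for this post, not part of SHAPEIT, and it assumes a purely numeric, space-delimited HAP file like the three lines quoted above.

```python
def parse_hap(text):
    """Parse HAP-format text into one list of (hap1, hap2) pairs per SNP line."""
    rows = []
    for line in text.strip().splitlines():
        alleles = [int(x) for x in line.split()]
        if len(alleles) % 2 != 0:
            raise ValueError("expected an even number of columns (2 per individual)")
        # Columns (0, 1) belong to the first individual, (2, 3) to the second, ...
        pairs = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
        rows.append(pairs)
    return rows


# The three lines quoted from the documentation above.
example = """\
0 0 1 0 0 0 1 1
0 1 1 0 0 1 0 1
0 1 1 0 1 1 1 1
"""

hap_rows = parse_hap(example)
for snp_index, pairs in enumerate(hap_rows):
    print(snp_index, pairs)
```

Each printed row is one SNP, and each tuple in it holds the two haplotype alleles of one individual, so the quoted lines describe 4 individuals (8 columns).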

http://www.shapeit.fr/pages/m02_formats/haplegsample.html

Comparing the documentation with your example shows where the confusion may come from: the example does not fully match the documented format. In other words, if your example is correct, the documentation is incomplete, and vice versa.

Let's look at the first row of your example. There are 5 undocumented columns, which I have marked with **; I have also put () around each pair of haplotype columns:

**5 rs79182581 521049 G A** (0 0) (0 0) (1 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0)

5: possibly Chromosome
rs79182581: SNP rs-ID see: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=79182581
521049: possibly the genomic position of the SNP; this also suggests that the annotation is based on GRCh37.p13 rather than the latest genome build
G: Allele corresponding to 0
A: Allele corresponding to 1

0 0 := the sample is homozygous at rs79182581, carrying G G

1 0 := one of the samples is heterozygous, carrying A G
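To make this reading concrete, here is a minimal Python sketch that decodes one such line under the interpretation given above (5 leading columns, then one pair of 0/1 codes per sample). `decode_haps_line` is a hypothetical helper, not a SHAPEIT function, and the column meanings are my interpretation rather than something the tool guarantees.

```python
def decode_haps_line(line):
    """Decode one line: 5 annotation columns followed by pairs of 0/1 allele codes."""
    fields = line.split()
    chrom, snp_id, pos, allele0, allele1 = fields[:5]
    codes = fields[5:]
    if len(codes) % 2 != 0:
        raise ValueError("expected two allele codes per sample")
    lookup = {"0": allele0, "1": allele1}
    # One (first haplotype, second haplotype) tuple per sample.
    genotypes = [(lookup[codes[i]], lookup[codes[i + 1]])
                 for i in range(0, len(codes), 2)]
    return {"chrom": chrom, "id": snp_id, "pos": int(pos), "genotypes": genotypes}


# First row of the example in the question: 14 samples, the third one is "1 0".
pairs = ["0 0", "0 0", "1 0"] + ["0 0"] * 11
line = "5 rs79182581 521049 G A " + " ".join(pairs)

record = decode_haps_line(line)
heterozygous = [i for i, (a, b) in enumerate(record["genotypes"]) if a != b]
print(record["id"], record["genotypes"][2], heterozygous)
```

Under these assumptions the third sample (index 2) decodes to A on its first haplotype and G on its second, and every other sample decodes to G G.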


In layman's terms: most patients have a G at position 521049 on both copies of chromosome 5, while a single patient has one copy with an A and one copy with a G.

A predicted (phased) haplotype corresponds to a single column. In other words, the software predicts that all the alleles in one column lie on the same copy of the chromosome within a pair of chromosomes. Is that clear?
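Reading a haplotype as a column can be sketched as follows. The helper `haplotype` is hypothetical, and the input is truncated to the first six haplotype columns (three samples) of the first three rows in the question, purely to keep the example short. It again assumes that the fourth and fifth columns give the alleles coded 0 and 1.

```python
def haplotype(lines, individual, copy):
    """Return the alleles along one haplotype: one column read down all SNP lines."""
    # Skip the 5 annotation columns; each individual owns two consecutive columns.
    column = 5 + 2 * individual + copy
    alleles = []
    for line in lines:
        fields = line.split()
        allele0, allele1 = fields[3], fields[4]
        alleles.append(allele1 if fields[column] == "1" else allele0)
    return alleles


# First three rows of the question, truncated to three samples for brevity.
haps_lines = [
    "5 rs79182581 521049 G A 0 0 0 0 1 0",
    "5 rs2672030 526078 A G 0 1 1 0 0 1",
    "5 rs6880820 528609 T C 0 0 0 0 0 1",
]

first_copy = haplotype(haps_lines, individual=2, copy=0)
second_copy = haplotype(haps_lines, individual=2, copy=1)
print(first_copy, second_copy)
```

For the third sample this gives A, A, T on one copy of chromosome 5 and G, G, C on the other, which is exactly the information phasing adds over unphased genotypes.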


Actually, I am unable to understand this part: "For example a pair "1 0" means that the first haplotype carries the B allele while the second carries the A allele as specified in the LEGEND file". Does 1 mean allele B and 0 mean allele A?


I have expanded my explanation. Note that this is how I interpret the output; that does not mean it is correct.