Question: Haplotype Phasing with Shapeit2
4.4 years ago by
Beijing Institute of Genomics, CAS.
SOHAIL330 wrote:

Hi everyone,

I am new to haplotype phasing. I have WGS data for 14 individuals from one human population, and I want to phase them with SHAPEIT2. The supplementary information of the 1000 Genomes paper says that phasing was performed in two steps:

  1. Creation of Haplotype scaffolds from microarray genotypes
  2. Joint phasing of biallelic SNPs, Indels and high-confidence deletions onto the haplotype scaffold.

My questions are:

  1. My starting point is a set of VCF files containing both SNPs and indels, called with the standard GATK pipeline. I don't have genotype array data for the sequenced individuals. Which recent haplotype reference panel should I use for phasing?

  2. Is haplotype phasing population-specific (i.e. can individuals from different populations have different haplotype structure)? In other words, can variant sets from individuals of different populations be phased together?

  3. Is there a comprehensive step-by-step tutorial online that covers using SHAPEIT2 starting from VCF files?

I would be very happy to read any suggestions on how to get started with haplotype phasing in SHAPEIT.

Thanks in advance!


I am also interested in your question 2, about whether populations should be phased together or separately. This is the information I found (see figure 4):

Note: I haven't confirmed whether the findings are applicable to other phasing algorithms.

written 2.9 years ago by Angel0
4.4 years ago by
Shab86270 wrote:

The 1KG approach to phasing is a bit different from how most people do it. It combines array genotypes with low-coverage sequencing of the same samples to build the haplotype scaffolds. You don't need to follow that approach, since you don't have both genotype and sequence data for your samples.

1). The simplest approach is therefore to phase your dataset with SHAPEIT2 without a reference panel. All that is needed is a human genetic map (HapMap phase II b37, for example).
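A rough sketch of the no-reference route, assuming the data have already been split into one VCF per chromosome (SHAPEIT2 phases one chromosome per run) and that the genetic map comes as one file per chromosome. The file-name patterns and function names are made up for illustration; --input-vcf, -M and -O are the options from the SHAPEIT documentation. The helper only assembles the command lines so they can be inspected before anything is run.

```python
import shlex


def shapeit_no_ref_cmd(chrom,
                       vcf_pattern="mydata.chr{chrom}.vcf",
                       map_pattern="genetic_map_chr{chrom}.txt",
                       out_pattern="mydata.chr{chrom}.phased"):
    """Build one SHAPEIT2 command (argument list) for a single chromosome.

    All file-name patterns are hypothetical placeholders.
    """
    return [
        "shapeit",
        "--input-vcf", vcf_pattern.format(chrom=chrom),  # unphased genotypes
        "-M", map_pattern.format(chrom=chrom),           # genetic map
        "-O", out_pattern.format(chrom=chrom),           # output prefix
    ]


def autosome_cmds():
    """One command per autosome, chromosomes 1 to 22."""
    return [shapeit_no_ref_cmd(c) for c in range(1, 23)]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run to execute it.
    for cmd in autosome_cmds():
        print(shlex.join(cmd))
```

With only 14 samples, phasing without a panel has little information to work from, so it is worth comparing against a run that uses a reference panel.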

On the other hand, if you want to use a reference panel, you can use the 1KG panel or the HapMap panel. Both work quite well on admixed population data and give fairly accurate phased haplotypes.
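A minimal sketch of phasing against a panel, assuming the panel is in the hap/legend/sample format that SHAPEIT2 reads with --input-ref. As far as I remember from the SHAPEIT2 documentation, the usual pattern is a -check run first, which writes a list of misaligned or missing sites, followed by the real run with those sites excluded. All file names below are hypothetical, and the exact name of the exclusion file should be checked against your own -check output.

```python
import shlex


def check_cmd(vcf, gmap, hap, legend, sample, log_prefix):
    """Alignment check of the study data against the panel (no phasing)."""
    return ["shapeit", "-check",
            "--input-vcf", vcf, "-M", gmap,
            "--input-ref", hap, legend, sample,
            "--output-log", log_prefix]


def phase_with_ref_cmd(vcf, gmap, hap, legend, sample, log_prefix, out_prefix):
    """Phase against the panel, skipping sites flagged by the check step."""
    return ["shapeit",
            "--input-vcf", vcf, "-M", gmap,
            "--input-ref", hap, legend, sample,
            # Assumed name of the file written by the -check run.
            "--exclude-snp", log_prefix + ".snp.strand.exclude",
            "-O", out_prefix]


if __name__ == "__main__":
    files = dict(vcf="mydata.chr20.vcf", gmap="genetic_map_chr20.txt",
                 hap="panel_chr20.hap.gz", legend="panel_chr20.legend.gz",
                 sample="panel.sample", log_prefix="mydata.chr20.alignments")
    print(shlex.join(check_cmd(**files)))
    print(shlex.join(phase_with_ref_cmd(out_prefix="mydata.chr20.phased", **files)))
```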

2). This can be a good starting point:

3). A tutorial is available on the SHAPEIT2 website itself, and I believe it is quite detailed. To start off, convert the VCF to PLINK format with PLINK (in PLINK 1.9 that is --vcf together with --make-bed; --recode vcf goes the other way, from PLINK to VCF). Once you have that, the SHAPEIT website is more than enough to begin phasing.
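The conversion route could look roughly like this, assuming PLINK 1.9 and a multi-chromosome VCF. The sketch extracts one chromosome into a bed/bim/fam fileset and then hands the three files to SHAPEIT2 via --input-bed. The file names and function names are placeholders, not anything prescribed by either tool.

```python
import shlex


def vcf_to_plink_cmd(vcf, chrom, out_prefix):
    """PLINK 1.9: read a VCF, keep one chromosome, write bed/bim/fam."""
    return ["plink", "--vcf", vcf, "--chr", str(chrom),
            "--make-bed", "--out", out_prefix]


def shapeit_from_plink_cmd(prefix, gmap, out_prefix):
    """SHAPEIT2 run on the PLINK binary fileset produced above."""
    return ["shapeit",
            "--input-bed", prefix + ".bed", prefix + ".bim", prefix + ".fam",
            "-M", gmap,
            "-O", out_prefix]


if __name__ == "__main__":
    print(shlex.join(vcf_to_plink_cmd("mydata.vcf", 20, "mydata.chr20")))
    print(shlex.join(shapeit_from_plink_cmd("mydata.chr20",
                                            "genetic_map_chr20.txt",
                                            "mydata.chr20.phased")))
```

SHAPEIT2 can also read the VCF directly with --input-vcf, so the PLINK step is mainly useful when you want PLINK's filtering or QC in between.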


Thank you very much @Shab86 for explaining everything so comprehensively. I have a few more questions:

  1. About the genetic map (HapMap phase II b37) and the 1KGP reference panel that you suggested above: both look rather old, especially the reference panel. Could you please suggest a more recent genetic map and reference panel that can be used with SHAPEIT2? I read about a recent one here:

What do you think?

  2. Secondly, I read in the SHAPEIT tutorial that VCF files can also be used as the input format. What do you think of this approach?

    shapeit --input-vcf gwas.vcf \
            -M genetic_map.txt \
            -O gwas.phased

     One could also look at read-aware phasing.

  3. Again, the reference panel contains haplotypes for individuals from multiple populations. Should I use only the individuals from one continental group (for instance the "EUR" group of the 1KGP) as the reference, or the complete reference panel with all samples?

  4. Do you know anything about the "--no-mcmc" option in SHAPEIT? I mean, when should it be used and when should it be avoided?

Thank you very much!

written 4.4 years ago by SOHAIL330

Yeah sure, go ahead with the 1KG 2014 reference panel from the IMPUTE2 website. As for the genetic map, hg38 is UCSC's most recent build, I believe, but whichever map you pick has to be on the same genome build as your VCF and the reference panel. And yes, use VCF files as SHAPEIT input; there is no difference in the output whether you use PLINK or VCF.

Lastly, it depends on which population you have. The HapMap/1KG reference panels are quite diverse and represent populations of European origin quite well. However, if your samples come from a really isolated population, an admixed reference panel might not be a good idea. So, where are the samples from?
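If you do decide to restrict the panel to one continental group, a sketch along these lines may help. It assumes the --include-grp option as I remember it from the SHAPEIT2 documentation: a plain-text file with one group label per line, matched against the population/group columns of the panel's sample file. Please verify that against your installed version. The "EUR" label, file names and function names are only examples.

```python
import os
import shlex
import tempfile


def write_group_list(groups, path):
    """Write one group label per line (e.g. EUR) and return the path."""
    with open(path, "w") as fh:
        for group in groups:
            fh.write(group + "\n")
    return path


def phase_with_subset_cmd(vcf, gmap, hap, legend, sample, group_list, out_prefix):
    """SHAPEIT2 run that uses only the listed groups of the reference panel."""
    return ["shapeit",
            "--input-vcf", vcf, "-M", gmap,
            "--input-ref", hap, legend, sample,
            "--include-grp", group_list,  # assumed option, see note above
            "-O", out_prefix]


if __name__ == "__main__":
    grp = write_group_list(["EUR"], os.path.join(tempfile.gettempdir(), "group.list"))
    print(shlex.join(phase_with_subset_cmd(
        "mydata.chr20.vcf", "genetic_map_chr20.txt",
        "panel_chr20.hap.gz", "panel_chr20.legend.gz", "panel.sample",
        grp, "mydata.chr20.EURref.phased")))
```

Running once with the full panel and once with the subset, then comparing the two sets of haplotypes, is a cheap way to see how much the choice matters for your samples.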

Also, don't use the --no-mcmc option: it is meant only for very, very small samples, and although it speeds up phasing, it increases haplotype estimation errors!

written 4.4 years ago by Shab86270