Question: Phasing variants called from FreeBayes
Asked 4.5 years ago by stefano.iantorno (United Kingdom):

Hello, I am working on a dataset of multiple polyploid individuals, all with the same ploidy (= 3). I am interested in obtaining phase information for the variants called from the sequence data.

I am just starting to work with FreeBayes after trying to use GATK UnifiedGenotyper and phasing with other programs (polyHap and HapCompass) which gave less than satisfactory results.

It is my understanding that FreeBayes uses haplotype information to make variant calls from a set of individuals. Is it possible to retrieve phase information for these variant calls? I tried running FreeBayes, but the resulting VCF has unphased genotypes in the GT field ("/" instead of "|").
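For readers who want to check this on their own output, here is a minimal Python sketch that pulls the GT values out of a VCF data line and tests whether they are phased. The function names and the example record are made up for illustration; the only thing taken from the VCF format itself is that "/" separates unphased alleles and "|" separates phased ones.

```python
def gt_is_phased(gt):
    """True when every allele separator in a GT string is '|' (phased)."""
    return "|" in gt and "/" not in gt


def sample_genotypes(vcf_line):
    """Return the GT string of every sample in one tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")      # column 9 is FORMAT, e.g. GT:DP
    gt_index = format_keys.index("GT")
    return [sample.split(":")[gt_index] for sample in fields[9:]]


# Hypothetical triploid record with two samples (not real data).
record = "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/0/1:30\t0/1/1:28"
for gt in sample_genotypes(record):
    print(gt, "phased" if gt_is_phased(gt) else "unphased")
```

A triploid call such as 0/1/1 is reported as unphased, while 0|1|1 would count as phased.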

I have a list of possible haplotypes that I could input into the analysis if needed. The following two posts suggest that phasing of multisomic variants from FreeBayes is possible:

Tetraploid Snp Calling & Snp Filtering

Haplotype Calling

Any help or advice would be greatly appreciated.

Kind Regards,

- Stefano

snp sequence genome • 3.8k views
Answer, 4.5 years ago, by karl.stamm (United States):

FreeBayes can report a multi-nucleotide polymorphism (MNP) if you set it to allow complex variants; for variants within a short window it will then assemble them into haplotypes. Working out the most likely set is quite slow, so restrict your search to windows of suspected variation. Variants that are farther apart than your read length cannot be phased this way, of course, because no single read spans both sites and the intervening sequence is ambiguously homozygous.
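To make the read-length limit concrete, here is a rough Python sketch that groups variant positions into blocks where each site is close enough to the previous one for a single read to cover both. It is only a heuristic under stated assumptions: single-end reads of fixed length, and linkage passed along a chain of neighbouring sites. The function name and the positions are invented for the example; this is not something FreeBayes exposes.

```python
def phaseable_blocks(positions, read_length):
    """Group positions so that consecutive sites in a block fit within one read.

    Two sites p < q can share a read of length L only if q - p < L.
    """
    blocks = []
    for pos in sorted(positions):
        if blocks and pos - blocks[-1][-1] < read_length:
            blocks[-1].append(pos)   # close enough to the previous site
        else:
            blocks.append([pos])     # too far away: start a new block
    return blocks


# Hypothetical variant positions and 100 bp reads.
print(phaseable_blocks([100, 150, 400, 1000, 1050], read_length=100))
```

Sites that end up alone in a block have no neighbour within one read length, so read evidence alone cannot phase them against anything else.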


Many thanks for your answer. What if I were to include known haplotypes from the population sample obtained with Beagle? Would FreeBayes phase variants based on the known haplotypes in that case?

(reply written 4.5 years ago by stefano.iantorno)

You can give FreeBayes (FB) a set of alleles you expect to be present, and it will restrict the search to those alleles. I don't know whether Beagle will be precise in this context. What has worked for me is a few passes of FB with varied parameters: first let it detect all single-base variants, filter those a little, then use them as the basis for the haplotype reconstruction in a second round of FB. You can iterate and build up a longer and longer haplotype, which FB reports as an MNP; just make sure you trust the basic alleles it is constructed from. This also gives you the best runtime, because starting off by searching for any possible 30-mer is extremely slow.
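As a rough sketch of the two-pass idea, the Python below only builds the two command lines and prints them; it does not run anything. The file names, region, and haplotype lengths are placeholders, the filtering step between the passes is left out, and the choice of --haplotype-basis-alleles and --haplotype-length for this workflow is an assumption on my part, so check `freebayes --help` for your version before relying on it.

```python
import shlex


def freebayes_cmd(ref, bam, ploidy, region, haplotype_length, basis_vcf=None):
    """Build a freebayes argument list; nothing is executed here."""
    cmd = ["freebayes", "-f", ref, "-p", str(ploidy), "-r", region,
           "--haplotype-length", str(haplotype_length)]
    if basis_vcf is not None:
        # Second pass: restrict haplotype construction to trusted alleles.
        cmd += ["--haplotype-basis-alleles", basis_vcf]
    return cmd + [bam]


# Placeholder inputs (hypothetical file names and region).
first = freebayes_cmd("ref.fa", "sample.bam", 3, "chrX:1-500000",
                      haplotype_length=0)
second = freebayes_cmd("ref.fa", "sample.bam", 3, "chrX:1-500000",
                       haplotype_length=30, basis_vcf="filtered_pass1.vcf")

print(shlex.join(first))
print(shlex.join(second))
```

The first command is meant to produce the simple variants, which you would filter yourself into something like filtered_pass1.vcf before using the second command; raising the haplotype length step by step corresponds to the iteration described above.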

(reply written 4.5 years ago by karl.stamm)

Thank you. Just to clarify, would FB be able to do this for a whole chromosome? If I have, say, two or more known putative haplotypes for chromosome X, in what format do I give this information to the program, and do I use vcfallelicprimitives to phase the trisomic GT calls? Or is there a different command for this?

Also, do MNPs refer to multi-allelic calls? If so, would FB simply ignore biallelic calls where there is only one ALT and one REF?

Apologies but I'm only just starting with FreeBayes.

(reply written 4.5 years ago by stefano.iantorno)

Just found these posts explaining in detail what MNPs are:

https://groups.google.com/forum/#!topic/freebayes/UarvEtp8NX0

https://github.com/ekg/freebayes/issues/19

I think I understand how to proceed. Still, it would be great if one could feed known haplotypes to the program.

(reply written 4.5 years ago by stefano.iantorno)