Phasing variants called from FreeBayes
9.6 years ago

Hello, I am working on a dataset of multiple polyploid individuals, all with the same ploidy (= 3). I am interested in obtaining phase information for the variants called from the sequence data.

I am just starting to work with FreeBayes after trying to use GATK UnifiedGenotyper and phasing with other programs (polyHap and HapCompass) which gave less than satisfactory results.

It is my understanding that FreeBayes uses haplotype information to make variant calls from a set of individuals. Is it possible to retrieve the phase information for these variant calls? I tried running FreeBayes, but the resulting VCF has unphased genotypes in the GT field ("/" instead of "|").
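
To check how many genotypes in a VCF are actually phased, the GT separators can be counted directly. This is a minimal sketch in plain Python, assuming an uncompressed, tab-delimited VCF with a FORMAT column; the function names are made up for illustration and are not part of FreeBayes.

```python
def gt_is_phased(gt: str) -> bool:
    """Return True if a GT string uses only the phased separator '|'."""
    # A haploid call such as "1" has no separator and carries no phase.
    return "|" in gt and "/" not in gt


def phased_fraction(vcf_lines):
    """Fraction of called, non-haploid genotypes that are phased."""
    phased = 0
    total = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_index >= len(parts):
                continue
            gt = parts[gt_index]
            # Skip missing calls and haploid calls.
            if "." in gt or ("/" not in gt and "|" not in gt):
                continue
            total += 1
            if gt_is_phased(gt):
                phased += 1
    return phased / total if total else 0.0


if __name__ == "__main__":
    example = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n",
        "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/0/1:12\t0|1|1:9\n",
    ]
    print(phased_fraction(example))
```

For a real file, pass an open file handle instead of the example list, e.g. `phased_fraction(open("calls.vcf"))`, where `calls.vcf` is a placeholder name.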

I have a list of possible haplotypes that I could input into the analysis if needed. The following two posts would suggest phasing of multisomic variants from FreeBayes is possible:

Tetraploid Snp Calling & Snp Filtering

Haplotype Calling

Any help or advice would be greatly appreciated.

Kind Regards,

- Stefano

SNP genome sequence • 6.5k views
9.6 years ago

FreeBayes can report multi-nucleotide polymorphisms (MNPs) if you set it to allow complex variants: variants that fall within a short window are then assembled into haplotypes. Working out the most likely set is quite slow, so restrict your search to windows of suspected variation. Variants that are more than a read length apart cannot be phased this way, of course, because no read spans both of them and the intervening sequence is ambiguously homozygous.
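
As a rough illustration of restricting the search to one window, the sketch below only builds a FreeBayes command line; it does not run anything. The option names (`--fasta-reference`, `--ploidy`, `--region`, `--haplotype-length`) are as I recall them from `freebayes --help`, so check them against your installed version. The file names, coordinates and the haplotype length of 30 are placeholders, and both helper functions are hypothetical.

```python
import shlex


def freebayes_window_cmd(reference, bam, chrom, start, end,
                         ploidy=3, haplotype_length=30):
    """Build (not run) a FreeBayes call limited to one window."""
    # Check freebayes --help for the coordinate convention used by --region.
    region = f"{chrom}:{start}-{end}"
    return [
        "freebayes",
        "--fasta-reference", reference,
        "--ploidy", str(ploidy),
        "--region", region,
        "--haplotype-length", str(haplotype_length),
        bam,
    ]


def spanned_by_one_read(pos_a, pos_b, read_length):
    """True if two positions are close enough for a single read to cover both.

    Assumption: single-end reads with no gaps; paired-end inserts are ignored.
    """
    return abs(pos_a - pos_b) < read_length


if __name__ == "__main__":
    cmd = freebayes_window_cmd("ref.fa", "sample.bam", "chrX", 1000, 2000)
    # Print the command for inspection; run it with subprocess.run(cmd, ...)
    # once the options have been checked.
    print(shlex.join(cmd))
    print(spanned_by_one_read(1200, 1260, read_length=100))
```

The second helper is just the read-length rule of thumb from above: two sites that no single read can cover will not end up in the same haplotype allele.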

Many thanks for your answer. What if I were to include known haplotypes from the population sample obtained with Beagle? Would FreeBayes phase variants based on the known haplotypes in that case?

You can provide FB with a set of alleles you expect to be present, and it will restrict the search to those alleles. I don't know if Beagle will be precise in this context. What has worked for me is a few passes of FB with varied parameters. First let it detect all single-base variants, filter those a little, then use them as the basis for the haplotype reconstruction in a second round of FB. You can iterate and build up a longer and longer haplotype as what FB calls an MNP; just make sure you trust the basic alleles it is constructed from. This way you get the best runtime, because starting off by searching for any possible 30-mer is extremely slow.
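
The multi-pass idea could be sketched roughly as follows. The code only assembles command lines and shows a toy QUAL filter; nothing here is a tested pipeline. The options `--no-indels`, `--no-mnps`, `--no-complex` and `--haplotype-basis-alleles` are as I remember them from `freebayes --help`, so verify them for your version (some versions may expect the input VCF to be compressed and indexed). The QUAL threshold of 20, the file names and the function names are all assumptions made for illustration.

```python
def first_pass_cmd(reference, bam, ploidy=3):
    """Pass 1: call only simple single-base variants (command not run here)."""
    return [
        "freebayes",
        "--fasta-reference", reference,
        "--ploidy", str(ploidy),
        "--no-indels", "--no-mnps", "--no-complex",
        bam,
    ]


def keep_record(vcf_line, min_qual=20.0):
    """Toy filter: keep header lines and records with QUAL >= min_qual.

    The threshold is an arbitrary placeholder, not a recommendation.
    """
    if vcf_line.startswith("#"):
        return True
    qual = vcf_line.split("\t")[5]
    return qual != "." and float(qual) >= min_qual


def second_pass_cmd(reference, bam, basis_vcf, haplotype_length, ploidy=3):
    """Pass 2: build haplotype alleles only from the trusted pass-1 alleles."""
    return [
        "freebayes",
        "--fasta-reference", reference,
        "--ploidy", str(ploidy),
        "--haplotype-basis-alleles", basis_vcf,
        "--haplotype-length", str(haplotype_length),
        bam,
    ]


if __name__ == "__main__":
    print(" ".join(first_pass_cmd("ref.fa", "sample.bam")))
    # Growing the window step by step corresponds to the iteration described
    # above; the lengths used here are placeholders.
    for length in (10, 20, 30):
        cmd = second_pass_cmd("ref.fa", "sample.bam", "pass1.filtered.vcf", length)
        print(" ".join(cmd))
```

Each command would be run with its standard output redirected to a VCF file, and the filtered output of one round fed to the next as the basis alleles.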

Thank you. Just to clarify, would FB be able to do this for a whole chromosome? If I have, say, two or more known putative haplotypes for chromosome X, in what format do I input this information into the program, and would I use vcfallelicprimitives to phase the trisomic GT calls? Or is there a different command to do this?

Also do MNPs refer to multi-allelic calls? If so, would FB simply ignore biallelic calls where there is only one ALT and one REF?

Apologies but I'm only just starting with FreeBayes.

Just found these posts explaining in detail what MNPs are:

https://groups.google.com/forum/#!topic/freebayes/UarvEtp8NX0

https://github.com/ekg/freebayes/issues/19

I think I understand how to proceed. Still, it would be great to know whether you can feed known haplotypes to the program.
