Question: Phasing vcf file with Beagle
10 days ago, sixthirty0 wrote:

Hello all,

I tried to phase my VCF files with Beagle. All the genotypes seem to be phased, but the INFO field, which contained annotations of interest, disappeared and I want to keep it. How can I do that? If Beagle cannot keep the INFO field, is there a way to merge my new phased VCF file with the old one to add the INFO field back?

Here's the command line I used: java -Xmx2g -jar /home/Softs/beagle/beagle.27Apr20.b81.jar gt=out.vcf out=phased_beagle

Also, I didn't specify a reference panel or a map. Are they necessary, and why? I don't find the manual clear enough on this.

Thank you for your help!

modified 10 days ago • written 10 days ago by sixthirty0
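On the question of merging the INFO field back in, here is a minimal sketch, not a tested pipeline. It assumes both files are uncompressed-style VCF text lines and that a record can be matched on CHROM, POS, REF and ALT. The function names and the tiny in-memory records are hypothetical, for illustration only. Note that it overwrites whatever Beagle wrote in INFO for matched records.

```python
def load_info(original_lines):
    """Map (CHROM, POS, REF, ALT) -> INFO string from the unphased VCF."""
    info = {}
    for line in original_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        info[(f[0], f[1], f[3], f[4])] = f[7]
    return info


def restore_info(original_lines, phased_lines):
    """Return the phased VCF lines with INFO copied from the original VCF."""
    original_lines = list(original_lines)
    info = load_info(original_lines)
    # INFO definitions from the original header, needed for a valid VCF.
    info_headers = [l.rstrip("\n") for l in original_lines
                    if l.startswith("##INFO=")]
    out = []
    for line in phased_lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            # Add the original INFO definitions just before the column header.
            present = {l for l in out if l.startswith("##INFO=")}
            out.extend(h for h in info_headers if h not in present)
            out.append(line)
        elif line.startswith("#"):
            out.append(line)
        else:
            f = line.split("\t")
            key = (f[0], f[1], f[3], f[4])
            if key in info:
                f[7] = info[key]  # overwrite the phased file's INFO
            out.append("\t".join(f))
    return out


# Hypothetical two-file example (one record each).
COLS = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1"
original = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=ANN,Number=.,Type=String,Description="Annotation">',
    COLS,
    "1\t100\t.\tA\tG\t50\tPASS\tANN=missense\tGT\t0/1",
]
phased = [
    "##fileformat=VCFv4.2",
    COLS,
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1",
]
for row in restore_info(original, phased):
    print(row)
```

For real files, the same functions can be fed open file handles (for example `gzip.open(path, "rt")` for the `.vcf.gz` that Beagle writes). Records that Beagle dropped or that have no match in the original are simply left unchanged. A dedicated tool such as `bcftools annotate` is built for this kind of transfer and is probably the more robust choice.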

With human data you should always specify a genetic map, and you should usually use a reference panel as well. What's your sample size, species, and number of SNPs?

written 9 days ago by 4galaxy77310
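As a rough illustration of the advice above, this sketch builds the original Beagle command with a map and a reference panel added. `ref=`, `map=` and `chrom=` are Beagle 5.x arguments. The helper name and all map and panel file names are hypothetical placeholders. Beagle expects the map in PLINK format and the panel as a phased VCF (or bref3) on the same genome build as the input.

```python
import shlex


def beagle_command(jar, gt, out, ref=None, genetic_map=None, chrom=None,
                   mem_gb=2):
    """Build the argument list for a Beagle 5.x run (hypothetical helper)."""
    cmd = ["java", f"-Xmx{mem_gb}g", "-jar", jar, f"gt={gt}", f"out={out}"]
    if ref:
        cmd.append(f"ref={ref}")          # phased reference panel
    if genetic_map:
        cmd.append(f"map={genetic_map}")  # PLINK-format genetic map
    if chrom:
        cmd.append(f"chrom={chrom}")      # restrict the run to one chromosome
    return cmd


cmd = beagle_command(
    jar="/home/Softs/beagle/beagle.27Apr20.b81.jar",
    gt="out.vcf",
    out="phased_beagle.chr20",
    ref="panel.chr20.vcf.gz",          # placeholder file name
    genetic_map="plink.chr20.map",     # placeholder file name
    chrom="20",
)
print(shlex.join(cmd))
```

Running one chromosome at a time keeps the map, the panel and the input aligned. The printed line can be pasted into a shell, or the list passed to `subprocess.run`.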

Thank you for your answer. I am working on human data. My vcf file contains variants for a family of 4 individuals based on the GRCh37 genome and contains around 431000 variants.

written 9 days ago by sixthirty0

OK - seems like you want to do trio-phasing then (i.e. using the information from the mother and father to phase the offspring). Is that right? If so, Beagle doesn't do trio phasing - you would need to use something else like WhatsHap for that.

written 9 days ago by 4galaxy77310

I tried to phase with WhatsHap but only part of my heterozygous genotypes are phased. Also, I don't have the bam files for all the families so I am looking for a way to phase my vcf files without having to use bam files.

Are you sure that trio phasing isn't possible with Beagle? This documentation about Beagle seems to imply that it is: "Beagle can perform haplotype phase inference and missing data imputation using data from unrelated individuals, parent-offspring trios, parent-offspring pairs, and phase-known haplotypes."

written 6 days ago by sixthirty0

Yes, trio phasing will only work at some heterozygous positions, not all. You will need to do statistical phasing (i.e. using a reference panel) if you want to phase all variants. That will involve downloading a reference dataset such as the 1000 Genomes panel.

Why are you trying to phase them? Is there something particular you are aiming to do?

written 6 days ago by 4galaxy77310
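To show why pedigree-based phasing resolves only some heterozygous sites, here is a minimal sketch of the Mendelian rule for a single child and both parents. It is an illustration under simple assumptions (diploid, unphased genotypes given as pairs of allele indices), not how WhatsHap or Beagle works internally. The function name and return format are hypothetical.

```python
def phase_child(child, mother, father):
    """Phase one child genotype from the parents' unphased genotypes.

    Returns ("phased", (paternal_allele, maternal_allele)),
    ("ambiguous", None) or ("inconsistent", None).
    """
    a, b = child
    if a == b:
        # Homozygous: both haplotypes carry the same allele, nothing to phase.
        return ("phased", (a, b))
    options = set()
    # Try both assignments of the child's two alleles to the parents.
    for pat, mat in ((a, b), (b, a)):
        if pat in father and mat in mother:
            options.add((pat, mat))
    if len(options) == 1:
        return ("phased", options.pop())
    if len(options) == 2:
        return ("ambiguous", None)     # e.g. all three are heterozygous
    return ("inconsistent", None)      # violates Mendelian inheritance


# child 0/1, mother 0/0, father 0/1: the 1 allele must come from the father
print(phase_child((0, 1), mother=(0, 0), father=(0, 1)))
# all three heterozygous: either assignment is possible
print(phase_child((0, 1), mother=(0, 1), father=(0, 1)))
# child 0/1 with both parents 0/0: no assignment is possible
print(phase_child((0, 1), mother=(0, 0), father=(0, 0)))
```

The last case is what a de novo variant looks like when both parents are homozygous reference: the pedigree alone cannot say which parental haplotype carries it, which is why read-based evidence is normally needed for that step.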

Hi, I'm working on the same project. What we're trying to achieve is to link de novo variants with inherited ones. For that we need haplotypes, to know which allele carries the de novo variant, so as we understand it, phasing is necessary. Because of some network issues, we're having a hard time getting the BAM files for the VCF we received, but every phasing method we have found requires those files. Can you think of any other way? Thank you.

written 2 days ago by Maxime0