Question: phasing de novo mutations
4.6 years ago by
peter.krawitz40 wrote:

Hi folks, 

Let's suppose I called the variants in a parent-child trio and filtered for de novo mutations in the child. I am now interested in the phase; that is, I would like to know whether each mutation originated in the male or female germ line. Under certain circumstances this is possible: if there is a read covering not only the de novo site but also a heterozygous polymorphism that can only have been transmitted from one parent, this information can be used for phasing.

Let's have a look at the following pseudo vcf file:

#chr pos child father mother
chr1 10   0/1   0/0      0/0
chr1 20   0/1   0/1      0/0

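The second line can be phased from the parental genotypes alone, since the alternative allele can only have come from the father (phased child genotypes are written paternal|maternal):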
chr1 20   1|0   0/1      0|0
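A minimal Python sketch of this transmission-based step, assuming biallelic sites, VCF-style genotype strings and the paternal|maternal ordering used in the example. The function name `phase_by_transmission` is hypothetical and not part of any existing tool; it simply lists every paternal/maternal split consistent with the parents and keeps the result only if it is unique:

```python
from typing import Optional


def phase_by_transmission(child: str, father: str, mother: str) -> Optional[str]:
    """Phase a heterozygous child genotype as 'paternal|maternal', or return None.

    Hypothetical helper: genotypes are VCF-style strings such as '0/1'
    (biallelic sites only, '|' and '/' treated alike).
    """
    def alleles(gt: str) -> set:
        return set(gt.replace("|", "/").split("/"))

    if alleles(child) != {"0", "1"}:
        return None  # only heterozygous children are handled in this sketch
    # Every (paternal, maternal) allele pair the parents could have transmitted
    # that reproduces the child's 0/1 genotype.
    options = {
        f"{p}|{q}"
        for p in alleles(father)
        for q in alleles(mother)
        if {p, q} == {"0", "1"}
    }
    # Unique split -> phased; zero (e.g. a de novo site) or two splits -> ambiguous.
    return options.pop() if len(options) == 1 else None
```

For a de novo site (both parents `0/0`) no transmitted split explains the alternative allele, so the function returns `None`; those are exactly the sites that need read information.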

Now, if the de novo allele at position 10 is on the same read as the alternative allele at position 20, then we also know the phase of this variant: it arose in the paternal germ line.
chr1 10   1|0   0|0      0|0
chr1 20   1|0   0/1      0|0

Vice versa, if the reads spanning both sites carry the de novo allele together with the reference allele at position 20, the de novo mutation arose in the maternal germ line:
chr1 10   0|1   0|0      0|0
chr1 20   1|0   0/1      0|0
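The read-based reasoning above can be sketched as follows. This is a hypothetical helper, not part of any existing tool: it assumes you have already extracted, for every read (or read pair) spanning both positions, the allele observed at the de novo site and at the informative site, and that the informative site has been phased by transmission. The `min_reads` threshold is an arbitrary illustrative choice:

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def parent_of_origin(
    spanning_reads: Iterable[Tuple[str, str]],
    informative_phase: str,
    min_reads: int = 2,
) -> Optional[str]:
    """Assign a de novo allele to 'paternal' or 'maternal', or return None.

    spanning_reads: (allele_at_de_novo_site, allele_at_informative_site) per read
        covering both positions, e.g. ("1", "1").
    informative_phase: child genotype at the informative site, phased as
        paternal|maternal, e.g. "1|0".
    """
    paternal_allele, maternal_allele = informative_phase.split("|")
    if paternal_allele == maternal_allele:
        return None  # a homozygous site cannot distinguish the haplotypes
    counts = Counter()
    for de_novo_allele, linked_allele in spanning_reads:
        if de_novo_allele != "1":
            continue  # this sketch only counts reads carrying the de novo allele
        if linked_allele == paternal_allele:
            counts["paternal"] += 1
        elif linked_allele == maternal_allele:
            counts["maternal"] += 1
    if counts["paternal"] >= min_reads and counts["maternal"] == 0:
        return "paternal"
    if counts["maternal"] >= min_reads and counts["paternal"] == 0:
        return "maternal"
    return None  # too few reads or conflicting evidence
```

Requiring several concordant reads and no conflicting ones guards against sequencing errors; reads showing the reference allele at the de novo site could serve as additional evidence but are ignored here.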

Does anyone know about a software tool that does this kind of phasing?

Thanks a lot!


modified 4.6 years ago by John12k • written 4.6 years ago by peter.krawitz40

Thanks for the information.

Could you please tell me how to find de novo mutations in trio sequencing data?

Thanks in advance!

written 4.5 years ago by 89759864480


I use the GATK UnifiedGenotyper to generate a multi-sample VCF file. Then I upload the data to GeneTalk, set the affection status and filter for dominant inheritance.

If you need further assistance with GeneTalk, don't hesitate to contact me: peter at
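The simplest genotype pattern for a de novo candidate in a trio (child heterozygous, both parents homozygous reference) can be sketched on the simplified table format from the question. This is not how GeneTalk or GATK implement it; the function name is hypothetical, and the sketch looks only at genotypes, whereas a real filter would normally also consider genotype quality, read depth and allele balance:

```python
from typing import Iterator, List, Tuple


def de_novo_candidates(lines: List[str]) -> Iterator[Tuple[str, int]]:
    """Yield (chrom, pos) where the child is 0/1 and both parents are 0/0.

    Hypothetical helper for the whitespace-separated layout
    'chr pos child father mother' with a '#' header line.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header lines
        chrom, pos, child, father, mother = line.split()[:5]
        child_alleles = sorted(child.replace("|", "/").split("/"))
        parents_hom_ref = all(
            gt.replace("|", "/") == "0/0" for gt in (father, mother)
        )
        if child_alleles == ["0", "1"] and parents_hom_ref:
            yield chrom, int(pos)
```

Applied to the two example rows from the question, only `chr1:10` is reported.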

written 4.5 years ago by peter.krawitz40
4.6 years ago by
Vivek2.2k wrote:

They don't necessarily have to be on the same read. I think linkage disequilibrium can be applied to infer the haplotype of origin for de novo mutations within 1-5 kb of a variant that satisfies Mendelian inheritance.

GATK has a tool for this, ReadBackedPhasing, which works on VCF files together with the aligned reads (BAM).
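To make the windowed idea concrete, here is a hypothetical helper that lists nearby heterozygous sites in the child whose alternative allele can only have come from one parent, i.e. the sites that could anchor the phase of a de novo mutation. The 5 kb default simply follows the figure in the answer and is an assumption; in practice a site is only useful if reads (or read pairs) actually connect it to the de novo site, directly or through other phased variants:

```python
from typing import List, Tuple

Site = Tuple[int, str, str, str]  # (pos, child_gt, father_gt, mother_gt)


def _alleles(gt: str) -> set:
    return set(gt.replace("|", "/").split("/"))


def informative_sites_near(
    de_novo_pos: int, sites: List[Site], window: int = 5000
) -> List[Tuple[int, str]]:
    """Return (pos, parent) for nearby child-het sites with a unique parent of origin."""
    hits = []
    for pos, child, father, mother in sites:
        if pos == de_novo_pos or abs(pos - de_novo_pos) > window:
            continue  # the de novo site itself, or too far away
        if _alleles(child) != {"0", "1"}:
            continue  # only heterozygous sites in the child can be informative
        # (paternal, maternal) allele pairs that reproduce the child's 0/1 genotype.
        options = {
            (p, q)
            for p in _alleles(father)
            for q in _alleles(mother)
            if {p, q} == {"0", "1"}
        }
        if options == {("1", "0")}:
            hits.append((pos, "father"))  # alt allele can only be paternal
        elif options == {("0", "1")}:
            hits.append((pos, "mother"))  # alt allele can only be maternal
    # Closest sites first: they are the most likely to share reads with the de novo site.
    return sorted(hits, key=lambda h: abs(h[0] - de_novo_pos))
```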

written 4.6 years ago by Vivek2.2k
4.6 years ago by
peter.krawitz40 wrote:

Hi Vivek, 

thanks for your answer! I had a look at the documentation of the ReadBackedPhasing tool from GATK. As far as I understand, if we consider n variant positions, all 2^n possible haplotypes are constructed. Although it didn't become clear to me from the parameters, it sounds as if these candidate haplotypes are then compared to known haplotype databases to determine the most likely haplotype.

However, if this is how it works, it won't help with any de novo variant, as such variants cannot be in any haplotype database yet. Thus the haplotype probabilities for the 2^n (or 2^(n-1)) possibilities should be the same no matter whether I include or exclude the position of the de novo mutation.

Please let me know if I misinterpreted you.



written 4.6 years ago by peter.krawitz40

I don't see the part about comparing to haplotype databases anywhere in the documentation. As far as I know, the tool considers all possible haplotypes at a given locus and picks the one with the highest probability given the read information. So if your de novo mutation falls within a haplotype string together with other heterozygous variants that have been phased, you can assign its haplotype of origin.

Here's a bit more information, and if you search their help forums, there are a bunch of useful threads in which the developer clarifies some of these issues.
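The scoring idea described in this reply can be illustrated with a toy model. This is not GATK's ReadBackedPhasing implementation; it is a simplified sketch assuming biallelic sites, a fixed per-base error rate, and reads already reduced to the alleles they show at each heterozygous site (site indices and the `error` value are illustrative). Fixing the first allele to 0 removes the mirror-image duplicate, so 2^(n-1) candidate haplotypes are scored, and no external haplotype database is involved:

```python
import itertools
import math
from typing import Dict, List, Tuple

Read = Dict[int, int]  # site index -> allele observed on this read (0 or 1)


def read_log_likelihood(read: Read, hap: Tuple[int, ...], error: float) -> float:
    """log P(read | haplotype pair); the read comes from either haplotype with prob 1/2."""
    def prob(h: Tuple[int, ...]) -> float:
        p = 1.0
        for site, allele in read.items():
            p *= (1 - error) if h[site] == allele else error
        return p

    complement = tuple(1 - a for a in hap)  # the child's other haplotype
    return math.log(0.5 * prob(hap) + 0.5 * prob(complement))


def best_haplotype(reads: List[Read], n_sites: int, error: float = 0.01) -> Tuple[int, ...]:
    """Score all 2^(n-1) haplotypes (first allele fixed to 0) and return the most likely."""
    best, best_ll = (0,) * n_sites, -math.inf
    for rest in itertools.product((0, 1), repeat=n_sites - 1):
        hap = (0,) + rest
        ll = sum(read_log_likelihood(r, hap, error) for r in reads)
        if ll > best_ll:
            best, best_ll = hap, ll
    return best
```

The other haplotype is the complement of the returned tuple, so the de novo allele at site 1 is in cis with the alternative allele at site 0 exactly when `hap[0] == hap[1]`; combined with the parent of origin of site 0, this gives the parent of origin of the de novo allele.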

written 4.6 years ago by Vivek2.2k