3.3 years ago by
New Zealand
1) There are two common types of phasing: phasing by pedigree (relating the alleles to their paternal or maternal origin via allele transmission), and local phasing (determining which variants are locally in phase with each other, either by something like read-backed phasing, or by variant callers that directly call local haplotypes).
2) De novo mutations cannot be directly phased by pedigree, but they can be locally phased with respect to nearby variants (which may themselves be phased by pedigree).
3) You could use PhaseByTransmission, but you will obtain a better overall result if you jointly call the family using a pedigree-aware variant caller, such as rtg family or (for larger pedigrees) rtg population from RTG Core. This is because the pedigree-aware joint calling allows the evidence for each of the samples to influence the calls in other members of the pedigree (in rtg population, you can even use this to impute genotypes for missing family members during calling, which gets better the more family members you have). These callers automatically phase the output variants according to the pedigree, and directly output VCF annotations indicating putative de novo variants (including a de novo specific score).
(RTG Core also includes the rtg mendelian command, which is useful for annotating VCFs for Mendelian inheritance errors etc.; this command is also present in the smaller utility package RTG Tools.) Disclaimer: I work for RTG :-).
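To make the local phasing idea from points 1 and 2 a bit more concrete, here is a minimal Python sketch of read-backed phasing between two heterozygous sites. It is only an illustration under simplifying assumptions: the function name `read_backed_phase`, the `min_reads` threshold, and the representation of reads as small dicts are all made up for this example and are not how any particular tool (RTG or otherwise) implements it.

```python
from collections import Counter


def read_backed_phase(reads, site_a, site_b, min_reads=2):
    """Decide whether the ALT alleles at two het sites are in cis or trans.

    reads: iterable of dicts mapping site position -> 'ref' or 'alt'
           (hypothetical, simplified stand-in for aligned reads).
    Returns 'cis', 'trans', or None if the reads are uninformative.
    """
    counts = Counter()
    for read in reads:
        # Only reads spanning both sites carry phase information.
        if site_a in read and site_b in read:
            same = read[site_a] == read[site_b]
            counts["cis" if same else "trans"] += 1
    cis, trans = counts["cis"], counts["trans"]
    if cis + trans < min_reads or cis == trans:
        return None  # too few spanning reads, or a tie
    return "cis" if cis > trans else "trans"


# Toy data: an inherited het at position 1000 and a de novo at position 1050.
reads = [
    {1000: "alt", 1050: "alt"},
    {1000: "ref", 1050: "ref"},
    {1000: "alt", 1050: "alt"},
    {1000: "alt"},               # does not span the de novo site
    {1000: "ref", 1050: "alt"},  # conflicting read, e.g. a sequencing error
]
phase = read_backed_phase(reads, 1000, 1050)
```

In this toy case the de novo ALT allele sits on the same haplotype as the ALT allele at position 1000, so if that neighbouring allele has been phased by pedigree as (say) paternal, the de novo is on the paternal haplotype too.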
I'm not sure if phasing is the best way to find de novo variants. Why don't you just look (in the VCF) for variants which are present in the child but not in the parents?
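The filter suggested here could be sketched roughly as follows. This is a naive illustration working on plain VCF GT strings; the helper names `parse_gt` and `is_candidate_de_novo` are hypothetical, and it ignores genotype quality and depth entirely, so any miscalled parental genotype shows up as a candidate.

```python
def parse_gt(gt):
    """Split a VCF GT string such as '0/1' or '1|0' into allele indices.

    Missing alleles ('.') are returned as None.
    """
    return [None if a == "." else int(a) for a in gt.replace("|", "/").split("/")]


def is_candidate_de_novo(child_gt, father_gt, mother_gt):
    """True if the child carries a non-reference allele seen in neither parent."""
    child, father, mother = map(parse_gt, (child_gt, father_gt, mother_gt))
    if None in child + father + mother:
        return False  # skip sites with any missing genotype call
    parental = set(father) | set(mother)
    return any(a != 0 and a not in parental for a in child)


# Child is heterozygous, both parents are homozygous reference.
candidate = is_candidate_de_novo("0/1", "0/0", "0/0")
# Child's ALT allele is present in the father, so it is simply inherited.
inherited = is_candidate_de_novo("0/1", "0/1", "0/0")
```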
Because this will produce a large number of de novos.
And that's not what you want? A variant which is found in a child but not in a parent is by definition a de novo variant, no?
Yes, I think that OP is perhaps thinking that phasing is what allows you to find de novos (by looking for variants which are not phased). But this is certainly not reliable, e.g.:
- Phasing by transmission may fail to phase germline variants if it cannot be determined which parent transmitted which allele (e.g. a site where all members of a trio are heterozygous).
- Read-backed phasing may fail to phase germline variants if there are no nearby heterozygotes to phase with respect to.
- De novo variants may be phased physically with respect to nearby pedigree-phased variants, and this would let you identify which haplotype is affected (e.g. to identify compound heterozygosity).
So, phasing is useful for its own reasons, but finding de novos is not one of them.
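As a rough illustration of why "not phased by transmission" does not mean "de novo", here is a small Python sketch of trio phasing at a single site. The function name `phase_child_by_transmission` and the tuple representation of genotypes are assumptions made for this example only; real tools such as PhaseByTransmission or the RTG callers work with genotype likelihoods rather than hard calls.

```python
def phase_child_by_transmission(child, father, mother):
    """Phase a child's diploid genotype using hard parental genotype calls.

    child, father, mother: unphased genotypes as 2-tuples of allele indices.
    Returns (paternal_allele, maternal_allele), or None when the assignment
    is ambiguous or no Mendelian-consistent assignment exists.
    """
    a, b = child
    candidates = set()
    # Try both ways of assigning the child's alleles to the parents.
    for pat, mat in ((a, b), (b, a)):
        if pat in father and mat in mother:
            candidates.add((pat, mat))
    return candidates.pop() if len(candidates) == 1 else None


# Informative site: only the mother can have transmitted the ALT allele.
informative = phase_child_by_transmission((0, 1), (0, 0), (0, 1))
# All three trio members heterozygous: an inherited variant, but unphaseable.
all_het = phase_child_by_transmission((0, 1), (0, 1), (0, 1))
# De novo: ALT allele absent from both parents, so also unphaseable.
de_novo = phase_child_by_transmission((0, 1), (0, 0), (0, 0))
```

Both the all-heterozygous germline site and the de novo site come back unphased here, so the absence of pedigree phase cannot distinguish them.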
Thanks a lot. Now it is all starting to make sense. As you said, I thought phasing was a way to find de novos. This leads me to ask: what benefit would someone get from phasing de novos (or variants in general)? Is it just to know which haplotype (the one where the de novo is located) is affected, and thus whether it is the father or the mother who is (in a way) the source of this de novo? Or are there other benefits?