Question: Identifying de novo variants in a WGS Trio.vcf file
16 months ago by
ClkElf20
Istanbul Technical University, Turkey
ClkElf20 wrote:

Hello all,

I am a beginner in this field and am trying to identify de novo variants in a trio VCF file (parents unaffected, child affected). First, I ran the PhaseByTransmission tool and then created a new .vcf file containing only the unphased variants (genotypes written with "/" instead of "|"). To my knowledge, de novo variants cannot be phased by the tool because they are not transmitted from the parents.

Here is my question: what are the next steps to identify de novo variants more accurately? There are almost 500k unphased variants in the final VCF file, and I do not think it is possible that all of these candidates are true de novo variants.

Thank you so much for your help!

Best regards,

written 16 months ago by ClkElf 20

Unless I'm wrong, de novo variants are not related to the phasing information.

written 16 months ago by Pierre Lindenbaum 125k

To my knowledge, de novo variants cannot be phased by the tool because they are not transmitted from the parents.

True, but you could just filter for variants which are found in the child and not in the parents. Phasing might work to filter a bit, but why would you?

There are almost 500k unphased variants in the final VCF file, and I do not think it is possible that all of these candidates are true de novo variants.

There are of course also other reasons why a variant didn't get phased. So, no, these are not all de novo.
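One way to see those other reasons is to tally which parent/child genotype combinations occur at the unphased sites. For example, a site where mother, father, and child are all heterozygous cannot be phased by transmission, yet it is not a de novo candidate. The following is a minimal awk sketch, not part of any tool mentioned in this thread. It assumes diploid genotypes, GT as the first FORMAT subfield, and a sample column order of mother, father, child (columns 10, 11, 12); the function name and the `MOTHER_COL`, `FATHER_COL`, and `CHILD_COL` variables are hypothetical names introduced only for this illustration.

```shell
# Sketch: tally mother,father,child genotype combinations at sites where the
# child genotype is unphased. Column numbers are assumptions: check them
# against the #CHROM header line of your own file (first sample = column 10).
unphased_trio_tally() {
  awk -F'\t' -v mother="${MOTHER_COL:-10}" -v father="${FATHER_COL:-11}" -v child="${CHILD_COL:-12}" '
    # Return the GT subfield with "|" rewritten as "/" so that phased and
    # unphased parental genotypes fall into the same group.
    function gt_of(sample,   parts, g) {
      split(sample, parts, ":")
      g = parts[1]
      gsub("[|]", "/", g)
      return g
    }
    /^#/ { next }                        # ignore header lines
    {
      split($child, c, ":")
      if (index(c[1], "|") > 0) next     # child genotype is phased: skip
      tally[gt_of($mother) "," gt_of($father) "," gt_of($child)]++
    }
    END { for (combo in tally) print tally[combo] "\t" combo }
  ' "$@" | sort -k1,1nr -k2,2
}
```

Running `unphased_trio_tally trio.vcf` prints one line per combination, most frequent first. Combinations such as `0/1,0/1,0/1` or those containing `./.` are unphased for reasons unrelated to de novo mutation.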

modified 16 months ago • written 16 months ago by WouterDeCoster 42k

So basically you mean that I should filter for 0/1 or 1/1 in the child and 0/0 in both parents. I also applied the GATK Genotype Refinement Workflow to the .vcf file, and the output VCF file contains only sites that are 0/1 or 1/1 in the child and hom-ref in both parents, as expected. But what should I do about other possibilities, for instance 1/1 in the child, 0/1 in the mother, and 0/0 in the father? Is there a specific term for this kind of situation?

written 16 months ago by ClkElf 20

Essentially, you want to select the lines in which the number of alternative alleles in the child is higher than the sum of the alternative alleles in the two parents.
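This allele-count rule can be sketched with plain awk on an uncompressed multi-sample VCF. It is only an illustration under stated assumptions: diploid genotypes, GT as the first FORMAT subfield, every non-reference allele counted as "alternative", and a sample column order of mother, father, child (columns 10, 11, 12). The function name and the `MOTHER_COL`, `FATHER_COL`, and `CHILD_COL` variables are hypothetical and exist only in this example.

```shell
# Sketch: keep records where the child carries more ALT alleles than both
# parents combined. Column numbers are assumptions: adjust them to the
# sample order in your own #CHROM header line (first sample = column 10).
denovo_candidates() {
  awk -F'\t' -v mother="${MOTHER_COL:-10}" -v father="${FATHER_COL:-11}" -v child="${CHILD_COL:-12}" '
    # Count non-reference alleles in a genotype such as 0/1 or 1|1.
    # Returns -1 when any allele is missing (".").
    function alt_count(sample,   gt, alleles, n, i, count) {
      split(sample, gt, ":")             # GT is the first FORMAT subfield
      n = split(gt[1], alleles, "[/|]")
      count = 0
      for (i = 1; i <= n; i++) {
        if (alleles[i] == ".") return -1
        if (alleles[i] != "0") count++
      }
      return count
    }
    /^#/ { print; next }                 # pass header lines through unchanged
    {
      m = alt_count($mother); f = alt_count($father); c = alt_count($child)
      if (m < 0 || f < 0 || c < 0) next  # skip sites with a missing genotype
      if (c > m + f) print               # child has ALT alleles no parent explains
    }
  ' "$@"
}
```

For example, `denovo_candidates trio.vcf > candidates.vcf` keeps a 0/1 child with 0/0 parents and a 1/1 child with 0/1 and 0/0 parents, and drops a 0/1 child whose mother is 0/1. The result is still a candidate list: genotype quality and read depth in all three samples should be checked before any variant is trusted.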

But I'd say that a scenario where you have 1/0 and 0/0 parents and a 1/1 child is extremely unlikely. Also, you are probably looking for highly penetrant mutations, which you would expect to be heterozygous.

written 16 months ago by WouterDeCoster 42k

You could try to identify genotypes that violate Mendelian rules.

If you have a multisample vcf file, this can be done quite easily with bcftools.

$ bcftools +mendelian input.vcf -t mothers_sample_id,fathers_sample_id,childs_sample_id -l x > output.vcf
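The sample IDs passed to `-t` have to match the names in the VCF header exactly, so it can help to list them first. `bcftools query -l input.vcf` does this. The awk sketch below is a rough equivalent for an uncompressed VCF and also prints each sample's column number; the function name is hypothetical.

```shell
# Sketch: print the sample IDs from the #CHROM header line of a VCF, one per
# line, each preceded by its column number (sample columns start at 10).
list_vcf_samples() {
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print i "\t" $i; exit }' "$@"
}
```

Usage: `list_vcf_samples input.vcf`, or `gzip -dc input.vcf.gz | list_vcf_samples` for a compressed file.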

fin swimmer

written 16 months ago by finswimmer 13k