Question: Identifying de novo variants in a WGS Trio.vcf file
Asked 7 months ago by ClkElf (Istanbul Technical University, Turkey):

Hello all,

I am a beginner in this field and am trying to identify de novo variants in a trio VCF file (parents unaffected, child affected). First, I ran the GATK PhaseByTransmission tool and then created a new VCF file containing only the unphased variants (genotypes written with "/" instead of "|"). To my knowledge, de novo variants cannot be phased by the tool because they are not transmitted from the parents.

My question is: what are the next steps to get a more accurate set of de novo variants? There are almost 500k unphased variants in the final VCF file, and I think it is not possible that all of these candidates are true de novo variants.

Thank you so much for your help!

Best regards,


Unless I'm wrong, de novo variants are not related to the phasing information.

Reply by Pierre Lindenbaum, written 7 months ago

"To my knowledge, de novo variants cannot be phased by the tool because they are not transmitted from the parents."

True, but you could simply filter for variants that are present in the child and absent from both parents. Phasing might help a bit with the filtering, but why take that detour?
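
A minimal shell/awk sketch of that filter, assuming an uncompressed multisample VCF on stdin whose sample columns 10, 11 and 12 are the mother, father and child (an assumption; check the #CHROM header line of your own file) and whose FORMAT field starts with GT. `child_only_sites` is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep sites where the child carries an ALT allele
# and both parents are homozygous reference.
# ASSUMPTIONS: columns 10, 11, 12 are mother, father, child, and GT is the
# first FORMAT field. Reads an uncompressed VCF on stdin.
child_only_sites() {
  awk -F '\t' '
    # Return 1 if a genotype such as 0/0 or 0|0 is homozygous reference
    function hom_ref(gt,   a, n, i) {
      n = split(gt, a, "[/|]")
      for (i = 1; i <= n; i++) if (a[i] != "0") return 0
      return 1
    }
    # Return 1 if the genotype contains at least one non-reference allele
    function has_alt(gt,   a, n, i) {
      n = split(gt, a, "[/|]")
      for (i = 1; i <= n; i++) if (a[i] != "0" && a[i] != ".") return 1
      return 0
    }
    /^#/ { print; next }    # pass header lines through unchanged
    {
      split($10, m, ":"); split($11, f, ":"); split($12, c, ":")
      if (hom_ref(m[1]) && hom_ref(f[1]) && has_alt(c[1])) print
    }
  '
}
```

For example, `bcftools view trio.vcf.gz | child_only_sites > candidates.vcf` (file names are placeholders). Phased (`|`) and unphased (`/`) genotypes are treated alike, and a missing parental call (`./.`) does not count as homozygous reference.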

"There are almost 500k unphased variants in the final VCF file, and I think it is not possible that all of these candidates are true de novo variants."

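There are, of course, other reasons why a variant may not get phased; for example, a site where all three members of the trio are heterozygous cannot be phased by transmission. So, no, these are not all de novo variants.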

Reply by WouterDeCoster, written 7 months ago (modified 7 months ago)

So basically you mean that I should keep sites where the child is 0/1 or 1/1 and both parents are 0/0. I also applied GATK's Genotype Refinement Workflow to the VCF file, and, as expected, the output contains only 0/1 or 1/1 for the child and homozygous reference for both parents. But what should I do about other combinations, for instance 1/1 in the child, 0/1 in the mother and 0/0 in the father? Is there a specific term for this kind of situation?

Reply by ClkElf, written 7 months ago

Essentially, you want to extract the sites at which the child carries more alternative alleles than the two parents combined.

But I'd say that a scenario where you have 1/0 and 0/0 parents and a 1/1 child is extremely unlikely. Also, you are probably looking for highly penetrant mutations, which you would expect to be heterozygous.
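
The criterion of more ALT alleles in the child than in both parents combined can be sketched as an allele-count comparison. This is a minimal sketch, assuming an uncompressed multisample VCF on stdin whose columns 10, 11 and 12 are the mother, father and child and whose FORMAT field starts with GT; `excess_alt_sites` is a hypothetical helper name. Sites with a missing allele in any trio member are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep sites where the child carries more ALT alleles
# than both parents combined (e.g. child 0/1 with 0/0 + 0/0 parents, or
# child 1/1 with 0/1 + 0/0 parents).
# ASSUMPTIONS: columns 10, 11, 12 are mother, father, child; GT is the first
# FORMAT field; sites with a missing allele in any trio member are skipped.
excess_alt_sites() {
  awk -F '\t' '
    # Number of non-reference alleles in a GT string; -1 if any allele is missing
    function alt_count(gt,   a, n, i, k) {
      n = split(gt, a, "[/|]")
      k = 0
      for (i = 1; i <= n; i++) {
        if (a[i] == ".") return -1
        if (a[i] != "0") k++
      }
      return k
    }
    /^#/ { print; next }    # pass header lines through unchanged
    {
      split($10, m, ":"); split($11, f, ":"); split($12, c, ":")
      am = alt_count(m[1]); af = alt_count(f[1]); ac = alt_count(c[1])
      if (am < 0 || af < 0 || ac < 0) next    # skip incomplete genotypes
      if (ac > am + af) print
    }
  '
}
```

Note that this criterion only detects gained ALT alleles: a 1/1 child of 1/1 and 0/0 parents, for instance, is not reported even though that genotype combination is Mendelian-inconsistent.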

Reply by WouterDeCoster, written 7 months ago

You could try to identify genotypes that violate Mendelian rules.

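If you have a multisample VCF file, this can be done quite easily with the bcftools +mendelian plugin. Here -t takes the mother, father and child sample names, in that order, and -l x lists the Mendelian-inconsistent sites: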

$ bcftools +mendelian input.vcf -t mothers_sample_id,fathers_sample_id,childs_sample_id -l x > output.vcf
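
The rule behind such a check can be illustrated with a short shell/awk sketch: a diploid child genotype is Mendelian-consistent when one of its alleles can come from the mother and the other from the father. This only illustrates the rule and is not how the bcftools plugin is implemented; option names of the plugin may also differ between bcftools versions, so check `bcftools +mendelian -h` for your installation. `mendelian_violations` is a hypothetical helper, assuming an uncompressed VCF on stdin whose columns 10, 11 and 12 are mother, father and child, with GT as the first FORMAT field; only fully called diploid genotypes are tested.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print sites whose trio genotypes break Mendel's rule
# (the child's two alleles cannot be split into one maternal + one paternal).
# ASSUMPTIONS: columns 10, 11, 12 are mother, father, child; GT is the first
# FORMAT field; only fully called diploid genotypes are tested, others skipped.
mendelian_violations() {
  awk -F '\t' '
    # 1 if allele x is one of the two alleles in genotype array g
    function has(g, x) { return g[1] == x || g[2] == x }
    # 1 if neither allele is missing
    function called(g) { return g[1] != "." && g[2] != "." }
    # 1 if the child alleles can be assigned one to the mother, one to the father
    function consistent(m, f, c) {
      return (has(m, c[1]) && has(f, c[2])) || (has(m, c[2]) && has(f, c[1]))
    }
    /^#/ { print; next }    # pass header lines through unchanged
    {
      split($10, mf, ":"); split($11, ff, ":"); split($12, cf, ":")
      if (split(mf[1], m, "[/|]") != 2 || split(ff[1], f, "[/|]") != 2 ||
          split(cf[1], c, "[/|]") != 2) next
      if (!called(m) || !called(f) || !called(c)) next
      if (!consistent(m, f, c)) print
    }
  '
}
```

Under this rule, the 1/1 child with 0/1 and 0/0 parents asked about above is reported as a Mendelian violation, as is a 0/1 child of two 0/0 parents.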

fin swimmer

Reply by finswimmer, written 7 months ago