Identifying de novo variants in a WGS Trio.vcf file
5.6 years ago
ClkElf ▴ 50

Hello all,

I am a beginner in this field and am trying to identify de novo variants in a trio VCF file (unaffected parents, affected child). First, I ran the PhaseByTransmission tool and then created a new VCF file containing only the unphased variants ("/" instead of "|"). To my knowledge, de novo variants cannot be phased by the tool because they are not transmitted from the parents.

My question is: what are the next steps to identify de novo variants more accurately? There are almost 500k unphased variants in the final VCF file, and I do not think all of these candidates can be true de novo variants.

Thank you so much for your help!

Best regards,

de novo GATK unphased multisample vcf • 2.7k views

Unless I'm wrong, de novo variants are not related to the phasing information.

"To my knowledge, de novo variants cannot be phased by the tool because they are not transmitted from the parents."

True, but you could just filter for variants which are found in the child and not in the parents. Phasing might work to filter a bit, but why would you?
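A minimal sketch of that filter in Python, assuming a multisample VCF with diploid calls and a known order of the sample columns. The function names, the column indices, and the toy records are made up for illustration, and the sketch looks only at the GT field, so it ignores depth and genotype quality, which a real analysis would also want to check.

```python
def parse_gt(sample_field):
    """Return the alleles of a VCF sample field such as '0/1:35:99' as a list of strings."""
    gt = sample_field.split(":")[0]
    return gt.replace("|", "/").split("/")


def child_only_variants(vcf_lines, child_col, mother_col, father_col):
    """Yield data lines where the child carries an ALT allele and both parents are 0/0.

    The *_col arguments are 0-based positions among the sample columns (assumed known).
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        samples = fields[9:]  # the first nine VCF columns are fixed
        child = parse_gt(samples[child_col])
        mother = parse_gt(samples[mother_col])
        father = parse_gt(samples[father_col])
        if "." in child + mother + father:
            continue  # skip records with a missing call in any trio member
        child_has_alt = any(a != "0" for a in child)
        parents_hom_ref = all(a == "0" for a in mother + father)
        if child_has_alt and parents_hom_ref:
            yield line


# Toy records with samples in the order mother, father, child (hypothetical data).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tmother\tfather\tchild",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/0\t0/1",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t0/0\t0/1",
    "1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t0/0\t./.\t0/1",
]
candidates = list(child_only_variants(example, child_col=2, mother_col=0, father_col=1))
```

Only the record at position 100 survives here: the one at 200 is inherited from the mother, and the one at 300 has a missing call in the father.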

"Because there is almost 500k unphased variants in the final vcf file and I think it is not possible that all the candidates are true de novo variants."

There are of course also other reasons why a variant didn't get phased. So, no, these are not all de novo.

So basically you mean that I should keep variants that are 0/1 or 1/1 in the child and 0/0 in both parents. I also applied the GATK Genotype Refinement Workflow to the VCF file, and the output VCF file contains only 0/1 or 1/1 for the child and hom-ref for both parents, as expected. But what should I do about other possibilities, for instance 1/1 for the child, 0/1 for the mother and 0/0 for the father? Is there a specific term for this kind of situation?

Essentially, you want to pull out the lines in which the number of alternative alleles in the child is higher than the sum of the alternative alleles in the parents.

But I'd say that a scenario where you have 1/0 and 0/0 parents and a 1/1 child is extremely unlikely. Also, you are probably looking for highly penetrant mutations, which you would expect to be heterozygous.
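The allele-count rule can be sketched in a few lines of Python. This assumes biallelic sites and diploid genotype strings; `alt_count` and `exceeds_parents` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
def alt_count(gt):
    """Count non-reference alleles in a genotype string such as '0/1' or '1|1'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # missing call, cannot be counted
    return sum(1 for a in alleles if a != "0")


def exceeds_parents(child_gt, mother_gt, father_gt):
    """True when the child has more ALT alleles than both parents combined."""
    counts = [alt_count(g) for g in (child_gt, mother_gt, father_gt)]
    if None in counts:
        return False  # do not flag sites with missing genotypes
    child, mother, father = counts
    return child > mother + father


# The cases discussed in this thread:
het_child_ref_parents = exceeds_parents("0/1", "0/0", "0/0")   # classic de novo candidate
hom_child_one_het = exceeds_parents("1/1", "0/1", "0/0")       # the unlikely 1/1 case
hom_child_two_het = exceeds_parents("1/1", "0/1", "0/1")       # ordinary inheritance
```

Both candidate patterns are flagged, while a 1/1 child of two heterozygous parents is not. A 1/1 child with one heterozygous parent is in practice more often a genotyping error or a deletion on the other allele than two independent events, which fits the point that it is extremely unlikely as a true de novo.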

You could try to identify genotypes that violate Mendelian rules.

If you have a multisample vcf file, this can be done quite easily with bcftools.

$ bcftools +mendelian input.vcf -t mothers_sample_id,fathers_sample_id,childs_sample_id -l x > output.vcf
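To show what a Mendelian violation means at the genotype level, here is a rough Python sketch. It is not how the bcftools plugin is implemented; it assumes diploid autosomal genotypes, and the function names are made up for illustration. A child genotype is consistent if one allele can come from the mother and the other from the father.

```python
from itertools import product


def split_alleles(gt):
    """Split a genotype string such as '0/1' or '1|0' into its alleles."""
    return gt.replace("|", "/").split("/")


def is_mendelian_consistent(child_gt, mother_gt, father_gt):
    """True if the child could have received one allele from each parent.

    Genotypes with missing alleles are treated as consistent, because no
    violation can be demonstrated for them (an assumption of this sketch).
    """
    child = split_alleles(child_gt)
    mother = split_alleles(mother_gt)
    father = split_alleles(father_gt)
    if "." in child + mother + father:
        return True
    target = sorted(child)
    # Try every combination of one maternal and one paternal allele.
    return any(sorted(pair) == target for pair in product(mother, father))


de_novo_like = is_mendelian_consistent("0/1", "0/0", "0/0")     # violation
inherited = is_mendelian_consistent("0/1", "0/1", "0/0")        # consistent
hom_from_one_parent = is_mendelian_consistent("1/1", "1/1", "0/0")  # violation
```

Note that a violation is only a candidate: a 1/1 child with 1/1 and 0/0 parents is inconsistent without involving any new allele, so flagged sites still need filtering on genotype quality and depth.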

fin swimmer
