Hi there,
I'd be grateful for some help, please.
I've got several joint-called trio VCFs (unaffected parents and proband) that I'm analysing. The multi-sample VCF nicely shows 0/0, 0/1 or 1/1 for each individual per variant. However, I'd like to convert this notation into three columns (one per individual) that simply say het, hom or ref, to make the data much easier to analyse.
I've looked at the ANNOVAR script convert2annovar.pl with --withzyg; however, this only works for single-sample VCFs, which isn't what I need. I'd like all the samples together in the same multi-sample VCF, but with the zygosity stated for each variant.
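For reference, the translation being asked for is a simple per-genotype mapping. A minimal Python sketch, assuming calls are given as VCF GT strings; the `missing` label for no-calls and reporting haploid alt calls as `hom` are my own choices, not ANNOVAR's behaviour:

```python
def gt_to_zygosity(gt: str) -> str:
    """Translate a VCF GT value such as '0/1' or '1|1' into ref/het/hom."""
    # Phased ('|') and unphased ('/') separators are treated the same way.
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"  # e.g. './.' or './1': no complete call
    if all(a == "0" for a in alleles):
        return "ref"      # 0/0: homozygous reference
    if len(set(alleles)) == 1:
        return "hom"      # 1/1, 2/2: homozygous alternate (a haploid '1' also lands here)
    return "het"          # 0/1, 1/0, 1/2: two different alleles


if __name__ == "__main__":
    for gt in ["0/0", "0/1", "1/0", "1|1", "1/2", "./."]:
        print(gt, "->", gt_to_zygosity(gt))
```

Note that a multi-allelic call such as 1/2 comes out as het, since the two alleles differ even though neither is the reference.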
Any help or suggestions would be greatly appreciated. E
This sounds like an XY problem. What are you actually trying to do?
Hi Pierre, I’m trying to get an annotated ANNOVAR file that includes all samples (mum, dad and proband). Instead of having the genotypes as 1/1, 1/0, 0/0 etc., I would like three columns in the same file, one each for mum, dad and proband, showing het, hom or ref.
Currently ANNOVAR lets you print zygosity with --withzyg; however, this only works for the first sample in the multi-sample VCF, unless you include all samples, in which case you end up with three separate files. I do not want three separate annotated files; I’d like everything together.
The idea is that in the final annotated file I can quickly and easily filter each individual by het/hom/ref etc. to suit the pattern of inheritance I’m expecting.
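Once every individual has its own het/hom/ref column, filtering is just a match on each column, which is what an Excel autofilter does. The same idea as a short Python sketch; the column names `mum`, `dad` and `proband` and the example variants are hypothetical:

```python
from typing import Dict, List, Optional

# One variant per row, e.g. {"variant": "var1", "mum": "ref", "dad": "ref", "proband": "het"}
Row = Dict[str, str]


def filter_rows(rows: List[Row], mum: Optional[str] = None,
                dad: Optional[str] = None, proband: Optional[str] = None) -> List[Row]:
    """Keep rows matching every individual that was specified (None means 'any')."""
    wanted = {"mum": mum, "dad": dad, "proband": proband}
    return [
        row for row in rows
        if all(value is None or row[who] == value for who, value in wanted.items())
    ]


if __name__ == "__main__":
    example = [
        {"variant": "var1", "mum": "ref", "dad": "ref", "proband": "het"},
        {"variant": "var2", "mum": "het", "dad": "het", "proband": "hom"},
        {"variant": "var3", "mum": "het", "dad": "ref", "proband": "het"},
    ]
    # Proband het with both parents ref: candidate de novo variants.
    print(filter_rows(example, mum="ref", dad="ref", proband="het"))
```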
Hope that helps!
So that's an XY problem.
What is this pattern of inheritance?
I want to be able to filter for all of them! We don’t know which one applies: it could be de novo, dominant with incomplete penetrance, recessive, X-linked, etc. The idea is that I can filter for every inheritance pattern if the ref/het/hom calls for all individuals are simply laid out in one annotated file.
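As a rough illustration of how the three columns map onto those patterns, here is a Python sketch of simplified trio rules. It assumes biallelic sites, unaffected parents, a male proband for the X-linked rule (with hemizygous calls reported as hom), and no phasing, so compound heterozygotes are not resolved. The function name and labels are hypothetical; treat it as a starting point, not a validated classifier:

```python
from typing import List


def candidate_patterns(mum: str, dad: str, proband: str, chrom: str = "") -> List[str]:
    """Return inheritance patterns consistent with a trio's ref/het/hom calls."""
    is_x = chrom in ("X", "chrX")
    patterns = []
    if proband == "het" and mum == "ref" and dad == "ref":
        patterns.append("de novo")
    if not is_x and proband == "hom" and mum == "het" and dad == "het":
        patterns.append("autosomal recessive")
    if proband == "het" and "het" in (mum, dad):
        # Inherited from an unaffected parent: fits dominant with incomplete
        # penetrance, or one half of a compound het (not checked here).
        patterns.append("inherited het")
    if is_x and proband == "hom" and mum == "het" and dad == "ref":
        # Assumes a male proband whose hemizygous X call is reported as hom.
        patterns.append("X-linked recessive")
    return patterns


if __name__ == "__main__":
    print(candidate_patterns("ref", "ref", "het"))
    print(candidate_patterns("het", "ref", "hom", chrom="chrX"))
```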
So you only need the right filtering expression for bcftools view (http://samtools.github.io/bcftools/bcftools.html#expressions) or GATK SelectVariants (https://gatk.broadinstitute.org/hc/en-us/articles/360035891011-JEXL-filtering-expressions). There is no need to convert the genotypes.
Hi Pierre,
Thanks for your response. The problem is that this doesn’t work for the student I’m supervising, who can’t use the command line. They need to work on a flat CSV file, e.g. in Excel, to filter on inheritance patterns. We need a multi-sample ANNOVAR output that shows every genotype as het/hom/ref instead of the usual GT notation from a VCF.
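One option that keeps the student off the command line is for someone to run a small script once and hand over the resulting CSV. A minimal Python sketch (standard library only) that reads a joint-called VCF, plain or bgzipped, and writes CHROM, POS, REF, ALT plus one het/hom/ref column per sample, in the VCF header's sample order. It assumes every record has GT in its FORMAT column, and it does not carry over the ANNOVAR annotations; those would still need to be joined on, keeping in mind that ANNOVAR represents indels differently from the VCF (e.g. with '-' alleles), so indel keys may not match directly. The script name in the usage line is hypothetical:

```python
import csv
import gzip
import sys


def gt_to_zygosity(gt: str) -> str:
    """Translate a VCF GT value into ref/het/hom (or missing for no-calls)."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if all(a == "0" for a in alleles):
        return "ref"
    return "hom" if len(set(alleles)) == 1 else "het"


def open_text(path: str):
    # bgzipped .vcf.gz files are valid multi-member gzip, so gzip can read them.
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def vcf_to_zygosity_csv(vcf_path: str, csv_path: str) -> None:
    """Write CHROM, POS, REF, ALT and one zygosity column per sample to CSV."""
    with open_text(vcf_path) as vcf, open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        for line in vcf:
            if line.startswith("##") or not line.strip():
                continue  # skip meta-information and blank lines
            fields = line.rstrip("\r\n").split("\t")
            if line.startswith("#CHROM"):
                # Sample names follow the 9 fixed columns (CHROM ... FORMAT).
                writer.writerow(["CHROM", "POS", "REF", "ALT"] + fields[9:])
                continue
            # Raises ValueError if a record has no GT in FORMAT (assumed not to happen).
            gt_index = fields[8].split(":").index("GT")
            zygosities = []
            for sample_field in fields[9:]:
                parts = sample_field.split(":")
                gt = parts[gt_index] if gt_index < len(parts) else "."
                zygosities.append(gt_to_zygosity(gt))
            writer.writerow([fields[0], fields[1], fields[3], fields[4]] + zygosities)


if __name__ == "__main__" and len(sys.argv) == 3:
    vcf_to_zygosity_csv(sys.argv[1], sys.argv[2])
```

Usage would be something like `python vcf_to_zygosity.py trio.vcf.gz trio_zygosity.csv`, after which the CSV opens directly in Excel with one filterable column per family member.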
laughing emoji
I can use R, but my student can't!