I was wondering whether anyone is aware of any existing tools to summarise snpEff's VCF output.
I have a few hundred VCF files from snpEff and want to know how often there is, for example, a high-impact* mutation in the same gene. So the output would indicate, for each gene, in how many of my VCF files there was a high-impact mutation.
*As per snpEff's definitions. Relevant to microbes: stop_gained, stop_lost, frameshift
Thanks in advance for any help with this!
I use my tool "GroupByGene" (https://github.com/lindenb/jvarkit/wiki/GroupByGene) to group this kind of VCF information by gene. All samples must first be merged into a single filtered VCF (filtered = keep only the "high impact" variants).
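If merging a few hundred files first is inconvenient, the per-gene count asked for in the question can also be approximated directly on the separate files. The following is only a minimal sketch, not part of GroupByGene: it assumes each file carries snpEff's standard `ANN` INFO field (pipe-separated, with the impact in the third sub-field and the gene name in the fourth), and the function names `genes_with_impact` and `count_files_per_gene` are made up for illustration. Files annotated with the older `EFF` field would need a different parser.

```python
import gzip
import sys
from collections import defaultdict


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def genes_with_impact(lines, impact="HIGH"):
    """Return the set of gene names with at least one variant of the given impact.

    Assumes snpEff's ANN layout: Allele|Annotation|Impact|Gene_Name|...
    """
    genes = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a valid variant record
        for entry in fields[7].split(";"):
            if not entry.startswith("ANN="):
                continue
            # One variant can carry several comma-separated annotations.
            for ann in entry[4:].split(","):
                parts = ann.split("|")
                if len(parts) > 3 and parts[2] == impact and parts[3]:
                    genes.add(parts[3])
    return genes


def count_files_per_gene(paths, impact="HIGH"):
    """Count, for each gene, in how many files it has a variant of that impact."""
    counts = defaultdict(int)
    for path in paths:
        with open_vcf(path) as handle:
            # A gene is counted once per file, however many hits it has there.
            for gene in genes_with_impact(handle, impact):
                counts[gene] += 1
    return dict(counts)


if __name__ == "__main__":
    # Usage (hypothetical script name): python count_high_impact.py *.vcf
    result = count_files_per_gene(sys.argv[1:])
    for gene, n in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{gene}\t{n}")
```

The output is a two-column table (gene, number of files), sorted by decreasing count. Note that this counts files rather than samples, so it only matches the question if each VCF holds one sample.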