I have 30 VCF files (one per chromosome) annotated with snpEff. I would like to compute summary statistics and compare the numbers of different effects between chromosomes (or between autosomes and sex chromosomes). I have extracted the ANN field from my VCF files (using a short Bash script), and the result looks like this:
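Chromosomes differ in length and gene content, so comparing the proportion of each effect class is usually more informative than comparing raw counts. Below is a minimal sketch, assuming you already have per-chromosome counts (effect term or broad class mapped to a count) in a Python dict. The chromosome names and the sex chromosome set (`{"X", "Y"}`) are hypothetical placeholders; substitute the names used in your own assembly (for example Z/W in birds).

```python
from collections import Counter


def pool_by_group(per_chrom, sex_chroms):
    """Sum per-chromosome counts into an 'autosomes' pool and a 'sex' pool.

    per_chrom: dict mapping chromosome name -> Counter of category counts.
    sex_chroms: set of chromosome names treated as sex chromosomes
                (hypothetical, e.g. {"X", "Y"}).
    """
    pooled = {"autosomes": Counter(), "sex": Counter()}
    for chrom, counts in per_chrom.items():
        key = "sex" if chrom in sex_chroms else "autosomes"
        pooled[key].update(counts)
    return pooled


def proportions(counts):
    """Convert a Counter of counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Toy numbers for illustration only, not real data.
    per_chrom = {
        "1": Counter(coding=10, intron=30),
        "X": Counter(coding=2, intron=8),
    }
    for group, counts in pool_by_group(per_chrom, {"X", "Y"}).items():
        print(group, proportions(counts))
```

The same `proportions` function can be applied to each chromosome individually. To test whether the distribution of classes differs between groups, the pooled counts can be arranged as a contingency table (rows = groups, columns = classes) and passed to a chi-squared test such as `scipy.stats.chi2_contingency`.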
Chr 10
3_prime_UTR_variant 8691
5_prime_UTR_premature_start_codon_gain_variant 517
5_prime_UTR_variant 2904
bidirectional_gene_fusion 2
conservative_inframe_deletion 17
conservative_inframe_insertion 27
conservative_inframe_insertion&splice_region_variant 1
disruptive_inframe_deletion 55
disruptive_inframe_insertion 27
non_coding_transcript_exon_variant 928
non_coding_transcript_variant 113
splice_acceptor_variant&conservative_inframe_deletion&splice_region_variant&intron_variant 2
splice_acceptor_variant&disruptive_inframe_deletion&splice_region_variant&intron_variant 2
...
start_lost 9
stop_gained&conservative_inframe_insertion 1
stop_gained&disruptive_inframe_deletion 1
stop_gained&disruptive_inframe_insertion 1
stop_gained 108
stop_lost 15
stop_lost&splice_region_variant 3
stop_retained_variant 9
synonymous_variant 5038
upstream_gene_variant 98805
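To work with these counts programmatically, the summary line first has to be turned into a lookup table. A minimal sketch, assuming each per-chromosome summary is one whitespace-separated line of the form `Chr <name> <effect> <count> <effect> <count> ...` (the unelided version of the output above). The function name `parse_effect_line` is hypothetical.

```python
from collections import Counter


def parse_effect_line(line):
    """Parse a 'Chr <name> <effect> <count> ...' summary line.

    Returns (chromosome_name, Counter mapping effect term -> count).
    Assumes whitespace-separated tokens: 'Chr', the chromosome name,
    then alternating effect terms and integer counts.
    """
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "Chr":
        raise ValueError("line must start with 'Chr <name>'")
    chrom = tokens[1]
    pairs = tokens[2:]
    if len(pairs) % 2 != 0:
        raise ValueError("expected alternating effect/count tokens")
    counts = Counter()
    # Effects sit at even positions, their counts at odd positions.
    for effect, n in zip(pairs[0::2], pairs[1::2]):
        counts[effect] += int(n)
    return chrom, counts


if __name__ == "__main__":
    chrom, counts = parse_effect_line(
        "Chr 10 3_prime_UTR_variant 8691 stop_gained 108 synonymous_variant 5038"
    )
    print(chrom, counts.most_common())
```

If the Bash script writes one effect/count pair per line instead, the same `Counter` can be filled by splitting each line into two fields.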
There are many variant types, and many of them occur only once. I would like to group these variants and compare the numbers of coding, intronic, flanking (upstream/downstream) variants, etc.
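One way to do the grouping is to map each Sequence Ontology term to a broad class. Combined annotations such as `stop_gained&disruptive_inframe_deletion` can then be assigned to the highest-priority class among their parts. The sketch below assumes a hypothetical mapping `EFFECT_CLASS` and priority order `CLASS_PRIORITY`; both are illustrative choices, not a snpEff standard, and should be extended with every term that appears in your files (unmapped terms fall into `"other"`).

```python
from collections import Counter

# Hypothetical grouping of effect terms into broad classes (extend as needed).
EFFECT_CLASS = {
    "missense_variant": "coding",
    "synonymous_variant": "coding",
    "frameshift_variant": "coding",
    "stop_gained": "coding",
    "stop_lost": "coding",
    "start_lost": "coding",
    "stop_retained_variant": "coding",
    "conservative_inframe_deletion": "coding",
    "conservative_inframe_insertion": "coding",
    "disruptive_inframe_deletion": "coding",
    "disruptive_inframe_insertion": "coding",
    "splice_acceptor_variant": "splice",
    "splice_donor_variant": "splice",
    "splice_region_variant": "splice",
    "3_prime_UTR_variant": "UTR",
    "5_prime_UTR_variant": "UTR",
    "5_prime_UTR_premature_start_codon_gain_variant": "UTR",
    "non_coding_transcript_exon_variant": "noncoding_transcript",
    "non_coding_transcript_variant": "noncoding_transcript",
    "intron_variant": "intron",
    "upstream_gene_variant": "flanking",
    "downstream_gene_variant": "flanking",
    "intergenic_region": "intergenic",
}

# Earlier in the list = higher priority when an annotation combines terms.
CLASS_PRIORITY = ["coding", "splice", "UTR", "noncoding_transcript",
                  "intron", "flanking", "intergenic", "other"]


def classify(effect):
    """Return the highest-priority class among the '&'-joined terms."""
    classes = {EFFECT_CLASS.get(term, "other") for term in effect.split("&")}
    return min(classes, key=CLASS_PRIORITY.index)


def group_counts(effect_counts):
    """Collapse a Counter of effect terms into a Counter of broad classes."""
    grouped = Counter()
    for effect, n in effect_counts.items():
        grouped[classify(effect)] += n
    return grouped
```

With this priority order, `splice_region_variant&intron_variant` is counted as `"splice"` and `stop_gained&disruptive_inframe_deletion` as `"coding"`. Reordering `CLASS_PRIORITY` (for example putting `"splice"` first) changes how combined annotations are counted, so it is worth fixing the order before comparing chromosomes.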
How can I do this, and how should I group the remaining variant types?
Thank you in advance.