I have a VCF file for a whole-exome sequencing dataset generated with the Agilent 1.1 capture kit.
The genome coordinates are GRCh37.
If I wanted to run a case-control burden test on every gene in the dataset, what steps would I need to follow?
- How do I get a complete, non-redundant list of genes to run the test on?
- How do I subset the VCF to just the variants in the exons of these genes?
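
To make the two sub-questions concrete, here is a minimal sketch of one possible approach, not a definitive pipeline. It assumes a GTF-style gene annotation on GRCh37 (for example from Ensembl or GENCODE), that the annotation and the VCF use the same chromosome naming ("1" versus "chr1"), and that `gene_id` is an acceptable unique key for a gene. The function names and the toy records are hypothetical and only there for illustration. Exons from all transcripts of a gene are merged into non-overlapping intervals, so a variant is counted once per gene.

```python
import re
from collections import defaultdict


def parse_gtf_exons(gtf_lines):
    """Collect exon intervals per gene_id from GTF lines (1-based, inclusive)."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed "exon" features.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match:
            exons[match.group(1)].append((fields[0], int(fields[3]), int(fields[4])))
    return exons


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def variants_by_gene(vcf_lines, gene_exons):
    """Assign each VCF record to every gene whose merged exons contain its POS."""
    hits = defaultdict(list)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        pos = int(pos)
        # Linear scan over genes: fine for a toy example, too slow for a real exome.
        for gene, intervals in gene_exons.items():
            if any(c == chrom and s <= pos <= e for c, s, e in intervals):
                hits[gene].append((chrom, pos))
    return dict(hits)


# Toy annotation: GENE_A has two overlapping exons from different transcripts.
GTF = [
    '1\tsrc\tgene\t100\t400\t.\t+\t.\tgene_id "GENE_A";',
    '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "T1";',
    '1\tsrc\texon\t150\t250\t.\t+\t.\tgene_id "GENE_A"; transcript_id "T2";',
    '2\tsrc\texon\t500\t600\t.\t-\t.\tgene_id "GENE_B"; transcript_id "T3";',
]

# Toy VCF: the second variant falls inside GENE_A's gene span but outside its exons.
VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t120\t.\tA\tG",
    "1\t300\t.\tC\tT",
    "2\t550\t.\tG\tA",
]

gene_exons = {g: merge_intervals(iv) for g, iv in parse_gtf_exons(GTF).items()}
genes = sorted(gene_exons)          # unique gene list to loop the burden test over
hits = variants_by_gene(VCF, gene_exons)

print(genes)
print(hits)
```

On real data the linear scan would be far too slow. Writing the merged exons to a BED file and intersecting it with the VCF using a dedicated tool (for instance `bedtools intersect`, or `bcftools view -R` on an indexed VCF) is probably the more practical route. It may also be worth restricting the exons to the kit's capture targets, since genes outside the targets will have little or no coverage.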