I am processing VCF files from exome sequencing of thousands of samples.
I would like to carry out the following three steps:
- Identify the homozygous SNPs.
- Identify the minor allele SNPs.
- Identify the SNPs/indels that have deleterious effects on protein function, such as nonsynonymous mutations, frameshifts, and other indels. Ideally, I would like to find all the SNPs and indels that change the protein sequence.
Is there any well-established software, tool, or guideline for achieving these three goals one by one?
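
To make the first two goals concrete, here is a rough, plain-Python sketch of one possible reading of them. It assumes a multi-sample VCF with a `GT` entry in the FORMAT column, treats "homozygous" as homozygous for a non-reference allele, and treats "minor allele" as the less frequent allele within the cohort itself. The function names, the `maf_threshold` parameter, and its default of 0.05 are all hypothetical, and all ALT alleles of a multi-allelic site are pooled for simplicity. The third goal needs functional annotation against a gene model, so it is not covered here.

```python
def parse_gt(sample_field, fmt):
    """Return a sample's allele indices from its GT entry, or None if uncalled."""
    keys = fmt.split(":")
    if "GT" not in keys:
        return None
    values = sample_field.split(":")
    gt = values[keys.index("GT")].replace("|", "/")  # treat phased as unphased
    alleles = gt.split("/")
    if any(a == "." for a in alleles):
        return None
    return [int(a) for a in alleles]


def is_snp(ref, alt):
    """True if REF and every ALT allele are single bases."""
    return len(ref) == 1 and all(
        len(a) == 1 and a.upper() in "ACGT" for a in alt.split(",")
    )


def summarize_snps(vcf_lines, maf_threshold=0.05):
    """Per SNP site: homozygous-ALT samples and cohort minor allele frequency."""
    results = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = cols[0], int(cols[1]), cols[3], cols[4]
        if not is_snp(ref, alt):
            continue  # indels and other variant types are ignored here
        gts = [parse_gt(s, cols[8]) for s in cols[9:]]
        called = [g for g in gts if g is not None]
        n_alleles = sum(len(g) for g in called)
        if n_alleles == 0:
            continue  # no called genotypes at this site
        # Assumption: all non-reference alleles are pooled into one ALT count.
        alt_freq = sum(a != 0 for g in called for a in g) / n_alleles
        maf = min(alt_freq, 1 - alt_freq)
        # Sample indices (0-based, in column order) that are homozygous ALT.
        hom_alt = [
            i for i, g in enumerate(gts)
            if g is not None and g[0] != 0 and len(set(g)) == 1
        ]
        results.append({
            "chrom": chrom,
            "pos": pos,
            "hom_alt_samples": hom_alt,
            "maf": maf,
            "below_threshold": maf < maf_threshold,
        })
    return results
```

Calling `summarize_snps(open("cohort.vcf"))` on an uncompressed file (the filename is a placeholder) would return one dictionary per SNP site. This is only meant to pin down what I mean; it would be slow for thousands of exomes.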