While assessing variants in patient exomes (~16 VCF files, one per patient), I found that some variants are predicted to be pathogenic even though the patient's phenotype is not associated with them.
I also found that some variants predicted to be pathogenic occur in more than one patient, so I'm thinking of intersecting all the VCF files and using the resulting file of common variants in my variant assessment workflow.
- Is this approach OK?
- I want to build up a database for the common variants in our population, will this strategy help?
- What are the recommended tools?
I appreciate your help, as always!
A general word of caution before embarking on a
bcftools norm -m to split multi-allelics and, if possible, left-align and normalize indels. It saves you a lot of headache downstream when dealing with partial overlaps with multi-allelic variants and "indels" at tandem repeat loci.
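To illustrate why splitting matters when you look for variants shared between patients, here is a rough Python sketch (standard library only) that counts how many files carry each allele. It is not a replacement for bcftools: it only splits comma-separated ALT alleles naively, does no left-alignment or trimming, and assumes each file is a single-sample VCF in which every record is a variant the patient actually carries. The function names (variant_keys, count_carriers, shared_variants) are made up for this example.

```python
import gzip
from collections import Counter


def variant_keys(vcf_path):
    """Yield one (chrom, pos, ref, alt) key per ALT allele in a VCF file.

    Multi-allelic records (ALT like "G,T") are split naively on commas.
    Unlike bcftools norm, nothing is left-aligned or trimmed here.
    """
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            for allele in alt.split(","):
                # "." means no ALT allele, "*" marks an overlapping deletion.
                if allele not in (".", "*"):
                    yield (chrom, int(pos), ref, allele)


def count_carriers(vcf_paths):
    """Count in how many files each allele appears (assumes one patient per file)."""
    counts = Counter()
    for path in vcf_paths:
        # set() so an allele listed twice in one file is counted once.
        counts.update(set(variant_keys(path)))
    return counts


def shared_variants(vcf_paths, min_samples):
    """Return alleles seen in at least min_samples files, with their counts."""
    counts = count_carriers(vcf_paths)
    return {key: n for key, n in counts.items() if n >= min_samples}
```

For example, shared_variants(paths, len(paths)) would give the strict intersection across all 16 files, while a lower min_samples gives variants that merely recur. Without the per-allele split, a record with ALT "G,T" in one patient would never match a plain "G" in another, which is the kind of partial overlap mentioned above.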
Thank you RamRS. I will consider it in the coming analyses
Thank you Emeline for your reply. I used vcftools for that but now I will use BCFTools and compare both outputs.