I have a callset from whole-genome data, and I want to turn it into an exome callset by extracting the variants that fall within an exome target interval list. I obtained two exome target lists: one from the 1KG project (Phase 3) and the Twist Exome target (https://www.twistbioscience.com/resources/bed-file/ngs-human-core-exome-panel-bed-files). I have some questions:
- Are these exome intervals appropriate for extracting exonic variants? If so, which one should I use? I subset my VCF file with both lists: with the 1KG list I got 68463 variants, while with the Twist Exome target I got 21082.
- Looking at the annotations for the subset (regardless of whether it was made with 1KG or Twist), I get variants annotated as intronic even though they are tagged as belonging to protein-coding transcripts. Does this make sense for an exome target list?
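
For reference, here is a minimal sketch of the kind of interval subsetting described above. It is not the exact command I ran. It assumes a plain-text, tab-separated VCF and a BED file whose chromosome names match the VCF (e.g. both use `chr1`, or both use `1`). The function names `load_bed`, `in_targets`, and `subset_vcf` are made up for illustration, and only the POS of each record is tested, not the full REF span.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into per-chromosome sorted, merged (start, end) lists.

    BED coordinates are 0-based and half-open: [start, end).
    """
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or touching interval: extend the previous one.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_targets(targets, chrom, pos):
    """Return True if a 1-based VCF position lies inside a target interval."""
    intervals = targets.get(chrom, [])
    pos0 = pos - 1  # convert the 1-based VCF POS to 0-based BED coordinates
    # Index of the last interval whose start is <= pos0.
    i = bisect.bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


def subset_vcf(vcf_lines, targets):
    """Yield header lines plus the records whose POS falls in the targets."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_targets(targets, chrom, int(pos)):
            yield line
```

With a filter like this, any variant whose position lies inside a listed interval is kept, whatever its annotation, so the result depends entirely on what each target list includes.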
Thank you very much for your help!