I have ~250 patient INDEL and SNV VCFs from ICGC on one side. On the other side, I have a BED file containing specific genomic locations. I would like to test the hypothesis that the mutation frequency at some of these locations is higher than at the other locations in the BED file.
My idea is:
1) Create a data frame with the locations as columns and the patients as rows.
2) For each mutation that overlaps a location, add +1 to the corresponding cell.
3) At the end, sum over the patients to get a single row vector containing the total number of mutations for each location.
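The counting steps above could be sketched roughly as follows. This is only a minimal illustration: the function name `count_matrix`, the toy patients, regions and mutations are all hypothetical, and it assumes the mutations have already been read out of the VCFs into `(patient, chrom, pos)` tuples. It relies on VCF positions being 1-based while BED intervals are 0-based and half-open, so a position `pos` falls inside a BED interval when `start < pos <= end`.

```python
from collections import defaultdict

import numpy as np


def count_matrix(mutations, regions, patients):
    """Return a patients x regions matrix of overlapping-mutation counts.

    mutations: iterable of (patient_id, chrom, pos), pos 1-based as in VCF.
    regions:   list of (chrom, start, end), 0-based half-open as in BED.
    patients:  list of patient ids, defining the row order.
    """
    patient_idx = {p: i for i, p in enumerate(patients)}

    # Group regions by chromosome so each mutation is only compared
    # against regions on its own chromosome.
    by_chrom = defaultdict(list)
    for j, (chrom, start, end) in enumerate(regions):
        by_chrom[chrom].append((start, end, j))

    counts = np.zeros((len(patients), len(regions)), dtype=int)
    for patient, chrom, pos in mutations:
        for start, end, j in by_chrom.get(chrom, ()):
            # 1-based VCF position vs 0-based half-open BED interval.
            if start < pos <= end:
                counts[patient_idx[patient], j] += 1
    return counts


# Hypothetical toy data, not taken from any real cohort.
patients = ["P1", "P2"]
regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
mutations = [
    ("P1", "chr1", 150),
    ("P1", "chr1", 101),
    ("P2", "chr1", 600),
    ("P2", "chr2", 51),   # just past the end of the chr2 region
    ("P2", "chr1", 100),  # just before the start of the first region
]

counts = count_matrix(mutations, regions, patients)
totals = counts.sum(axis=0)  # one total per location (step 3)
```

The nested loop is fine for a few hundred regions; for larger BED files an interval-overlap tool would probably be a better fit than this brute-force comparison.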
Which method should I use to calculate significance? For example, if I decided to use bootstrapping, what should I randomly generate?
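To make the question concrete, here is a minimal sketch of one possible thing to randomly generate. It is a Monte Carlo simulation under a null model rather than a bootstrap in the strict sense, and the null model is an assumption: every mutation lands on a region with probability proportional to that region's length. The function name `region_enrichment_pvalues` and the toy numbers are hypothetical.

```python
import numpy as np


def region_enrichment_pvalues(observed, lengths, n_sim=10000, seed=0):
    """Empirical one-sided p-value per region for 'more mutations than expected'.

    observed: total mutation count per region (the summed row vector).
    lengths:  region lengths in bp (end - start from the BED file).
    Assumed null: mutations are spread over regions proportionally to length.
    """
    observed = np.asarray(observed)
    lengths = np.asarray(lengths, dtype=float)
    rng = np.random.default_rng(seed)

    probs = lengths / lengths.sum()
    # Each simulated row redistributes the same total number of mutations
    # across the regions under the null; shape is (n_sim, n_regions).
    sims = rng.multinomial(int(observed.sum()), probs, size=n_sim)

    # Count simulations at least as extreme as the observed count; the +1
    # terms keep the empirical p-value strictly above zero.
    exceed = (sims >= observed).sum(axis=0)
    return (exceed + 1) / (n_sim + 1)


# Hypothetical totals for three equally long regions.
observed = [40, 5, 5]
lengths = [100, 100, 100]
pvals = region_enrichment_pvalues(observed, lengths)
```

Because one p-value is produced per region, they would need a multiple-testing correction before being interpreted. The length-proportional null is also a simplification: it ignores anything else that might affect the local mutation rate, so a different null may be more appropriate.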