I subsetted SNVs according to whether they fall close to TF binding sites and ran a mutational signature analysis on that subset. I then realised that restricting the analysis to specific regions of the genome introduces a sequence-composition bias, so we would expect certain mutation types to look enriched simply because of the underlying base composition.

My questions are:

1) Is it valid to run a mutational signature analysis on subsetted regions?

2) How can I show that the base composition of these regions does not differ from the rest of the genome, so that the signature I see is real? (I applied a chi-square test of independence and got a p-value < 2.2e-16. I first read this as meaning they are independent, but a p-value that small rejects the null hypothesis of independence rather than supporting it.)
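To make the second question concrete, here is a minimal sketch of the kind of correction I have in mind: rescaling the mutation counts by how often each trinucleotide context occurs in the subsetted regions compared with the whole genome. Everything in it is an assumption for illustration: the function names are hypothetical, sequences are plain strings, and mutation counts are keyed by (pyrimidine-centred context, alternate base). It is not taken from any signature package.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pyrimidine_context(tri):
    """Collapse a trinucleotide to the strand whose middle base is C or T."""
    tri = tri.upper()
    if tri[1] in "AG":
        # Reverse complement so the central base becomes a pyrimidine.
        tri = tri.translate(COMPLEMENT)[::-1]
    return tri


def context_frequencies(sequences):
    """Fraction of each pyrimidine-centred trinucleotide across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 2):
            tri = seq[i:i + 3]
            # Skip windows containing N or other ambiguous bases.
            if set(tri) <= set("ACGT"):
                counts[pyrimidine_context(tri)] += 1
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()}


def normalise_counts(mutation_counts, region_freq, genome_freq):
    """Rescale counts keyed by (context, alt) to genome-wide opportunity.

    Assumes every mutated context is present in both frequency tables.
    """
    return {
        (ctx, alt): n * genome_freq[ctx] / region_freq[ctx]
        for (ctx, alt), n in mutation_counts.items()
    }


# Toy data only, not real sequences or real counts.
region_freq = context_frequencies(["ACGACGACG"])
genome_freq = context_frequencies(["ACGTTTACGTTTGGA"])
toy_counts = {("ACG", "T"): 10}
print(normalise_counts(toy_counts, region_freq, genome_freq))
```

If the rescaled profile still shows the same pattern, that would suggest the enrichment is not just a consequence of base composition; whether this kind of normalisation is sufficient is part of what I am asking.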

Thank you for your help and time,

It's Sunday here, so have a great Sunday!


