I am using R Markdown and maftools to read the PCAWG breast cancer mutation file, which is in MAF format. I built a mutation matrix and then used MutationalPatterns to extract the mutational signatures present in breast cancer. Because the previous PCAWG paper reported 13 signatures in breast cancer, I extracted 13 signatures, refit those 13 signatures to the samples, and matched each one to a COSMIC signature to check whether I recover the same signatures as that paper.
My issues are:
- The SBS1 and SBS5 contributions are exactly zero in many breast cancer samples.
- Where they are not zero, the SBS1 and SBS5 contributions are very small decimal values, such as 0.004 or 0.000022, in most samples.
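A minimal sketch of how the two symptoms above could be summarised, assuming the contributions sit in a signatures-by-samples matrix. The name `contribution`, the `SBS3` row, and all values are hypothetical toy data, not my real results. The column-sum check is only a heuristic: sums close to 1 would suggest the values are relative fractions rather than mutation counts.

```r
# Toy contribution matrix: rows = signatures, columns = samples.
# All values are invented for illustration only.
contribution <- matrix(
  c(0.004, 0,     0.000022, 0.010,
    0.120, 0,     0.300,    0.050,
    0.876, 1.000, 0.699978, 0.940),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("SBS1", "SBS5", "SBS3"), paste0("sample", 1:4))
)

# Share of samples with an exactly-zero contribution, per signature.
zero_fraction <- rowMeans(contribution == 0)

# Heuristic: column sums near 1 suggest relative fractions, not counts.
looks_relative <- all(abs(colSums(contribution) - 1) < 1e-8)

print(zero_fraction)
print(looks_relative)
```

If the values turn out to be fractions, they would need to be multiplied by each sample's total mutation count before they can be read as numbers of mutations.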
My goal is to fit a linear regression of the number of SBS1 mutations per Gb against age at diagnosis across samples. Because the SBS1 contribution is zero or a very small decimal value in most samples, I cannot fit a meaningful linear regression model.
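A minimal sketch of the regression I have in mind, in base R with made-up numbers. It assumes the SBS1 values are relative fractions that can be scaled by each sample's total SNV count, and it assumes a genome size of 2.8 Gb. All variable names are hypothetical, and the genome size is a placeholder that should be replaced with the callable genome size for the data set.

```r
# Toy inputs, invented for illustration only.
sbs1_fraction <- c(0.004, 0, 0.02, 0.05, 0.01)   # relative SBS1 contribution per sample
total_snvs    <- c(3000, 2500, 4000, 6000, 3500) # total SNVs per sample
age           <- c(45, 38, 52, 70, 49)           # age at diagnosis, in years
genome_gb     <- 2.8                             # ASSUMPTION: genome size in Gb

# Relative contribution x total SNVs = estimated SBS1 mutation count.
sbs1_count  <- sbs1_fraction * total_snvs

# Normalise the count to mutations per Gb.
sbs1_per_gb <- sbs1_count / genome_gb

# Ordinary least squares: SBS1 mutations per Gb as a function of age.
fit <- lm(sbs1_per_gb ~ age)
print(coef(fit))
```

If the contributions are already absolute counts, the scaling step would be skipped and the counts divided by the genome size directly.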
Any help would be appreciated. I can share my code if needed.