I am working with the exome sequencing data shared in https://www.nature.com/articles/sdata201610. To summarize the dataset: the authors sequenced exomes of cancer tissue and of blood cells (as matched normal) from 7 different patients. I have acquired the VCF files of 3 patients with the same type of cancer. The shared VCF files are already filtered. I tried to find the mutations unique to the cancer sample and to the matched normal sample by taking different types of joins on the VCF files.
I found that many mutations are unique to the cancer sample, but also that many are unique to the matched normal sample. I was expecting the matched normal to have very few unique mutations. Can you help me understand this behavior? The exact stats are shared below:
For patient 1/2/3:
Total common mutations (in both cancer tissue and matched normal blood cell): 75961/88110/82211
Total unique cancerous mutations (only in tissue): 15909/17694/17464
Total unique matched mutations (only in matched normal): 14825/13826/21555
These were the steps followed to compute the common, unique cancerous and unique matched mutations, using the VCF files of SNPs only:
- Only SNP mutations that satisfied the PASS criterion in the FILTER column were kept. This filter was applied to both the matched normal and the cancer VCF files.
- Mutations present in both the matched normal and the cancer VCF files are referred to as common mutations above.
- Mutations exclusive to either the matched normal or the cancer VCF file are referred to as unique matched and unique cancerous mutations, respectively.
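
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the exact code that produced the numbers: it assumes plain or gzipped VCF files with the standard tab-separated columns, and that two records are "the same mutation" when chromosome, position, REF and ALT all match. The function names `load_pass_snps` and `compare_calls` are made up for this example.

```python
import gzip


def load_pass_snps(path):
    """Return the set of (chrom, pos, ref, alt) keys for PASS SNPs in a VCF."""
    # Assumption: files ending in .gz are gzip/bgzip compressed, others are plain text.
    opener = gzip.open if path.endswith(".gz") else open
    keys = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 7:
                continue  # skip malformed records
            chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
            if flt != "PASS":
                continue  # keep only records that passed all filters
            # Multi-allelic records list several ALT alleles separated by commas.
            for allele in alt.split(","):
                is_snp = len(ref) == 1 and len(allele) == 1 and allele.upper() in "ACGT"
                if is_snp:
                    keys.add((chrom, int(pos), ref.upper(), allele.upper()))
    return keys


def compare_calls(tumor_path, normal_path):
    """Split PASS SNPs into common, tumor-only and normal-only sets."""
    tumor = load_pass_snps(tumor_path)
    normal = load_pass_snps(normal_path)
    return {
        "common": tumor & normal,       # inner join
        "tumor_only": tumor - normal,   # unique cancerous
        "normal_only": normal - tumor,  # unique matched
    }
```

For one patient this would be called as `compare_calls("tumor.vcf", "normal.vcf")` (placeholder file names), and `len()` of each returned set gives the three counts. Note that with this kind of join, a site that is PASS in one file but fails the filter (or is not called) in the other is counted as unique to the first file.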