I am currently working with TCGA MAF and TPM files.
Because TCGA samples come from patient tissue rather than cell lines, they contain not only tumor cells but also immune cells, fibroblasts, and other non-tumor cells. I found a table of tumor purity estimates for the samples; the average is around 0.7, but the values vary widely.
I want to calculate the variant allele frequency (VAF) from the MAF files, but non-tumor cells can significantly dilute the values.
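Example: if a sample contains 50 tumor cells and 50 other cells, and every tumor cell carries the variant on all copies of the locus, the calculated VAF will be 0.5, whereas the VAF within the tumor cells is actually 1.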
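To make the arithmetic in the example concrete, here is a minimal sketch of what I mean by a purity-adjusted VAF. It assumes the locus is copy-neutral (two copies in both tumor and normal cells), so that the observed VAF is simply purity times the VAF within the tumor cells; real tumors with copy-number changes would break this assumption. The function names are hypothetical. `t_alt_count` and `t_depth` are the read-count columns as I understand them from the GDC MAF format.

```python
def observed_vaf(t_alt_count, t_depth):
    """Raw VAF from tumor read counts (argument names follow the MAF columns)."""
    if t_depth <= 0:
        raise ValueError("t_depth must be positive")
    return t_alt_count / t_depth


def purity_adjusted_vaf(vaf, purity):
    """Rescale an observed VAF by tumor purity.

    ASSUMPTION: copy-neutral diploid locus in tumor and normal cells,
    so observed_vaf = purity * tumor_vaf. The result is capped at 1.0
    because noisy purity estimates can push the ratio above 1.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return min(vaf / purity, 1.0)


# The 50/50 example: half the reads carry the variant, purity is 0.5.
raw = observed_vaf(t_alt_count=50, t_depth=100)
adjusted = purity_adjusted_vaf(raw, purity=0.5)
print(raw, adjusted)
```

With the numbers from the example, the raw value is 0.5 and the adjusted value is 1.0, which is the correction I am after.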
I would also like to calculate fold changes from the TPM files, but non-tumor cells can distort those values as well.
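To illustrate the expression side of the problem, here is a rough sketch under a deliberately naive assumption: that the measured expression of a gene is a linear mixture, purity times the tumor-cell expression plus (1 - purity) times the non-tumor expression. TPM values are normalized per sample, so this mixture may not hold exactly; the function names, the pseudocount, and the numbers are all hypothetical and only show how the dilution affects a fold change.

```python
import math


def unmix_tumor_expression(observed, normal, purity):
    """Estimate tumor-cell expression from a bulk measurement.

    ASSUMPTION: observed = purity * tumor + (1 - purity) * normal.
    Negative estimates are clipped to 0.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return max((observed - (1 - purity) * normal) / purity, 0.0)


def log2_fold_change(case, control, pseudocount=1.0):
    """log2 fold change with a pseudocount to avoid division by zero."""
    return math.log2((case + pseudocount) / (control + pseudocount))


# Hypothetical gene: tumor cells at 100, non-tumor cells at 20, purity 0.5.
bulk = 0.5 * 100 + 0.5 * 20  # what the bulk sample would show: 60
tumor_est = unmix_tumor_expression(bulk, normal=20, purity=0.5)

print(log2_fold_change(bulk, 20))       # fold change from the bulk value
print(log2_fold_change(tumor_est, 20))  # fold change after unmixing
```

In this toy case the bulk fold change is smaller than the fold change computed from the unmixed estimate, which is the kind of distortion I am worried about.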
My question is: can I somehow filter out the non-tumor-related records from the MAF and TPM files?