I would like some advice on an analysis I am planning with my exome data. I have sequenced tumors from two patients with a specific cancer, together with peripheral blood from each patient as the matched normal control. I have also sequenced tumor iPSCs (that is, we reprogrammed the tumor lines to iPSCs and then sequenced them). We do not have exome data from normal iPSCs as a control (we did not reprogram normal fibroblasts to generate normal iPSCs to compare against the tumor iPSCs). The somatic variants for the iPSCs are therefore called from the pair normal peripheral blood exome / tumor-derived iPSC. So for each patient I have exome data for 4 samples: 1 normal, 1 tumor and 2 iPSC lines.

My idea is to find the mutational landscape that is conserved from the tumor to its reprogrammed clones. We are not considering dosage effects or the passage number at which reprogramming was done, so some mutations may well have a selective advantage during reprogramming and come to dominate the iPSC clone. We know that the tumor is polyclonal and each iPSC line is a single clone, so the iPSC should carry the mutations that are present at the highest frequency across the tumor clones (leaving aside selective advantage and mutations acquired during reprogramming). Even so, I expect some tumor mutations to pass to the iPSCs and to show an elevated frequency there.

To test this I used established variant callers to call somatic variants in my samples and checked to what extent they are conserved in the tumor iPSCs. The overlap was not very convincing: roughly 44%. Now I want to check these variants against all the somatic mutations I can obtain from TCGA across all tumor types.
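For concreteness, the kind of overlap I am talking about can be computed along these lines. This is only a minimal sketch, not my actual pipeline: it assumes each call set has been reduced to (chromosome, position, ref, alt) tuples, and that the overlap is defined as the fraction of tumor somatic variants that are also called in an iPSC line. The function names are made up for illustration.

```python
from typing import Iterable, Tuple

# A variant is identified by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def normalize(chrom, pos, ref, alt) -> Variant:
    """Strip a leading 'chr' and upper-case alleles so call sets are comparable."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return (chrom, int(pos), str(ref).upper(), str(alt).upper())


def conserved_fraction(tumor: Iterable[Variant], ipsc: Iterable[Variant]) -> float:
    """Fraction of tumor somatic variants that are also present in the iPSC call set.

    Assumption: 'overlap' means shared variants divided by tumor variants.
    """
    tumor_set = {normalize(*v) for v in tumor}
    ipsc_set = {normalize(*v) for v in ipsc}
    if not tumor_set:
        return 0.0
    return len(tumor_set & ipsc_set) / len(tumor_set)
```

Exact matching like this only makes sense if both call sets use the same genome build and the same representation for indels.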
I have not worked much with TCGA MAF files, but after reading posts and websites I gathered that there is no single comprehensive mutation file cataloguing somatic mutations for all cancer types; the files are provided separately for each cancer type. Since my variants do not come from a large cohort, I want to see whether the somatic variants I extracted are observed significantly often as cancer-related mutations across all cancer types, and that I did not obtain them by chance. This would reassure me that the mutational burden of the iPSCs, even if it is not an exact mimic of the tumor's, is still relevant and tumorigenic. It would give me a first-hand validation of my variants.

My question is: how do I obtain a mutation file containing somatic mutations across most cancer types, with genomic loci, gene names and read statistics, against which I can interrogate my variant data? Can this be achieved? Or should I do it separately per cancer type, taking the MAF files for each tumor type and comparing my somatic variants against each of them? This is what I want to achieve for now. I would appreciate input from people here, and if someone has other ideas I would like to hear them as well. Which data should I be consulting for this? I am fairly sure it should be the MAF files, but I am a bit lost in the TCGA data. Any leads?
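To make the per-cancer-type option concrete, here is a rough sketch of what I have in mind: read one MAF per tumor type, pool them into a single lookup, and ask in which cancer types each of my variants has been reported. It assumes tab-delimited MAFs with the standard column names (Hugo_Symbol, Chromosome, Start_Position, Reference_Allele, Tumor_Seq_Allele2, Tumor_Sample_Barcode), the same genome build as my calls, and SNVs only, since indels are written differently in MAF and VCF. The function names and the cancer-type labels passed in are placeholders.

```python
import csv
from collections import defaultdict


def variant_key(chrom, pos, ref, alt):
    """Build a comparable key: chromosome without 'chr', integer position, upper-case alleles."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return (chrom, int(pos), str(ref).upper(), str(alt).upper())


def load_maf(path, cancer_type, index):
    """Add every record of one MAF file to the shared index, tagged with its cancer type."""
    with open(path, newline="") as handle:
        # MAF files may start with '#' comment lines before the header row.
        rows = (line for line in handle if not line.startswith("#"))
        for rec in csv.DictReader(rows, delimiter="\t"):
            key = variant_key(rec["Chromosome"], rec["Start_Position"],
                              rec["Reference_Allele"], rec["Tumor_Seq_Allele2"])
            index[key].append({
                "cancer_type": cancer_type,
                "gene": rec["Hugo_Symbol"],
                "sample": rec["Tumor_Sample_Barcode"],
            })


def build_index(maf_paths):
    """maf_paths maps a cancer-type label to the path of its MAF file."""
    index = defaultdict(list)
    for cancer_type, path in maf_paths.items():
        load_maf(path, cancer_type, index)
    return index


def cancer_types_per_variant(variants, index):
    """For each of my variants, list the cancer types in which it was reported."""
    result = {}
    for v in variants:
        key = variant_key(*v)
        result[key] = sorted({hit["cancer_type"] for hit in index.get(key, [])})
    return result
```

A lookup like this would only tell me whether a variant recurs in the pooled data; whether that recurrence is more than chance would still need a proper statistical test on top.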
Thanks and Regards
VD