Drilling down to the right samples in COSMIC
6 weeks ago
Vincent Laufer ★ 2.4k

Hello Biostars - Thank you for all the help lately - just one more question.

If I navigate to the COSMIC page, there are files containing the ~60 SBS loadings for individual cancer patients by cancer type. These can be obtained by downloading, for instance, "SigProfilier_PCAWG_WGS_probabilities_SBS.csv", which is a flat file.

As can be seen below, each row is one patient's mutation rate for a given trinucleotide context, while the columns are the SBS signatures:

Sample    Cancer Type      Mutation Type  Mutation Subtype  SBS1       SBS2        SBS3  SBS4
SP117655  Biliary-AdenoCA  C>A            ACA               0.0045447  2.58E-06    0     0
SP117655  Biliary-AdenoCA  C>A            ACC               0.022974   0.0012906   0     0
SP117655  Biliary-AdenoCA  C>A            ACG               0.0083704  0.002148    0     0
SP117655  Biliary-AdenoCA  C>A            ACT               0.012359   0.00081708  0     0
SP117655  Biliary-AdenoCA  C>G            ACA               0.019838   4.12E-15    0     0
SP117655  Biliary-AdenoCA  C>G            ACC               0.019084   0.0018116   0     0
SP117655  Biliary-AdenoCA  C>G            ACG               0.0069102  0.00079127  0     0
SP117655  Biliary-AdenoCA  C>G            ACT               0.010542   0.00072964  0     0
SP117655  Biliary-AdenoCA  C>T            ACA               0.12931    0.00027331  0     0
SP117655  Biliary-AdenoCA  C>T            ACC               0.059811   0.011297    0     0
SP117655  Biliary-AdenoCA  C>T            ACG               0.97484    7.57E-05    0     0
SP117655  Biliary-AdenoCA  C>T            ACT               0.065793   0.011098    0     0
SP117655  Biliary-AdenoCA  T>A            ATA               0.016473   0.0017114   0     0
SP117655  Biliary-AdenoCA  T>A            ATC               0.056402   0.0192      0     0
SP117655  Biliary-AdenoCA  T>A            ATG               0.020153   0.00039076  0     0
SP117655  Biliary-AdenoCA  T>A            ATT               0.0024127  5.07E-15    0     0
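
For anyone wanting to poke at the same layout, here is a minimal sketch of reading it with Python's standard `csv` module and keeping only the Biliary-AdenoCA rows. It assumes the file is comma-separated with exactly the header shown above. The three inline rows are copied from the excerpt, and the function names are only illustrative.

```python
import csv
import io

# A few rows copied from the excerpt above, written as CSV.
# Assumption: the real file is comma-separated with this header.
EXCERPT = """Sample,Cancer Type,Mutation Type,Mutation Subtype,SBS1,SBS2,SBS3,SBS4
SP117655,Biliary-AdenoCA,C>A,ACA,0.0045447,2.58E-06,0,0
SP117655,Biliary-AdenoCA,C>T,ACG,0.97484,7.57E-05,0,0
SP117655,Biliary-AdenoCA,T>A,ATC,0.056402,0.0192,0,0
"""

# Columns that describe the row rather than hold a signature value.
META_COLS = ("Sample", "Cancer Type", "Mutation Type", "Mutation Subtype")


def rows_for_cancer_type(handle, cancer_type):
    """Yield parsed rows (dicts) whose 'Cancer Type' equals cancer_type."""
    for row in csv.DictReader(handle):
        if row["Cancer Type"] == cancer_type:
            yield row


def top_signature_per_context(rows):
    """Map (sample, mutation type, subtype) -> (signature, value) with the largest value."""
    best = {}
    for row in rows:
        key = (row["Sample"], row["Mutation Type"], row["Mutation Subtype"])
        sig_values = {k: float(v) for k, v in row.items() if k not in META_COLS}
        top = max(sig_values, key=sig_values.get)
        best[key] = (top, sig_values[top])
    return best


biliary = list(rows_for_cancer_type(io.StringIO(EXCERPT), "Biliary-AdenoCA"))
top_sigs = top_signature_per_context(biliary)
samples = {row["Sample"] for row in biliary}
```

With the real download, `io.StringIO(EXCERPT)` would be swapped for `open(path, newline="")`, and `samples` would then list every biliary patient ID in the file.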


The Sample column identifies the individual patient; one can readily see that this patient has biliary adenocarcinoma. OK, finally, here are the questions:

1) Biliary adenocarcinoma is a good start. But is there any way to drill down into these samples further? For instance, what would be the quickest way to separate the ~35 biliary adenocarcinoma patients in this file into subcategories such as IDH1+, IDH2+, FGFR2-fusion+, etc.? I feel sure this must be possible. I'd prefer an annotated, metadata-like file, but if need be, I could probably download the raw data itself and figure out the drivers from that.

Is anyone familiar enough with this site to know a quick way to do it?
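
To make the goal concrete: if a per-sample driver annotation can be found (or built from the raw calls), the split itself comes down to a join on the Sample ID. Below is a rough sketch of that join. The annotation table, its `Sample` and `Driver` columns, and the placeholder IDs are all hypothetical; I do not know of a file in this form on the COSMIC page.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL annotation table: placeholder IDs and labels, not real data.
ANNOTATION_CSV = """Sample,Driver
SAMPLE_A,IDH1
SAMPLE_B,FGFR2-fusion
SAMPLE_C,IDH1
"""


def load_driver_labels(handle):
    """Read a Sample -> Driver mapping from a CSV with those two columns."""
    return {row["Sample"]: row["Driver"] for row in csv.DictReader(handle)}


def group_samples_by_driver(sample_ids, driver_by_sample, missing="unannotated"):
    """Bucket sample IDs by driver label; IDs without a label go under `missing`."""
    groups = defaultdict(set)
    for sample_id in sample_ids:
        groups[driver_by_sample.get(sample_id, missing)].add(sample_id)
    return dict(groups)


driver_by_sample = load_driver_labels(io.StringIO(ANNOTATION_CSV))
groups = group_samples_by_driver(
    ["SAMPLE_A", "SAMPLE_B", "SAMPLE_C", "SAMPLE_D"], driver_by_sample
)
```

Keeping an explicit "unannotated" bucket makes it obvious how many of the ~35 patients have no driver call at all, rather than dropping them silently.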

2) I imagine this is just like adjusting for loadings of other kinds, e.g. principal components. But I wanted to ask: are there any pitfalls or idiosyncratic differences to be aware of? For example, do I need to match for gender? Alternative splicing differs between sexes in Drosophila, some of these cancers have dysregulated splicing, and so on. I just want to avoid making any mistakes.
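
As a first sanity check on the matching question, one could tabulate sex within each subgroup before deciding whether matching or adjustment is needed. This is only a sketch on made-up records: the sample IDs, subgroup labels, and sex values below are hypothetical placeholders, and the per-sample metadata would have to come from wherever the clinical annotation lives.

```python
from collections import Counter

# HYPOTHETICAL per-sample metadata: (sample ID, driver subgroup, sex).
metadata = [
    ("SAMPLE_A", "IDH1", "female"),
    ("SAMPLE_B", "FGFR2-fusion", "male"),
    ("SAMPLE_C", "IDH1", "male"),
    ("SAMPLE_D", "IDH1", "female"),
]


def crosstab(records):
    """Count samples for every (subgroup, sex) combination."""
    return Counter((group, sex) for _sample_id, group, sex in records)


counts = crosstab(metadata)
for (group, sex), n in sorted(counts.items()):
    print(f"{group}\t{sex}\t{n}")
```

A strongly lopsided table would suggest that sex is confounded with subgroup in this cohort, which is the situation where matching or adding it as a covariate would matter most.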

cosmic mutational PCAWG SBS tumor signature