Cancer signature version 3
17 months ago
A ★ 4.0k

Hi

I know that the URL below, as used in the MutationalPatterns R package,

sp_url <- paste("https://cancer.sanger.ac.uk/cancergenome/assets/",
                "signatures_probabilities.txt", sep = "")

lets me download mutational signatures from the COSMIC website, but only version 2 (30 signatures).

Does anyone know where I can obtain these probabilities for version 3 (60 signatures)?

mutation vcf genome COSMIC • 613 views
1
# Load the SBS reference signatures bundled with the maftools package
sigs_db <- readRDS(file = system.file("extdata", "SBS_signatures.RDs",
  package = "maftools", mustWork = TRUE
))

2
17 months ago
venu 6.9k

Here you can download SBS, DBS and ID probabilities for the PCAWG reference signatures. Make sure you know which set you are using: there is one from SigProfiler and another from SignatureAnalyzer, the two methods used in the original publication.

0

Thank you @venu

By making a 96-channel trinucleotide mutation count matrix we can extract de novo signatures or identify existing ones (fitting) for Single Base Substitutions (SBS).

Do you know how I can make a DBS count matrix?

If I am not wrong, that should be a matrix with 78 rows, based on the link you provided:

sigProfiler_DBS_signatures.csv
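
For anyone wondering where the 78 rows come from, here is a minimal sketch that enumerates the strand-collapsed doublet classes, with the 96 SBS classes as a sanity check. It assumes a DBS changes both bases of a dinucleotide and that a mutation and its reverse complement count as the same class. The function names are my own, and the representative chosen for each class (the lexicographically smaller strand) is an arbitrary choice, so the labels may not match the row names in the COSMIC file even though the counts do.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def dbs_classes():
    """Enumerate strand-collapsed doublet base substitution classes."""
    dinucleotides = ["".join(pair) for pair in product(BASES, repeat=2)]
    classes = set()
    for ref, alt in product(dinucleotides, repeat=2):
        # Both positions must change, otherwise it is not a doublet substitution.
        if ref[0] == alt[0] or ref[1] == alt[1]:
            continue
        forward = (ref, alt)
        reverse = (reverse_complement(ref), reverse_complement(alt))
        # Arbitrary representative: the lexicographically smaller strand.
        classes.add(min(forward, reverse))
    return sorted(classes)


def sbs_classes():
    """Enumerate the single base substitution classes in trinucleotide context."""
    classes = set()
    for five, ref, three, alt in product(BASES, repeat=4):
        if ref == alt:
            continue
        context = five + ref + three
        if ref in "CT":
            # Pyrimidine reference: keep as is.
            classes.add((context, alt))
        else:
            # Purine reference: report the mutation on the opposite strand.
            classes.add((reverse_complement(context), COMPLEMENT[alt]))
    return sorted(classes)


print(len(dbs_classes()))
print(len(sbs_classes()))
```

The DBS count follows from 16 x 9 = 144 stranded doublet changes, of which 12 are their own reverse complement, giving (144 + 12) / 2 = 78 classes.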


To use this we first need a Double Base Substitution (DBS) count matrix.

A similar question arises for the INDEL matrix.

I know many tools that extract the 96-channel SBS matrix, but I cannot find one that extracts DBS or INDEL matrices.

1

Alexandrov's lab has a tool, SigProfilerMatrixGenerator, that generates SBS, DBS and ID matrices.
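
A minimal sketch of calling the tool from Python, following the usage pattern in its README as far as I recall it. The wrapper name `build_matrices`, the project name, the genome build and the input directory are all hypothetical placeholders. Please check the argument list and the structure of the returned object against the documentation for the version you install.

```python
def build_matrices(project, genome_build, input_dir):
    """Run SigProfilerMatrixGenerator on the mutation files in input_dir.

    Hypothetical convenience wrapper; the package is imported lazily so
    that this sketch can be loaded even when the tool is not installed.
    """
    from SigProfilerMatrixGenerator.scripts import (
        SigProfilerMatrixGeneratorFunc as matGen,
    )

    # plot=False skips plotting; exome=False uses the whole genome.
    return matGen.SigProfilerMatrixGeneratorFunc(
        project, genome_build, input_dir, plot=False, exome=False
    )


# One-time setup, which downloads the reference genome:
#   from SigProfilerMatrixGenerator import install as genInstall
#   genInstall.install("GRCh37")
#
# Hypothetical call, with placeholder names:
#   matrices = build_matrices("my_project", "GRCh37", "/path/to/vcf_dir/")
```

The genome build passed here has to match the one your variants were called against.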

0

Thank you

After extracting some de novo signatures, how do you treat them for your purpose? Do you fit them against known signatures such as COSMIC to see what they are?

0

I usually extract de novo signatures and compare them with COSMIC signatures, usually by cosine similarity, to check whether any of them match known signatures. If there are new signatures, you need to spend a lot of time working out whether they are really new or just noise from the variant calling algorithm you used or from sequencing errors.

If you fit directly to known signatures, you might miss signatures that are specific to your cohort or to a specific phenotype. Having said that, in order to extract meaningful de novo signatures you need hundreds of samples.
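
The cosine-similarity comparison can be sketched roughly as below. This is only an illustration: the profiles are made-up 4-channel toy vectors (real SBS profiles have 96 channels), the names `DN1`, `DN2`, `REF_A` and `REF_B` are hypothetical, and the 0.85 cut-off is an assumption for the example, not an established threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An all-zero profile matches nothing.
    return dot / (norm_a * norm_b)


def best_matches(de_novo, reference, threshold=0.85):
    """For each de novo signature, report the most similar reference signature.

    Returns {name: (best_reference, similarity, passes_threshold)}.
    The threshold is an illustrative assumption.
    """
    results = {}
    for name, profile in de_novo.items():
        best_ref, best_sim = max(
            ((ref_name, cosine_similarity(profile, ref_profile))
             for ref_name, ref_profile in reference.items()),
            key=lambda pair: pair[1],
        )
        results[name] = (best_ref, best_sim, best_sim >= threshold)
    return results


# Toy data, not real signatures.
reference = {
    "REF_A": [0.7, 0.1, 0.1, 0.1],
    "REF_B": [0.1, 0.1, 0.1, 0.7],
}
de_novo = {
    "DN1": [0.65, 0.15, 0.10, 0.10],  # Close to REF_A.
    "DN2": [0.25, 0.25, 0.25, 0.25],  # Flat, resembles neither strongly.
}

matches = best_matches(de_novo, reference)
for name, (ref_name, similarity, is_match) in matches.items():
    print(name, ref_name, round(similarity, 3), is_match)
```

A de novo signature that stays below the cut-off for every reference signature is a candidate for a new signature or an artefact and needs follow-up.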

0

Sorry, I had asked the Alexandrov lab how I can find out what the extracted de novo signatures are.

They replied as below:

We don't have etiologies for the de_novo signatures. You are the person who will find out the etiologies and publish papers :-)

Do they mean, as you mentioned, that I should use COSMIC to find out what the de novo signatures are?

I'm not sure how to find the etiology of de novo signatures.

1

They are 101% right.

Here is what they mean: if any of your extracted de novo signatures do not match existing COSMIC signatures, it is either a new signature or an artefact. If it's a new signature, you have basically discovered a mutational process that was not identified before and that is very specific to your cancer type. However, you need to provide sufficient evidence for why you think it's a new signature. For example, do most (if not all) samples that show the new signature activity have a common mutation? If so, what's the role of that gene in cancer initiation/progression? If you knock out this gene and do a signature analysis, do you lose that signature activity? I can go on, but I guess you got the point.

1
17 months ago
ATpoint 54k

Signatures are available from the official website. I doubt that example data from an R package serves as a truly reliable source. It may be only a subset, or it may have been modified in some way, and you have no guarantee of what it actually is. Official repositories are always recommended.

https://cancer.sanger.ac.uk/cosmic/signatures