Question: Why NMF for mutation signature analysis
1
28 days ago by
CY270
United States
CY270 wrote:

I have seen mutation signature analyses, and they are almost always done using NMF. I am fairly new to this. Why do people always choose NMF for this kind of analysis? Are there alternatives?

I also noticed that NMF is usually used for mutation signatures and SVD for expression signatures. Is there a biological reason behind this?

mutation signature nmf • 200 views
modified 27 days ago • written 28 days ago by CY270
4
27 days ago by
Minstein50
Minstein50 wrote:

A mutational profile is naturally nonnegative. You can regard each of the k latent components as a combination of genes (i.e. a metagene).

NMF can help you see which "parts" (groups of genes) are active in which class of patients. By analogy, in face recognition NMF identifies intuitive parts of faces, such as mouths, eyes and noses.

Further, you can conveniently add a regularization term to standard NMF in order to integrate useful prior information (e.g. a PPI network or known relationships between patients) into the factorization.

Finally, you can try R packages such as NMF or NNLM.
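
To make the factorization concrete, here is a minimal NumPy sketch of plain NMF using Lee-Seung multiplicative updates. It is only an illustration, not what the NMF or NNLM packages do internally; the function name `nmf`, the toy matrix and all parameter values are assumptions made up for this example.

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0, eps=1e-10):
    """Factor nonnegative V (features x samples) into W (features x k) and H (k x samples)."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_samples)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm objective.
        # Multiplying by nonnegative ratios keeps W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Hypothetical toy data: 6 features (genes or mutation types) x 8 samples,
# simulated as Poisson counts around a rank-2 nonnegative product.
rng = np.random.default_rng(42)
true_W = rng.random((6, 2))
true_H = rng.random((2, 8))
V = rng.poisson(20 * true_W @ true_H).astype(float)

W, H, errors = nmf(V, k=2)
print("W (features x components):", W.shape)
print("H (components x samples):", H.shape)
print("reconstruction error, first vs last iteration:", errors[0], errors[-1])
```

Each column of `W` plays the role of a metagene (or signature), and each row of `H` says how strongly that component is used in each sample.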

modified 27 days ago • written 27 days ago by Minstein50

I have indeed read about some methods that use LASSO to enhance sparsity, although I am not sure about the biology behind the sparsity assumption. Besides, I know that SVD is usually used for gene expression signatures. Why NMF for mutation signatures and SVD for expression signatures? Is there a biological reason for this?

written 27 days ago by CY270

My thinking is: sparsity helps you interpret the biological meaning of the metagenes. Because only a few coefficients are nonzero, it is easier to understand the function of that group of genes. You can think of the expression profile or mutation profile as the output of some intrinsic biological processes. One metagene may correspond to one or two pathways, or to a subnetwork of a PPI network or gene regulatory network.

There are other ways besides LASSO to impose sparsity, e.g. an L0-norm penalty, and there are also sparse versions of PCA and SVD.

I think NMF is not limited to mutation signatures; it also works well for gene expression analysis. For example, a classic paper introduced NMF for analyzing gene expression matrices: Metagenes and molecular pattern discovery using matrix factorization (https://www.ncbi.nlm.nih.gov/pubmed/15016911), and a more recent paper applies it to single-cell data: Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations (https://www.ncbi.nlm.nih.gov/pubmed/29987051).

In addition, the latent components produced by NMF are not required to be orthogonal, unlike those of PCA and SVD. ICA (independent component analysis) is, to some extent, similar to NMF and is worth a look.
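
The orthogonality point can be checked on a small example. The sketch below, using a made-up strictly positive matrix, compares the first two left singular vectors from SVD with the two columns of an NMF basis fitted by plain multiplicative updates. The matrix, the rank and the iteration count are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((6, 8)) + 0.1  # hypothetical strictly positive data, features x samples

# SVD: the leading components are orthonormal, so beyond the first one
# they have to contain negative entries when the data are positive.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
U2 = U[:, :2]
svd_gram = U2.T @ U2  # identity matrix up to rounding

# NMF with rank 2 via multiplicative updates: components stay nonnegative
# and nothing forces them to be orthogonal.
W = rng.random((6, 2)) + 0.1
H = rng.random((2, 8)) + 0.1
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-10)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-10)
nmf_gram = W.T @ W  # off-diagonal entry is the overlap between the two components

print("SVD component 2 min/max:", U2[:, 1].min(), U2[:, 1].max())
print("SVD Gram matrix:\n", np.round(svd_gram, 6))
print("NMF basis min entry:", W.min())
print("NMF Gram matrix:\n", np.round(nmf_gram, 6))
```

Mixed-sign components are harder to read as additive "parts", which is one practical reason NMF is preferred when the data are counts.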

written 27 days ago by Minstein50

Thanks for explaining. I did some digging. Some methods justify a sparse solution on the grounds that most mutagens are highly specific in the type of damage they cause, so the majority of somatic mutational signatures are sparse.
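
As a rough illustration of how a LASSO-style penalty enters the factorization, the sketch below adds an L1 term `lam` to the update of the signature matrix `W`. This is a simplified heuristic and not the algorithm of any particular published method; `sparse_nmf`, `lam`, the row normalization of `H` and the toy data are all assumptions made up for this example.

```python
import numpy as np

def sparse_nmf(V, k, lam=1.0, n_iter=300, seed=0, eps=1e-10):
    """NMF with an L1 penalty on W (signatures); H (exposures) has rows fixed to sum to 1."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Fix the scale of H so that the penalty cannot be avoided by
        # shrinking W and inflating H; the product W @ H is unchanged.
        scale = H.sum(axis=1, keepdims=True) + eps
        H /= scale
        W *= scale.T
        # The L1 penalty shows up as the extra `lam` in the denominator,
        # which pushes small entries of W towards zero.
        W *= (V @ H.T) / (W @ H @ H.T + lam + eps)
    return W, H

# Hypothetical toy counts: 6 mutation types x 8 samples.
rng = np.random.default_rng(1)
V = rng.poisson(20 * rng.random((6, 2)) @ rng.random((2, 8))).astype(float)

W_plain, H_plain = sparse_nmf(V, k=2, lam=0.0)
W_sparse, H_sparse = sparse_nmf(V, k=2, lam=1.0)
print("sum of signature entries without penalty:", W_plain.sum())
print("sum of signature entries with penalty:   ", W_sparse.sum())
```

The larger `lam` is, the more entries of `W` are driven towards zero, at the cost of a worse reconstruction; how to choose it is a modelling decision that this sketch does not address.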

written 27 days ago by CY270

Thanks. It makes sense now.

written 26 days ago by Minstein50
2
28 days ago by
Dawe260
Milan
Dawe260 wrote:

Basically, any blind source separation method should work. The reason people use NMF is probably that it is simple and effective. BTW, you can find criticism and different flavors here:

https://www.biorxiv.org/content/early/2018/08/04/384834

d

written 28 days ago by Dawe260

Thx. I read the article. Why do such methods emphasize sparsity (even using LASSO to enhance it)? What is the biology behind this assumption?

written 27 days ago by CY270
2
27 days ago by
dariober9.7k
Glasgow - UK
dariober9.7k wrote:

To complement @Minstein's answer, there is a nice visual comparison of NMF, PCA and k-means clustering in figure 14.33 (page 555 and the paragraphs around it) of Elements of Statistical Learning (the pdf is freely available). You should be able to carry the message over to bioinformatics questions.

written 27 days ago by dariober9.7k