Question: Why NMF for mutation signature analysis
2
8 months ago by
CY370
United States
CY370 wrote:

I have seen mutation signature analyses, and they always seem to be done using NMF. I am a bit new to this. Why do people always choose NMF for such analysis? Is there an alternative?

I also found that NMF is usually used for mutation signatures and SVD for expression signatures. Is there any biological reason behind this?

mutation signature nmf • 711 views
modified 8 months ago • written 8 months ago by CY370
6
8 months ago by
Minstein100
Minstein100 wrote:

The mutational profile is naturally nonnegative. You can regard the k latent components as combinations of genes (i.e. metagenes).

NMF can help you see which "parts" (groups of genes) are active in which class of patients. In the classic face recognition example, NMF identifies intuitive parts of faces, like mouths, eyes and noses.

Further, you can conveniently add a regularization term to standard NMF in order to integrate useful information (e.g. a PPI network or known relationships between patients) into the factorization process.

Finally, you can try R packages such as NMF or NNLM.
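To make the factorization concrete, here is a minimal sketch in Python/NumPy rather than the R packages mentioned above. It uses the textbook multiplicative-update rule for the Frobenius objective; the function name `nmf_multiplicative`, the 96-channel layout and the synthetic data are all assumptions made for illustration, not taken from any particular signature tool.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=300, seed=0, eps=1e-10):
    """Factorise V (channels x samples) as W @ H with W, H >= 0.

    Hypothetical helper: plain multiplicative updates for the
    Frobenius-norm objective, with no regularisation.
    """
    rng = np.random.default_rng(seed)
    n_channels, n_samples = V.shape
    W = rng.random((n_channels, k)) + eps   # signatures (columns)
    H = rng.random((k, n_samples)) + eps    # exposures per sample
    errors = []
    for _ in range(n_iter):
        # Ratios of nonnegative terms keep W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Synthetic stand-in for a mutation count matrix (assumption):
# 96 mutation channels, 40 samples, 3 underlying signatures.
rng = np.random.default_rng(1)
true_signatures = rng.dirichlet(np.full(96, 0.2), size=3).T       # 96 x 3
true_exposures = rng.gamma(shape=2.0, scale=100.0, size=(3, 40))  # 3 x 40
V = rng.poisson(true_signatures @ true_exposures).astype(float)

W, H, errors = nmf_multiplicative(V, k=3)
print("nonnegative factors:", W.min() >= 0, H.min() >= 0)
print("reconstruction error, first vs last iteration:", errors[0], errors[-1])
```

Because counts, signatures and exposures are all nonnegative, each column of W can be read as a mutation-type profile and each row of H as how much that profile contributes to each sample.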

modified 8 months ago • written 8 months ago by Minstein100

I have indeed read about some methods that use LASSO to enhance sparsity, although I am not sure about the biology behind the sparsity assumption. Besides, I know that SVD is usually used for gene expression signatures. Why NMF for mutation signatures and SVD for expression signatures? Is there any biological reason for this?

written 8 months ago by CY370

My thinking is: sparsity helps you interpret the biological meaning of the metagenes, because only a few coefficients are positive, which makes it easier to understand the function of that group of genes. You can think of the expression profile or mutation profile as the output of some intrinsic biological processes. Maybe one metagene corresponds to one or two pathways, or to a subnetwork in a PPI network or gene regulatory network.

There are other ways besides LASSO to impose sparsity, e.g. an L0-norm penalty, and there are also sparse versions of PCA and SVD.
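As a rough illustration of what a LASSO-style (L1) penalty does here, the sketch below fits exposures H to a fixed set of signatures W while penalising the sum of H. This is a simplification chosen for clarity (signatures held fixed, only exposures fitted), not the method of any specific paper; `fit_exposures`, the value `l1=5.0` and the synthetic data are assumptions.

```python
import numpy as np

def fit_exposures(V, W, l1=0.0, n_iter=2000, eps=1e-10):
    """Minimise 0.5 * ||V - W @ H||_F^2 + l1 * sum(H) over H >= 0.

    Hypothetical helper: W (signatures) is held fixed and H is fitted
    with multiplicative updates; the L1 weight enters the denominator.
    """
    H = np.ones((W.shape[1], V.shape[1]))
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + l1 + eps)
    return H

# Synthetic example (assumption): 4 candidate signatures over 96
# channels, but only the first two are actually active in the samples.
rng = np.random.default_rng(0)
W = rng.dirichlet(np.full(96, 0.2), size=4).T   # 96 x 4
H_true = rng.gamma(2.0, 100.0, size=(4, 20))    # 4 x 20
H_true[2:, :] = 0.0                             # signatures 3 and 4 inactive
V = W @ H_true

H_plain = fit_exposures(V, W, l1=0.0)
H_sparse = fit_exposures(V, W, l1=5.0)
print("total exposure without penalty:", H_plain.sum())
print("total exposure with L1 penalty:", H_sparse.sum())
print("mean exposure of inactive signatures (plain, sparse):",
      H_plain[2:].mean(), H_sparse[2:].mean())
```

The penalty shrinks all exposures and tends to push weakly supported ones to zero, which is the sense in which it "enhances sparsity". The price is a biased fit, so the penalty weight has to be chosen with care.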

I think NMF is not limited to mutation signatures, and it certainly works well in gene expression analysis. For example, a classic paper introduced NMF for analyzing gene expression matrices: Metagenes and molecular pattern discovery using matrix factorization (https://www.ncbi.nlm.nih.gov/pubmed/15016911), and there is also a more recently published paper: Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations (https://www.ncbi.nlm.nih.gov/pubmed/29987051).

In addition, the latent components generated by NMF are not required to be orthogonal, which is different from PCA and SVD. ICA (independent component analysis) is, to some extent, similar to NMF, and you can have a look at it as well.
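The orthogonality point can be checked numerically. The sketch below, on a synthetic strictly positive matrix (an assumption, not real data), compares the left singular vectors from SVD with NMF components obtained from a few multiplicative updates.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((96, 30)) + 0.1   # strictly positive toy matrix (assumption)

# SVD: components are orthonormal, so beyond the first one they
# must mix positive and negative entries.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
svd_gram = U[:, :3].T @ U[:, :3]
second = U[:, 1]
has_mixed_signs = bool((second > 0).any() and (second < 0).any())

# NMF (plain multiplicative updates): components stay nonnegative
# and are free to overlap, i.e. they are not orthogonal.
W = rng.random((96, 3)) + 0.1
H = rng.random((3, 30)) + 0.1
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-10)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-10)
W_unit = W / np.linalg.norm(W, axis=0)           # unit-length columns
nmf_gram = W_unit.T @ W_unit
off_diag = nmf_gram[~np.eye(3, dtype=bool)]      # pairwise cosines

print("SVD components orthonormal:", np.allclose(svd_gram, np.eye(3)))
print("2nd SVD component has mixed signs:", has_mixed_signs)
print("NMF pairwise cosines:", off_diag)
```

Mixed-sign components are hard to read as "amounts of a mutational process", whereas nonnegative, overlapping components can be read as additive parts.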

written 8 months ago by Minstein100

Thanks for explaining. I did some digging. Some methods justify the sparse solution by noting that most mutagens are highly specific in the type of damage they cause, and therefore the majority of somatic mutational signatures are sparse.

written 8 months ago by CY370

Thanks. It makes sense now.

written 8 months ago by Minstein100
2
8 months ago by
Dawe270
Milan
Dawe270 wrote:

Basically, any blind source separation method should work. The reason people use NMF is probably that it is simple and effective. BTW, you can find criticism and different flavors here:

https://www.biorxiv.org/content/early/2018/08/04/384834

d

written 8 months ago by Dawe270

Thx. I read the article. Why do such methods emphasize sparsity (even using LASSO to enhance it)? What is the biology behind this assumption?

written 8 months ago by CY370
2
8 months ago by
dariober10k
WCIP | Glasgow | UK
dariober10k wrote:

Sort of complementing @Minstein's answer, there is a nice visual comparison of NMF, PCA and k-means clustering in figure 14.33 (page 555 and paragraphs around it) of Elements of Statistical Learning (pdf is freely available). You should be able to transpose the message to more bioinformatics questions.

written 8 months ago by dariober10k