Question

RNA-seq deconvolution algorithm that takes into account weight of marker gene

0

3 months ago

CTLong ▴ 110

Hi all,

Some deconvolution algorithms for RNA-seq data require a list of marker genes as a reference for the different cell types. Does anyone know whether any of them has an option to specify the "importance" or weight of each marker gene when the deconvolution is performed, so that the more important markers have a greater impact on the calculation?

RNA-seq • 490 views

ADD COMMENT • link updated 3 months ago by Shred ★ 1.4k • written 3 months ago by CTLong ▴ 110

score 0 · Answer 1 · 2024-01-07

0

3 months ago

Shred ★ 1.4k

What do you mean by weight?
Most of the algorithms out there already assign a different weight to each gene when they do the regression. For example, CIBERSORT does the regression with support vector regression. In these algorithms, the weights are assigned during the training phase.

If you're working on a particular tissue, you could figure out a way to retrain one of them for that specific scenario, but I wouldn't bet a penny on its success.

ADD COMMENT • link 3 months ago by Shred ★ 1.4k

0

What I mean by weight is this: suppose GeneA is expressed at a high level and almost exclusively in CelltypeA, whereas GeneB is expressed at a lower level but is still a representative marker of CelltypeA. When I specify a list of marker genes for deconvolution, I would like GeneA to carry more weight in the calculation than GeneB. What I'm not sure about is whether these deconvolution algorithms consider all marker genes "equally important", and whether there is a way to rank them or specify their importance.

ADD REPLY • link 3 months ago by CTLong ▴ 110

1

Read, for example, the Methods section of the CIBERSORTx paper to understand the underlying concept.

For example:

These methods generally assume that biological mixture samples can be modeled as a system of linear equations, where a single mixture transcriptome m with n genes is represented as the product of H and f, where H represents an n×c cell type expression matrix consisting of expression profiles for the same n genes across c distinct cell types, and f represents a vector of size c, consisting of cell type mixing proportions.

So the markers are all taken together: the whole matrix (i.e., genes across cell types) is used at the same time. The weight you're referring to is a distance between the expected expression level and the observed one which, in this case, must be computed for every gene in the list. Since this distance is computed against the expected expression level of every (expected) cell type, a gene that is expressed at a high level in only one cell type will dramatically increase the resulting fraction for that cell type.
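To make the linear model m = H × f concrete, here is a minimal sketch in plain Python. It is not CIBERSORT or CIBERSORTx (those use support vector regression); it swaps in ordinary non-negative least squares, solved with a small projected-gradient loop, purely to illustrate the idea. The matrix values, the gene and cell type labels, and the helper names `mix`, `nnls` and `shift_in_a` are all made up for the example. In real work you would use a library solver such as `scipy.optimize.nnls` instead of the hand-written loop.

```python
# Toy signature matrix H: rows are genes, columns are cell types (A, B).
# All numbers are invented for illustration only.
H = [
    [100.0, 1.0],  # GeneA: high and nearly exclusive to cell type A
    [10.0, 2.0],   # GeneB: weaker marker of cell type A
    [2.0, 50.0],   # GeneC: marker of cell type B
]
F_TRUE = [0.3, 0.7]  # assumed true mixing proportions


def mix(H, f):
    """Mixture expression m = H f (one value per gene)."""
    return [sum(h * x for h, x in zip(row, f)) for row in H]


def nnls(H, m, n_iter=2000):
    """Find f >= 0 minimising ||H f - m||^2 by projected gradient descent."""
    n_genes, n_types = len(H), len(H[0])
    # 1 / trace(H^T H) is a safe step size: the trace bounds the top eigenvalue.
    step = 1.0 / sum(h * h for row in H for h in row)
    f = [0.0] * n_types
    for _ in range(n_iter):
        resid = [sum(H[g][k] * f[k] for k in range(n_types)) - m[g]
                 for g in range(n_genes)]
        grad = [sum(H[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step, then clip negatives to keep fractions non-negative.
        f = [max(0.0, f[k] - step * grad[k]) for k in range(n_types)]
    return f


m = mix(H, F_TRUE)
f_hat = nnls(H, m)  # recovers F_TRUE on this noise-free toy mixture


def shift_in_a(gene_index):
    """Change in the estimated fraction of A after a 10% bump in one gene."""
    noisy = list(m)
    noisy[gene_index] *= 1.10
    return abs(nnls(H, noisy)[0] - f_hat[0])


shift_a = shift_in_a(0)  # perturb GeneA
shift_b = shift_in_a(1)  # perturb GeneB
print("estimated fractions:", f_hat)
print("shift from GeneA:", shift_a, "shift from GeneB:", shift_b)
```

In this toy setting, the same relative change in GeneA moves the estimated fraction of cell type A far more than the same change in GeneB, even though no weights were supplied: the difference comes entirely from the expression levels in H.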

ADD REPLY • link 3 months ago by Shred ★ 1.4k

1

Thanks for the information. So my interpretation is: if a gene is more representative of a cell type, say it is expressed at a high level or exclusively in that cell type, then it will inherently contribute more to the resulting fraction from the deconvolution, without my having to specify a "weight" explicitly? And the less representative genes would contribute less to the fraction?
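For comparison, this is roughly what explicit per-gene weights would look like in a plain least-squares setting. It is only a sketch under stated assumptions: generic weighted non-negative least squares, not an option of CIBERSORT, CIBERSORTx or any other named tool. The toy matrix, the weight values and the function name `weighted_nnls` are hypothetical. Each gene's squared residual is multiplied by its weight, so a gene with a small weight is allowed to fit poorly.

```python
# Toy signature matrix: rows are genes, columns are cell types (A, B).
# All values are invented for illustration only.
H = [
    [100.0, 1.0],  # GeneA: strong, nearly exclusive marker of A
    [10.0, 2.0],   # GeneB: weaker marker of A
    [2.0, 50.0],   # GeneC: marker of B
]
F_TRUE = [0.3, 0.7]  # assumed true mixing proportions


def weighted_nnls(H, m, w, n_iter=2000):
    """Find f >= 0 minimising sum_g w[g] * (m[g] - sum_k H[g][k] * f[k])**2."""
    n_genes, n_types = len(H), len(H[0])
    # Safe step size: 1 / trace(H^T W H) bounds the largest eigenvalue.
    step = 1.0 / sum(w[g] * h * h for g in range(n_genes) for h in H[g])
    f = [0.0] * n_types
    for _ in range(n_iter):
        resid = [sum(H[g][k] * f[k] for k in range(n_types)) - m[g]
                 for g in range(n_genes)]
        # Each gene's contribution to the gradient is scaled by its weight.
        grad = [sum(w[g] * H[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        f = [max(0.0, f[k] - step * grad[k]) for k in range(n_types)]
    return f


# Noise-free mixture, then add an artificial error to GeneB only.
m = [sum(h * x for h, x in zip(row, F_TRUE)) for row in H]
m[1] += 5.0

f_equal = weighted_nnls(H, m, [1.0, 1.0, 1.0])  # all genes weighted equally
f_down = weighted_nnls(H, m, [1.0, 0.1, 1.0])   # GeneB down-weighted

err_equal = abs(f_equal[0] - F_TRUE[0])
err_down = abs(f_down[0] - F_TRUE[0])
print("equal weights:", f_equal, "error in A:", err_equal)
print("GeneB down-weighted:", f_down, "error in A:", err_down)
```

Here, down-weighting the unreliable GeneB pulls the estimate for cell type A closer to the assumed truth. Whether a given deconvolution tool exposes anything like `w` is a separate question that its own documentation would have to answer.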