Question: GSVA kernels: Gaussian or Poisson?
6 weeks ago by
psm40 wrote:

Hi all, this question has indirectly come up several times. What is the best kcdf setting to use for GSVA analysis on TPM data that has been neither log-transformed nor variance-normalized?

For GSVA analysis using RNA-seq data, the GSVA manual states:

"We calculate now GSVA enrichment scores for these gene sets using first the microarray data and then the RNA-seq integer count data. Note that the only requirement to do the latter is to set the argument kcdf="Poisson" which is "Gaussian" by default. Note, however, that if our RNA-seq derived expression levels would be continuous, such as log-CPMs, log-RPKMs or log-TPMs, the default value of the kcdf argument should remain unchanged."

I assume that non-variance-normalized TPM data should be handled with kcdf="Poisson". However, after length normalization, most TPM values are non-integer. I realize that this is the result of a linear transformation, so the underlying structure of the data is unchanged, but the manual seems to imply that the Gaussian setting may be appropriate for non-integer data, which would include non-variance-normalized TPM.

I clearly don't understand the nuances of this setting, but I am wondering what other people's thoughts, suggestions, or explanations are on this topic. For now, I'm just applying log1p to my TPM data and using the Gaussian argument, which runs much faster.
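For readers unfamiliar with log1p, here is a minimal Python sketch of that transformation (GSVA itself is an R package, so this only illustrates the preprocessing step). The function name log1p_tpm and the toy matrix are hypothetical and not taken from my data.

```python
import math

def log1p_tpm(tpm_rows):
    """Apply log(1 + x) to every TPM value; a TPM of 0 stays at 0."""
    return [[math.log1p(value) for value in row] for row in tpm_rows]

# Hypothetical toy matrix: rows are genes, columns are samples.
tpm = [
    [0.0, 12.5, 103.7],
    [1.0, 0.4, 2500.0],
]

# Continuous, log-scale values: the kind of input the manual says
# should keep the default kcdf="Gaussian".
log_tpm = log1p_tpm(tpm)
```

The transformation is monotonic, so the ordering of samples within each gene is unchanged; only the scale is compressed.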

5 weeks ago by
Kevin Blighe63k wrote:

It really does just depend on the distribution of the input data, which you already appear to understand. The default is Gaussian because most datasets passed to GSVA will already have been normalised and transformed to an approximately Gaussian distribution. If I were using FPKM, RPKM, or just 'normalised' RNA-seq counts, on the other hand, I would use Poisson.

So, check via a histogram and other summary metrics to verify which distribution your data follow.
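As a rough illustration of the kind of summary metrics meant here, the following Python sketch reports whether the values are all integers, plus their mean, variance and skewness. The function name summarize_expression and the toy values are hypothetical; the sketch only describes the data and does not choose a kcdf setting for you.

```python
import math
import statistics

def summarize_expression(values):
    """Return simple summary metrics for a flat list of expression values."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Moment-based skewness: close to 0 for symmetric data,
    # clearly positive when there is a long right tail.
    if sd > 0:
        skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    else:
        skewness = 0.0
    return {
        "all_integer": all(float(v).is_integer() for v in values),
        "min": min(values),
        "mean": mean,
        "variance": sd ** 2,
        "skewness": skewness,
    }

# Hypothetical toy values, not real data.
counts = [0, 3, 1, 0, 12, 7, 250, 2]
log_values = [math.log1p(c) for c in counts]

count_summary = summarize_expression(counts)
log_summary = summarize_expression(log_values)
```

Integer-valued, strongly right-skewed data with variance well above the mean looks like raw counts, whereas log-scale data is continuous and usually less skewed. A histogram of the same values is still worth plotting alongside these numbers.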



Thank you Kevin - appreciate the response.

written 25 days ago by psm40