Question: Survival plot between low and high expression of gene
Biologist wrote:

Hi,

I wanted to make a survival plot comparing samples with low and high expression of a gene. I followed this cutpoint tutorial, which uses the maxstat package to divide samples into low and high groups. That tutorial used RSEM-normalized gene expression counts.

I have raw counts from featureCounts, and I also have RPKM data.

First I used the RPKM data and plotted survival, and it looks like this:

[Figure: survival plot between low and high groups, RPKM expression]

This showed p-value = 0.026.

Second, I used normalized counts (raw counts converted to normalized counts with DESeq2) and plotted survival, and it looks like this:

[Figure: survival plot between low and high groups, normalized counts]

Here the p-value = 0.1.

Both plots show the same pattern, so why are the p-values so different? With RPKM the result is significant, but with normalized counts it is not. What could be the reason?

Which units of gene expression data should I use to divide samples into low and high groups?

modified 10 months ago by Santosh Anand • written 10 months ago by Biologist

Devon Ryan wrote:

But there is a very important difference between the plots: the "low" values in the bottom plot are MUCH closer to the "high" values. That is why the p-values differ. You can see this in the "Strata" plot, where there's a constant difference of 1 between the top and bottom set of plots.

written 10 months ago by Devon Ryan

Oh yes, thank you. What could be the reason for that? Is it because of the different expression data?

And what would you recommend for dividing samples into low and high based on expression: normalized counts, RPKM, FPKM, or something else?

modified 10 months ago • written 10 months ago by Biologist

For a single gene it won't matter, unless you have isoform switching or something like that. If your gene-level metric is a summary of transcript-level metrics then TPM is going to be the most useful.

written 10 months ago by Devon Ryan

Hi Devon,

A quick question: can TPM values computed from raw featureCounts counts be used for this analysis? I used the following function for the conversion.

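# TPM for one sample: `counts` and `lengths` are per-gene vectors in the same order
tpm <- function(counts, lengths) {
  rate <- counts / lengths   # reads per base for each gene
  rate / sum(rate) * 1e6     # rescale so the sample sums to one million
}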
written 9 months ago by Biologist

That looks right at least.
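
One thing worth checking when applying the function above: as written it normalizes a single sample (a vector of counts). If `counts` were a genes-by-samples matrix, `sum(rate)` would sum over all samples at once, so each column has to be scaled separately. A minimal sketch, with made-up counts and gene lengths (none of these numbers come from the thread):

```r
# TPM for one sample: length-normalize, then scale so the sample sums to 1e6.
tpm <- function(counts, lengths) {
  rate <- counts / lengths
  rate / sum(rate) * 1e6
}

# Hypothetical toy data: 3 genes x 2 samples, gene lengths in bp.
counts <- matrix(c(100, 200, 300,
                   50, 400, 150),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
lengths <- c(geneA = 1000, geneB = 2000, geneC = 3000)

# Apply per column so that every sample is scaled by its own total.
tpm_mat <- apply(counts, 2, tpm, lengths = lengths)

colSums(tpm_mat)  # each sample should sum to one million
```

For a single gene compared across samples, this per-sample scaling is the part that matters, since the gene length is the same constant in every sample.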

written 9 months ago by Devon Ryan
Santosh Anand wrote:

The curves look slightly different because the maxstat algorithm assigns 18 samples to the low group in the first case but 20 samples in the second. This means that the fraction of samples surviving in that group in the second case is higher at most of the event points, which moves the blue curve in the second plot up a little and brings it closer to the yellow one, hence the higher (less significant) p-value.
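
To see why two expression measures can put different numbers of samples in each group, it helps to look at what a maximally selected cutpoint does: it tries candidate thresholds and keeps the one with the largest log-rank statistic. The sketch below is a simplified stand-in for that idea using only `survival::survdiff`. It is not the maxstat implementation, the helper name `best_cutpoint` is hypothetical, and the data are simulated.

```r
library(survival)

# Scan candidate cutpoints of `expr` and return the one that maximizes the
# log-rank chi-square between the "low" and "high" groups.
best_cutpoint <- function(time, event, expr, min_prop = 0.1) {
  # Only consider cutpoints that leave roughly `min_prop` of samples per group.
  candidates <- sort(unique(expr))
  candidates <- candidates[candidates >= quantile(expr, min_prop) &
                           candidates <  quantile(expr, 1 - min_prop)]
  chisq <- sapply(candidates, function(cp) {
    group <- factor(expr > cp, levels = c(FALSE, TRUE),
                    labels = c("low", "high"))
    survdiff(Surv(time, event) ~ group)$chisq
  })
  best <- candidates[which.max(chisq)]
  list(cutpoint = best, n_low = sum(expr <= best), n_high = sum(expr > best))
}

# Simulated example: 60 samples with toy expression and follow-up data.
set.seed(1)
expr  <- rlnorm(60)
time  <- rexp(60, rate = 0.1)
event <- rbinom(60, 1, 0.7)

best_cutpoint(time, event, expr)
```

Because the threshold is chosen to maximize the statistic, a small change in the expression values (for example RPKM versus normalized counts) can shift the chosen cutpoint and move a few samples between groups. The plain log-rank p-value at the chosen cutpoint is also optimistic for the same reason; the maxstat package offers p-value approximations that account for the selection.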

"And what would you recommend to use for dividing samples into low and high based on expression, normalized counts or rpkm? or fpkm or any other?"

If your choice of expression measure gives different results, then the right question to ask is whether the results are robust. In my opinion, they are not. Also, there may not be enough power, because a single low/high threshold is a thin boundary line that blurs the distinction between the two groups. You could try a three-way split such as low|medium|high and check whether the low vs high comparison is robust across all of the count methods. Robustness is more important than any particular method, because all of them are essentially measuring the same thing.
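
The low|medium|high suggestion can be sketched as follows: split expression at its tertiles, drop the middle group, and run the log-rank test on low versus high only. The data here are simulated and the column names are assumptions, so treat this as an illustration rather than a recipe for the actual dataset.

```r
library(survival)

# Simulated stand-in for one gene's expression plus follow-up data.
set.seed(42)
df <- data.frame(
  expr  = rlnorm(90),
  time  = rexp(90, rate = 0.05),
  event = rbinom(90, 1, 0.6)
)

# Split at the tertiles of expression into low / medium / high.
breaks <- quantile(df$expr, probs = c(0, 1/3, 2/3, 1))
df$group <- cut(df$expr, breaks = breaks,
                labels = c("low", "medium", "high"),
                include.lowest = TRUE)

# Keep only the extremes and compare them with a log-rank test.
extremes <- droplevels(df[df$group != "medium", ])
fit <- survdiff(Surv(time, event) ~ group, data = extremes)

table(df$group)
fit
```

Running the same steps once per expression measure (RPKM, normalized counts, TPM) and comparing the outcomes is one way to check the robustness described above.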

modified 10 months ago • written 10 months ago by Santosh Anand

Thank you. I will do that with normalized counts. Do you think using RPKM for the cutpoint is a bad idea?

written 10 months ago by Biologist

As I said above, RPKM and normalized counts measure the same thing, just in different ways. So if your choice of measure changes the result, dig deeper into why by looking at which samples move between the high and low groups. There is no universal answer to whether RPKM is better than normalized counts. You have to get your hands dirty!
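
Checking which samples change group can be done with a simple cross-tabulation. In this sketch the two group vectors are made up; in practice they would be the low/high labels obtained from the RPKM-based and the normalized-count-based cutpoints, for the same samples in the same order.

```r
# Hypothetical low/high assignments for the same 8 samples under two measures.
samples    <- paste0("S", 1:8)
group_rpkm <- c("low", "low", "low", "high", "high", "high", "high", "low")
group_norm <- c("low", "low", "high", "high", "high", "low", "high", "low")

# Cross-tabulate: off-diagonal cells are samples that switch groups.
table(rpkm = group_rpkm, normalized = group_norm)

# List the samples whose assignment differs between the two measures.
switched <- samples[group_rpkm != group_norm]
switched  # S3 and S6 in this toy example
```

Once the switching samples are known, their expression values under each measure and their follow-up times can be inspected directly to see how much of the p-value difference they account for.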

written 10 months ago by Santosh Anand