Question: Survival plot between low and high expression of gene
Biologist wrote (17 months ago):

Hi,

I want to make a survival plot comparing samples with low and high expression of a gene. I followed this cutpoint tutorial, which uses the maxstat package to divide samples into low and high groups. The tutorial used RSEM-normalised counts as its gene expression data.

I have raw counts from featureCounts, and I also have RPKM data.

First I used the RPKM data and plotted the survival curves: [figure: survival plot between low and high with RPKM expression]. This gave a p-value of 0.026.

Second, I used normalised counts (raw counts converted to normalised counts with DESeq2) and plotted the survival curves: [figure: survival plot between low and high with normalised counts]. Here the p-value is 0.1.

Both plots show the same pattern, with no apparent change, so why are the p-values so different? With RPKM the result is significant, and with normalised counts it is not. What could be the reason?

Which units of gene expression should I use to divide samples into low and high?

Devon Ryan (Freiburg, Germany) wrote (17 months ago):

But there is a very important difference between the plots, namely the "low" values in the bottom plot are MUCH closer to the "high" values. This is why there's a difference in the P-values. You can see this in the "Strata" plot, where there's a constant difference of 1 between the top and bottom set of plots.


Oh yes, thank you. What could be the reason for that? Is it because of the different expression data?

And what would you recommend for dividing samples into low and high based on expression: normalized counts, RPKM, FPKM, or something else?

Reply written 17 months ago by Biologist

For a single gene it won't matter, unless you have isoform switching or something like that. If your gene-level metric is a summary of transcript-level metrics then TPM is going to be the most useful.

Reply written 17 months ago by Devon Ryan

Hi Devon,

A small question: can TPM values converted from raw featureCounts counts be used for this analysis? I used the following function for the conversion.

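# counts: read counts per gene for one sample; lengths: gene lengths, same order
tpm <- function(counts, lengths) {
  rate <- counts / lengths   # length-normalised read rate per gene
  rate / sum(rate) * 1e6     # rescale so the values sum to one million
}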
Reply written 16 months ago by Biologist

That looks right at least.

Reply written 16 months ago by Devon Ryan
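
One point worth illustrating about the `tpm` function above: as written it normalises a single vector, so with a genes-by-samples matrix it would presumably need to be applied one column at a time. A minimal sketch, assuming a hypothetical toy matrix `counts` and a hypothetical vector `gene_lengths` (neither is from the thread):

```r
# tpm() as posted in the thread: expects the counts of ONE sample
tpm <- function(counts, lengths) {
  rate <- counts / lengths
  rate / sum(rate) * 1e6
}

# Hypothetical toy data: 3 genes x 2 samples, gene lengths in bp
counts <- matrix(c(10, 20, 30,
                   5, 50, 45),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
gene_lengths <- c(geneA = 1000, geneB = 2000, geneC = 3000)

# Apply per column so that each sample is scaled to one million on its own;
# calling tpm(counts, gene_lengths) on the whole matrix would instead scale
# all samples together.
tpm_mat <- apply(counts, 2, tpm, lengths = gene_lengths)
colSums(tpm_mat)
```

Each column of `tpm_mat` should then sum to one million, which is a quick sanity check on any TPM conversion.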
Santosh Anand wrote (17 months ago):

The curves look slightly different because the maxstat algorithm assigns 18 samples to the low group in the first case but 20 samples in the second. This means that the fraction of samples surviving in that group is higher at most of the event points in the second plot, which moves the blue curve up a little, closer to the yellow one, and so gives a higher (less significant) p-value.
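
To make the dependence of group sizes on the chosen cutpoint concrete, here is a rough sketch that scans candidate cutpoints and keeps the one with the largest log-rank statistic. It is a simplified stand-in for what maxstat does, not its actual API: `scan_cutpoint`, `min_prop`, and the simulated data are all hypothetical, and only the `survival` package is assumed. Unlike maxstat, it applies no correction for having tried many cutpoints, so the resulting statistic is optimistic.

```r
library(survival)

# Hypothetical helper: scan candidate cutpoints and return the one with the
# largest log-rank chi-square (no multiple-testing correction).
scan_cutpoint <- function(time, event, expr, min_prop = 0.1) {
  # Only consider cutpoints that leave roughly min_prop of samples on each side
  bounds <- quantile(expr, c(min_prop, 1 - min_prop))
  candidates <- sort(unique(expr[expr >= bounds[1] & expr < bounds[2]]))
  chisq <- sapply(candidates, function(cut) {
    group <- ifelse(expr > cut, "high", "low")
    survdiff(Surv(time, event) ~ group)$chisq
  })
  best <- candidates[which.max(chisq)]
  list(cutpoint = best,
       n_low = sum(expr <= best),
       n_high = sum(expr > best),
       chisq = max(chisq))
}

# Simulated data, for illustration only
set.seed(1)
n <- 60
expr <- rexp(n, rate = 0.1)
surv_time <- rexp(n, rate = ifelse(expr > median(expr), 0.08, 0.04))
surv_event <- rbinom(n, 1, 0.8)

res <- scan_cutpoint(surv_time, surv_event, expr)
res[c("cutpoint", "n_low", "n_high")]
```

Running this on two differently normalised versions of the same gene and comparing `n_low` and `n_high` would show whether the selected split has shifted, as it did here (18 vs 20 samples).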

"And what would you recommend for dividing samples into low and high based on expression: normalized counts, RPKM, FPKM, or something else?"

If your choice of expression unit gives different results, then the right question to ask is whether the results are robust. In my opinion, they are not. There may also not be enough power, because a single low/high cutpoint is a thin boundary that blurs the distinction between the two groups. You could try categorizing samples as low | medium | high and check whether the low vs high comparison is robust across all of the count methods. Robustness is more important than any particular method, because all of them are essentially measuring the same thing.
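
The low | medium | high suggestion could be sketched roughly as follows, assuming tertiles as the split (the thread does not specify how to define the three groups) and simulated data. `tertile_groups`, `logrank_low_vs_high`, and all variable names are hypothetical; only the `survival` package is assumed.

```r
library(survival)

# Hypothetical helper: split expression into low/medium/high by tertiles
tertile_groups <- function(expr) {
  breaks <- quantile(expr, probs = c(0, 1/3, 2/3, 1))
  cut(expr, breaks = breaks, labels = c("low", "medium", "high"),
      include.lowest = TRUE)
}

# Hypothetical helper: log-rank p-value for low vs high, dropping medium
logrank_low_vs_high <- function(expr, time, event) {
  group <- tertile_groups(expr)
  keep <- group != "medium"
  fit <- survdiff(Surv(time[keep], event[keep]) ~ droplevels(group[keep]))
  1 - pchisq(fit$chisq, df = 1)
}

# Simulated data, for illustration only: two nearly proportional units
set.seed(42)
n <- 60
tpm_expr <- rlnorm(n, meanlog = 3)
norm_expr <- tpm_expr * rlnorm(n, sdlog = 0.1)
surv_time <- rexp(n, rate = 0.05)
surv_event <- rbinom(n, 1, 0.8)

# Same test under each unit; similar p-values would suggest a robust result
pvals <- sapply(list(tpm = tpm_expr, norm = norm_expr),
                logrank_low_vs_high, time = surv_time, event = surv_event)
pvals
```

If the two p-values disagree strongly even with the middle group removed, that would point to samples near the boundaries driving the result rather than the gene itself.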


Thank you. I will do that with normalized counts. Do you think using RPKM for the cutpoint is a bad idea?

Reply written 17 months ago by Biologist

As I said above, RPKM and normalized counts measure the same thing, but in different ways. So if your choice of unit changes the result, you should dig deeper into why, by looking at which samples move from the high group to the low group and why. There is no universal answer as to whether RPKM is better than normalized counts. You have to get your hands dirty!

Reply written 17 months ago by Santosh Anand
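
A small sketch of the "which samples are changing group" check, using made-up values for six samples and made-up cutpoints (in practice the cutpoints would come from maxstat for each unit). `split_at` and all data here are hypothetical.

```r
# Hypothetical helper: dichotomise expression at a given cutpoint
split_at <- function(expr, cutpoint) {
  factor(ifelse(expr > cutpoint, "high", "low"), levels = c("low", "high"))
}

# Made-up expression values for the same six samples in two units
rpkm <- c(s1 = 1.2, s2 = 3.4, s3 = 0.8, s4 = 5.1, s5 = 2.9, s6 = 4.0)
norm_counts <- c(s1 = 110, s2 = 300, s3 = 90, s4 = 520, s5 = 310, s6 = 405)

# Made-up cutpoints, one per unit
g_rpkm <- split_at(rpkm, cutpoint = 3.0)
g_norm <- split_at(norm_counts, cutpoint = 305)

# Cross-tabulate the two assignments; off-diagonal cells are switched samples
table(rpkm = g_rpkm, norm_counts = g_norm)

# Names of the samples whose group differs between the two units
switched <- names(rpkm)[g_rpkm != g_norm]
switched
```

In this toy example `s2` and `s5` change group. Inspecting such samples (their gene length, library size, or position relative to the cutpoint) is one way to see why the two units disagree.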