I want to make a survival plot comparing samples with low and high expression of a gene. I followed this tutorial, which uses a cutpoint from the maxstat package to divide samples into low and high groups. In that tutorial they used RSEM-normalized counts as the gene expression data.
I have raw counts from featureCounts. Along with that, I also have RPKM data.
First, I used the RPKM data and plotted the survival curves, and it looks like this:
This gave a p-value of 0.026.
Second, I used normalized counts (raw counts converted to normalized counts with DESeq2) and plotted the survival curves, and it looks like this:
Here the p-value is 0.1.
Both plots show the same pattern, with no visible change, so why are the p-values so different? With RPKM the result is significant, and with normalized counts it is not. What could be the reason?
Which units of gene expression data should I use to divide samples into low and high?
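One thing I could check is whether the two inputs actually put the same samples into the same groups. Below is a minimal sketch in Python of that comparison. The sample IDs, expression values, and cutpoints are all made up for illustration (they are not my data or maxstat output), and I am assuming the convention that a sample is "high" when its value is strictly above the cutpoint.

```python
def split_by_cutpoint(expression, cutpoint):
    """Label each sample 'high' if its value is above the cutpoint, else 'low'.

    Assumed convention: values equal to the cutpoint count as 'low'.
    """
    return {
        sample: ("high" if value > cutpoint else "low")
        for sample, value in expression.items()
    }


def samples_that_switch(labels_a, labels_b):
    """Return the sorted sample IDs whose group differs between two labelings."""
    shared = sorted(set(labels_a) & set(labels_b))
    return [sample for sample in shared if labels_a[sample] != labels_b[sample]]


# Hypothetical expression values for one gene in five samples.
rpkm = {"S1": 2.1, "S2": 8.4, "S3": 5.0, "S4": 12.3, "S5": 4.6}
norm_counts = {"S1": 150.0, "S2": 610.0, "S3": 300.0, "S4": 900.0, "S5": 420.0}

# Hypothetical cutpoints, standing in for what a cutpoint search would return
# separately for each kind of input.
rpkm_cut = 4.8
norm_cut = 350.0

rpkm_groups = split_by_cutpoint(rpkm, rpkm_cut)
norm_groups = split_by_cutpoint(norm_counts, norm_cut)

# Samples assigned to different groups depending on the expression unit.
switched = samples_that_switch(rpkm_groups, norm_groups)
print(switched)
```

If the list is not empty, the two survival curves are built from different low and high groups, so the log-rank p-values need not agree even when the plots look alike.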