Question: Negative Zero Workaround in R for signed p-value sorting
mbio.kyle (United States) wrote, 2.9 years ago:

I am working with the results of a pathway analysis experiment. I have a dataframe in which rows are pathways and columns are samples. For each sample I did RNA-seq and ran GSEA on the results. I then pulled each pathway out of the GSEA results (hallmark gene set), from both the positive and the negative correlation lists, along with its associated p-value. I'd like to make a heatmap of this with the significant positive and significant negative pathways at either end, and the pathways that are not significant in the middle.

So here is what the data looks like:

NAME    signed-p-val
IL2_STAT5_SIGNALING -0.0000
INTERFERON_ALPHA_RESPONSE   -0.0055
ALLOGRAFT_REJECTION -0.0070
ESTROGEN_RESPONSE_EARLY -0.0103
MYOGENESIS  -0.0109
ANGIOGENESIS    -0.0203
APOPTOSIS   -0.0422
# I removed some but each list has the same length
# all 50 pathways from hallmark gene set
APICAL_JUNCTION -0.0428
WNT_BETA_CATENIN_SIGNALING  0.28242677
PROTEIN_SECRETION   0.28635347
HYPOXIA 0.61358315
UV_RESPONSE_UP  0.9225513
CHOLESTEROL_HOMEOSTASIS 0.92826086
TGF_BETA_SIGNALING  0.92060083
DNA_REPAIR  1

That is just a subset of the table, and I have three such tables, one for each condition. I made a signed p-value by negating the p-value for the negatively enriched pathways. My issue now is that if I sort the dataframe before heatmapping, I get all the largely negative p-values at the top and all the largely positive p-values at the bottom. I tried using negative zero (-0.000), but it didn't work in R (as it does in Python).
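For context, a minimal sketch of why a negative zero does not help with sorting in R: the sign of zero is stored in the double, but comparisons (and therefore sorting) treat -0 and 0 as equal, so the sign has to be recovered some other way, for instance from the reciprocal. The vector `p` below is made-up illustration data, not values from the table above.

```r
# Illustrative values only; -0 is a negative zero, 0 a positive zero.
p <- c(0.5, -0, 0, -0.2)

# Comparisons cannot tell the two zeros apart.
zeros_equal <- (-0 == 0)

# The reciprocal can: 1 / -0 is -Inf, while 1 / 0 is Inf.
recip <- 1 / p

# TRUE for negative values and for negative zero.
is_negative <- p < 0 | (p == 0 & recip < 0)
print(is_negative)
```

Note that `sign(-0)` returns 0 in R, so `sign()` alone is not enough to flag a negative zero.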

So I'd like to sort this thing like: -0 -> -1:1 -> 0

Here is the code I have so far. I am really an R novice, but I am guessing I am looking for a way to specify a custom sort function, similar to how you can in Python by defining __cmp__ for a class.

library(pheatmap)
library(RColorBrewer)

# One table per sample: pathway names as row names, one signed p-value column
sample1 <- read.table("sample1.tsv", header=TRUE, row.names=1, sep="\t")
sample2 <- read.table("sample2.tsv", header=TRUE, row.names=1, sep="\t")
sample3 <- read.table("sample3.tsv", header=TRUE, row.names=1, sep="\t")

# Outer-join the three tables on pathway name
merged <- merge(sample1, sample2, all=TRUE, by="row.names")
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL
merged <- merge(merged, sample3, all=TRUE, by="row.names")
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL

# Pathways missing from a sample get p = 1 (not significant)
merged[is.na(merged)] <- 1
colnames(merged) <- c("sample1", "sample2", "sample3")

# Sort rows by the sum of signed p-values across samples, then plot
merged <- merged[order(rowSums(merged)),]
color <- colorRampPalette(rev(brewer.pal(9, "RdBu")))(100)
pheatmap(merged, cluster_rows=FALSE, cluster_cols=FALSE, color=color)
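One possible way around the ordering problem, sketched under assumptions: convert each signed p-value to a score, sign * (1 - |p|), so that strong negative enrichment lands near -1, strong positive enrichment near +1, and non-significant pathways near 0, then sort rows by the row sums of that score. The helper `signed_score` and the toy table are hypothetical and not part of the code above; a signed -log10(p) transform would be an equally reasonable choice.

```r
# Hypothetical helper: map signed p-values to scores in [-1, 1].
# A negative zero counts as negative; its sign is read from the reciprocal.
signed_score <- function(p) {
  p <- as.matrix(p)
  s <- ifelse(p < 0 | (p == 0 & 1 / p < 0), -1, 1)
  s * (1 - abs(p))
}

# Toy stand-in for `merged` (made-up values, not real results).
toy <- data.frame(
  sample1 = c(0.9, -0, 0.001, -0.6),
  sample2 = c(1, -0.01, 0.02, -0.8),
  row.names = c("PATHWAY_A", "PATHWAY_B", "PATHWAY_C", "PATHWAY_D")
)

# Most significant negative rows first, most significant positive rows last.
score <- signed_score(toy)
score <- score[order(rowSums(score)), , drop = FALSE]
print(rownames(score))
```

The ordered `score` matrix could then be passed to pheatmap in place of `merged`; with a diverging palette the two significant ends get opposite colors and the non-significant rows sit in the middle.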
fanli.gcb (Los Angeles, CA) wrote, 2.9 years ago:

You can do it in R like this:

Sample data:

df <- data.frame(NAME=c("A","B","C","D"), pval=c(-0.005, 0.002, -0.9, 0.8))

Sort by absolute value of the p-value:

out <- df[order(abs(df$pval)),]

Reverse the order of the non-negative p-value entries:

tmp <- subset(out, pval >= 0); tmp <- tmp[rev(seq_len(nrow(tmp))), ]

Put it all together:

out <- rbind(subset(out, pval<0), tmp)
out
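The same ordering can also be sketched as a single order() call with two keys, which is about the closest R gets to a custom comparison function: negatives come first, then |p| increases among the negatives and p decreases among the positives. This is a hedged variant rather than the code from the answer itself; the `neg` flag additionally treats a negative zero as negative, and the sample values are made up.

```r
df <- data.frame(NAME = c("A", "B", "C", "D", "E", "F"),
                 pval = c(-0.005, 0.002, -0.9, 0.8, -0, 0))

# TRUE for negative values and for negative zero (sign read from 1 / pval).
neg <- df$pval < 0 | (df$pval == 0 & 1 / df$pval < 0)

# Key 1: negatives before positives.
# Key 2: |p| ascending for negatives, p descending for positives.
key <- ifelse(neg, abs(df$pval), -df$pval)
out <- df[order(!neg, key), ]
print(out$NAME)
```

Because order() accepts several sort keys, no intermediate subsets or rbind() are needed, and rows with a p-value of exactly zero are kept on the side their sign indicates.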