I have a gene set of interest (assume a list of 20 probes). I also have a matrix that ranks the genes from highest to lowest expression (10 columns and 300 rows, representing 10 drugs and 300 probes). The matrix does not contain expression values, only probe names.
I want to perform a Kolmogorov-Smirnov test on it. Does anybody know whether this is possible? If so, how can I do it in R?
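A minimal Python sketch of one way to do this. It assumes each drug column is available as a list of probe IDs ordered from highest to lowest expression (position 1 = highest), and that the 20-probe set is a plain list of IDs. The helpers `ks_statistic`, `gene_set_ks` and `ks_per_drug` are hypothetical names. For each column, the sketch compares the positions of the set probes with the positions of all other probes using the two-sample Kolmogorov-Smirnov statistic D, the largest gap between the two empirical CDFs. In R, base `ks.test(x, y)` runs the same two-sample test on the two position vectors and also reports a p-value.

```python
from typing import Dict, Iterable, List, Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    if na == 0 or nb == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < na and a[i] == x:
            i += 1
        while j < nb and b[j] == x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def gene_set_ks(ranked_probes: List[str], gene_set: Iterable[str]) -> float:
    """D between positions of set members and all other probes in one column.

    Position 1 is assumed to be the most highly expressed probe.
    """
    members = set(gene_set)
    in_set: List[int] = []
    out_set: List[int] = []
    for pos, probe in enumerate(ranked_probes, start=1):
        (in_set if probe in members else out_set).append(pos)
    return ks_statistic(in_set, out_set)


def ks_per_drug(rank_matrix: Dict[str, List[str]],
                gene_set: Iterable[str]) -> Dict[str, float]:
    """Apply gene_set_ks to every drug column of the rank matrix."""
    members = list(gene_set)
    return {drug: gene_set_ks(col, members) for drug, col in rank_matrix.items()}
```

D is unsigned: to see whether the set sits near the top or the bottom of a column, compare the mean position of the set probes with that of the other probes. This is essentially the idea behind the classic (unweighted) GSEA enrichment score, which is a KS-like running sum over the ranked list.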
Here is an example of my data:
"retrorsine.HL60.644.5500024024213121906559.H02.name" "seneciphylline.HL60.644.5500024024213121906559.F04.name" "cytisine.HL60.641.640641112706.E07.name" "pregnenolone.HL60.635.5500024024214122006604.B10.name" "salsolidin.HL60.635.5500024024214122006604.H09.name" "Prestwick.860.HL60.659.5500024030760072207028.E07.name"
"1" "206421_s_at" "202988_s_at" "204426_at" "204844_at" "210593_at" "216406_at"
"2" "220792_at" "216614_at" "215708_s_at" "208482_at" "215590_x_at" "215619_at"
"3" "207524_at" "217996_at" "218352_at" "221638_s_at" "211049_at" "204289_at"
"4" "207537_at" "220437_at" "202760_s_at" "221422_s_at" "215815_at" "216872_at"
"5" "208400_at" "219778_at" "218129_s_at" "206461_x_at" "219781_s_at" "210937_s_at"
"6" "214238_at" "209122_at" "216504_s_at" "212185_x_at" "204624_at" "220837_at"
"7" "219812_at" "202619_s_at" "209891_at" "205884_at" "214425_at" "206665_s_at"
"8" "208126_s_at" "214230_at" "220553_s_at" "210787_s_at" "203421_at" "215725_at"
"9" "211231_x_at" "220573_at" "201823_s_at" "203505_at" "201906_s_at" "216566_at"
"10" "204856_at" "211525_s_at" "209406_at" "216611_s_at" "205153_s_at" "214592_s_at"
"11" "220951_s_at" "218589_at" "218538_s_at" "205611_at" "206232_s_at" "216844_at"
"12" "205635_at" "205220_at" "212709_at" "211426_x_at" "216990_at" "210597_x_at"
"13" "214355_x_at" "207361_at" "210145_at" "204567_s_at" "208062_s_at" "220828_s_at"
"14" "217176_s_at" "205027_s_at" "217253_at" "215688_at" "220808_at" "214799_at"
"15" "207516_at" "208065_at" "205419_at" "221060_s_at" "217212_s_at" "206198_s_at"
"16" "217193_x_at" "216464_x_at" "215716_s_at" "207344_at" "207745_at" "202015_x_at"
"17" "213438_at" "209073_s_at" "214787_at" "211090_s_at" "202812_at" "210303_at"
"18" "207593_at" "201462_at" "219767_s_at" "216287_at" "208365_s_at" "207837_at"
"19" "206191_at" "220902_at" "211094_s_at" "221778_at" "215265_at" "210390_s_at"
"20" "212813_at" "222318_at" "211698_at" "220916_at" "206883_x_at" "217565_at"
"21" "211502_s_at" "206479_at" "205659_at" "211819_s_at" "218559_s_at" "215904_at"
"22" "205958_x_at" "204438_at" "218748_s_at" "206244_at" "206397_x_at" "216625_at"
"23" "221633_at" "202219_at" "201163_s_at" "211456_x_at" "205502_at" "216415_at"
"24" "221395_at" "216373_at" "203566_s_at" "210025_s_at" "215822_x_at" "211242_x_at"
"25" "211001_at" "208083_s_at" "216060_s_at" "210186_s_at" "214151_s_at" "210961_s_at"
"26" "202885_s_at" "202912_at" "218930_s_at" "220764_at" "221018_s_at" "210496_at"
"27" "203435_s_at" "208482_at" "218111_s_at" "206391_at" "217273_at" "202921_s_at"
"28" "219195_at" "202620_s_at" "202558_s_at" "213973_at" "215610_at" "217150_s_at"
"29" "217053_x_at" "206310_at" "209622_at" "215921_at" "216754_at" "206442_at"
"30" "215512_at" "209875_s_at" "209707_at" "221618_s_at" "220595_at" "217210_at"
"31" "204687_at" "215427_s_at" "221740_x_at" "216001_at" "221933_at" "214926_at"
@mark.ziemann Thank you so much, Mark, but it is trickier than that! I tried to make the .rnk and .gmt files. The .rnk file needs two columns, but I only have one; you can find an example here.
You can find the gene set file in the main question.
I would appreciate any further explanation. Thanks again!
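For the gene set side, a .gmt file has one gene set per line: the set name, a description field, then the member IDs, all tab-separated. A minimal Python sketch, assuming the 20 probes are held in a list; `gmt_line`, `write_gmt` and the set name `MY_PROBE_SET` are hypothetical, and the description can be any placeholder text such as `na`.

```python
from typing import Dict, List, Tuple


def gmt_line(set_name: str, description: str, probes: List[str]) -> str:
    """One .gmt record: name, description, then one probe ID per field, tab-separated."""
    return "\t".join([set_name, description, *probes])


def write_gmt(path: str, gene_sets: Dict[str, Tuple[str, List[str]]]) -> None:
    """Write each gene set as one line of a .gmt file."""
    with open(path, "w") as fh:
        for name, (description, probes) in gene_sets.items():
            fh.write(gmt_line(name, description, probes) + "\n")
```

For example, `write_gmt("my_probes.gmt", {"MY_PROBE_SET": ("na", my_20_probes)})` would write a single-line file, where `my_20_probes` is a placeholder for your own probe list.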
The classic mode of GSEA only takes into account the rank of the genes and the direction of change. So if you have 16,000 genes, 8,000 with a positive fold change and the rest with a negative fold change, you can give them scores from +8000 down to -8000, and GSEA should work as long as your .rnk, .gmt and .chip files follow the required formats.
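The rank-based scoring described above can be sketched in Python as follows. It assumes the probes are already ordered from most up-regulated to most down-regulated and that you know how many have a positive fold change (`n_up`); a rank-only matrix does not carry that direction, so `n_up` has to come from elsewhere. `signed_rank_scores` is a hypothetical helper.

```python
from typing import Dict, List


def signed_rank_scores(ranked_probes: List[str], n_up: int) -> Dict[str, int]:
    """Rank-based scores: +n_up .. +1 for the first n_up probes, then -1, -2, ...

    ranked_probes must be ordered from most up-regulated to most down-regulated;
    n_up is the number of probes with a positive fold change.
    """
    n = len(ranked_probes)
    if not 0 <= n_up <= n:
        raise ValueError("n_up must be between 0 and len(ranked_probes)")
    scores: Dict[str, int] = {}
    for pos, probe in enumerate(ranked_probes):
        # Up-regulated block counts down to +1; the rest counts down from -1.
        scores[probe] = n_up - pos if pos < n_up else -(pos - n_up + 1)
    return scores
```

With 16,000 probes and `n_up=8000`, this yields scores from +8000 to +1 followed by -1 to -8000, matching the range in the answer above.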
@mark.ziemann Is there any example of how to do this? I have been googling it for a week and still could not find any example or information!
The rank file should have this format:
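A .rnk file is a plain tab-delimited text file with two columns: the probe (or gene) identifier and its score. A minimal Python sketch that writes one from a `{probe: score}` dictionary, sorted from highest to lowest score; `rnk_lines` and `write_rnk` are hypothetical helpers.

```python
from typing import Dict, List


def rnk_lines(scores: Dict[str, float]) -> List[str]:
    """Format (probe, score) pairs as .rnk lines, highest score first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [f"{probe}\t{score}" for probe, score in ordered]


def write_rnk(path: str, scores: Dict[str, float]) -> None:
    """Write a two-column, tab-delimited .rnk file with no header row."""
    with open(path, "w") as fh:
        fh.write("\n".join(rnk_lines(scores)) + "\n")
```

This addresses the one-column problem: the second column is the score you assign to each probe, not something already present in the rank matrix.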
If you are having problems with the formats, you could try the Java GUI to troubleshoot them first.
@mark.ziemann How can I make those scores for my probe set? Are they expression values, fold changes or something like that? Or should I calculate them with a formula?
The signed log10 p-value is a good choice (see reference).
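A signed log10 p-value is usually computed as sign(logFC) * -log10(p), so strongly up-regulated probes get large positive scores and strongly down-regulated probes get large negative ones. A minimal Python sketch, assuming you have a p-value and a log fold change per probe from a differential expression analysis (e.g. drug vs control), which the rank-only matrix does not provide; `signed_log10_p` is a hypothetical helper.

```python
import math


def signed_log10_p(p_value: float, log_fold_change: float) -> float:
    """Score = sign(logFC) * -log10(p); a zero fold change gives a score of 0."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    # Sign of the fold change as -1, 0 or +1.
    sign = (log_fold_change > 0) - (log_fold_change < 0)
    return sign * -math.log10(p_value)
```

The resulting scores can then go into the second column of the .rnk file in place of the purely rank-based scores.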