For GSEA, check the example files to get an idea of the expected formatting. I recently used the Java implementation of GSEA for the first time and got it working with the three input files described here.
cls file
Contains information on the factors in our data. The first line, 35 7 1, means, in this case, 35 samples and 7 unique levels for the listed factor; the trailing 1 is always 1. The second line starts with # and names the levels. On the third line of the file, we list the actual level of each sample - these should line up with the sample columns in the gct file.
NB - these are space-delimited.
35 7 1
# d0 d1 d2 d4 d6 d8 d10
d0 d1 d2 d4 d6 d8 d10 d0 d1 d2 d4 d6 d8 d10 d0 d1 d2 d4 d6 d8 d10 d0 d1 d2 d4 d6 d8 d10 d0 d1 d2 d4 d6 d8 d10
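A minimal sketch of generating those three lines from a list of per-sample labels, rather than typing them by hand. The function name format_cls is hypothetical, and the sketch assumes the samples are ordered replicate by replicate, as in the example above.

```python
def format_cls(labels):
    """Return the text of a categorical .cls file for per-sample labels."""
    # Unique levels, kept in order of first appearance.
    levels = list(dict.fromkeys(labels))
    # Line 1: number of samples, number of levels, and a constant 1.
    header = f"{len(labels)} {len(levels)} 1"
    # Line 2: '#' followed by the level names; line 3: one label per sample.
    return "\n".join([header, "# " + " ".join(levels), " ".join(labels)]) + "\n"


# Five replicates of a seven-point time course, in the same order
# as the sample columns of the gct file.
days = ["d0", "d1", "d2", "d4", "d6", "d8", "d10"]
cls_text = format_cls(days * 5)
print(cls_text, end="")
```

Writing cls_text to a file with a .cls extension gives something GSEA should be able to load, provided the label order really does match the gct columns.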
gct file
This contains the expression values. You need a NAME and a DESCRIPTION column before the expression values actually start. DESCRIPTION can be just na. Again, note the header information on the second line: here, 18062 genes x 35 samples.
NB - these are tab-delimited.
#1.2
18062 35
NAME DESCRIPTION Day 0, rep 1 Day 1, rep 1 Day 2, rep 1 Day 4, rep 1 Day 6, rep 1 Day 8, rep 1 Day 10, rep 1 Day 0, rep 2 Day 1, rep 2 Day 2, rep 2 Day 4, rep 2 Day 6, rep 2 Day 8, rep 2 Day 10, rep 2 Day 0, rep 3 Day 1, rep 3 Day 2, rep 3 Day 4, rep 3 Day 6, rep 3 Day 8, rep 3 Day 10, rep 3 Day 0, rep 4 Day 1, rep 4 Day 2, rep 4 Day 4, rep 4 Day 6, rep 4 Day 8, rep 4 Day 10, rep 4 Day 0, rep 5 Day 1, rep 5 Day 2, rep 5 Day 4, rep 5 Day 6, rep 5 Day 8, rep 5 Day 10, rep 5
A1BG na -1.78750107249577 -1.78731965121805 -1.78739011815182 -1.78648292007421 -1.78825323052185 -1.75670265819045 -1.7856669206048 -1.78652518885366 -1.78682730267777 -1.78980334199807 -1.78644486265833 -1.7868860041479 -1.78844156465141 -1.78740712853483 -1.75644423399062 -1.78612773069836 -1.78929036918159 -1.78723396224438 -1.76697481762272 -1.78693195908128 -1.78629510548009 -1.78470994669637 -1.78615883408804 -1.75804087324122 -1.78652254894815 -1.78711039289089 -1.76833202023458 -1.78672978697874 -1.7850823437463 -1.78625577998891 -1.78670342516185 -1.78584154361388 -1.78728728194433 -1.78497558588491 -1.78644925915904
A1CF na 1.68492754186313 1.54066315490874 1.54006231864025 1.51816007039476 1.60513517299563 1.5837019048566 1.61600434016912 1.51769932951262 1.60421752506403 1.56906960878706 1.65730147755638 1.57148034912919 1.64703379520972 1.54022084471361 1.61967950619213 1.51949572547524 1.52562157884476 1.540660774612 1.54957287190596 1.48702357593441 1.54796402052754 1.59524718481615 1.48932230313822 1.60079524224128 1.75736087058801 1.51447655944983 1.61715833564219 1.60452069557156 1.52619397748714 1.48902853362178 1.57432099780454 1.64145506694909 1.56773033915297 1.52760402017735 1.65905159731629
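A rough sketch of producing a file in this layout from Python. The function name format_gct and the dictionary input are assumptions made for illustration, and the version line is written as #1.2, which is the value given in the Broad GCT format documentation. Only the first three samples of the two genes above are used to keep it short.

```python
def format_gct(sample_names, rows, description="na"):
    """Return gct text; rows maps gene name -> one value per sample."""
    lines = [
        "#1.2",                               # version line (assumed per GCT docs)
        f"{len(rows)}\t{len(sample_names)}",  # number of genes, number of samples
        "\t".join(["NAME", "DESCRIPTION", *sample_names]),
    ]
    for gene, values in rows.items():
        if len(values) != len(sample_names):
            raise ValueError(f"{gene}: expected {len(sample_names)} values")
        lines.append("\t".join([gene, description, *map(str, values)]))
    return "\n".join(lines) + "\n"


samples = ["Day 0, rep 1", "Day 1, rep 1", "Day 2, rep 1"]
expression = {
    "A1BG": [-1.78750107249577, -1.78731965121805, -1.78739011815182],
    "A1CF": [1.68492754186313, 1.54066315490874, 1.54006231864025],
}
gct_text = format_gct(samples, expression)
print(gct_text, end="")
```

Joining on explicit tab characters keeps sample names that contain spaces or commas, such as Day 0, rep 1, in a single column.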
gmt file
Contains the signatures (gene sets), one per line: the set name, then a description (here, a URL), then the member genes:
GO_CELL_REDOX_HOMEOSTASIS http://software.broadinstitute.org/gsea/msigdb/cards/GO_CELL_REDOX_HOMEOSTASIS.html PDIA6 TXNDC9 GLRX3 PRDX4 TXNRD2 PDIA5 EGLN2 TXNRD3 AIFM3 CYBA CYBB DDIT3 QSOX2 DLD PDILT ERP44 DNAJC16 NNT TXNDC8 TXN2 GCLC GLRX GPX1 PDIA3 GSR ERO1L APEX1 NME9 IL6 GRXCR1 LTF NCF2 NCF4 NFE2L2 NOS1 NOS2 NOS3 P4HB GLRX2 TXNDC12 TXNDC11 TMX2 GLRX5 TXNDC3 DNAJC10 TMX3 SELS TMX4 ERO1LB TXNDC16 QSOX1 PDIA2 NCF1 SLC11A1 TXN TXNRD1 TXNDC15 PTGES2 TMX1 TXNDC5 CAMP SH3BGRL3 TXNDC2 KRIT1 AIFM1 TXNL1 PDIA4
GO_INTRINSIC_APOPTOTIC_SIGNALING_PATHWAY_IN_RESPONSE_TO_ENDOPLASMIC_RETICULUM_STRESS http://software.broadinstitute.org/gsea/msigdb/cards/GO_INTRINSIC_APOPTOTIC_SIGNALING_PATHWAY_IN_RESPONSE_TO_ENDOPLASMIC_RETICULUM_STRESS.html CASP12 CEBPB DAB2IP DDIT3 ERN1 PPP1R15A BBC3 GSK3B ERO1L UBE2K APAF1 ITPR1 MAP3K5 ATF4 ATP2A1 PMAIP1 PML DNAJC10 TRIB3 BAK1 BAX SELK BCL2 TMBIM6 TRAF2 XBP1 CHAC1 BAG6 CASP4 TNFRSF10B BRSK2 AIFM1
NB - these are tab-delimited.
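To check that a gmt file is laid out as expected, it can be read back into a dictionary. This is only a sketch: read_gmt is a hypothetical helper, and the example line is the first signature above truncated to three genes for brevity.

```python
def read_gmt(lines):
    """Parse gmt lines into a dict of gene set name -> list of genes."""
    gene_sets = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            raise ValueError("expected name, description and at least one gene")
        # Field 0 is the name, field 1 the description, the rest are genes.
        gene_sets[fields[0]] = fields[2:]
    return gene_sets


example = [
    "GO_CELL_REDOX_HOMEOSTASIS\t"
    "http://software.broadinstitute.org/gsea/msigdb/cards/GO_CELL_REDOX_HOMEOSTASIS.html\t"
    "PDIA6\tTXNDC9\tGLRX3\n"
]
gene_sets = read_gmt(example)
print(gene_sets)
```

If a line was saved with spaces instead of tabs, the whole line ends up in a single field and the length check raises an error, which is a quick way to catch the delimiter mistake.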
----------------------------------------
Kevin