Question

GSEA: can't create a gene set GRP file

4.8 years ago

kdc15 ▴ 40

I am trying to run GSEA on a list of differentially expressed genes (control vs. treated), but I would like to run it against a gene list that I provide myself. I have converted my gene list to GRP file format, and I have also run the list through HUGO and filtered it so that it contains only approved symbols. However, when I run the analysis, I immediately get the error: Could not find feat index for: -1 SELENOH

Every time I run this, the only thing that changes is the gene name that it cannot find an index for.

gsea geneset grp • 2.1k views

ADD COMMENT • link 4.8 years ago by kdc15 ▴ 40

Double check that you have the correct format for GRP by looking HERE.

If not GRP, you can also specify your gene sets with GMT format, as I show here: A: Running GSEA for DEGs

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k

Thank you for your response. I have double-checked the GRP format, and it seems to be in order (one gene per line, HUGO-approved symbols, etc.). Will the GMT format still be applicable, given that I only have one gene set?

ADD REPLY • link 4.8 years ago by kdc15 ▴ 40

Yes, it should work for just 1 gene set, too.
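
For a single gene set, the conversion from GRP to GMT can be sketched roughly as below. This is only an illustration: the function name `grp_to_gmt_line` and the set name `MY_GENE_SET` are made up for the example, and it assumes the usual layouts (GRP: one gene per line, `#` for comments; GMT: set name, description, then the genes, all separated by tabs).

```python
def grp_to_gmt_line(grp_text, set_name, description="na"):
    """Turn GRP content (one gene per line) into one tab-delimited GMT line.

    Hypothetical helper for illustration; not part of GSEA itself.
    """
    genes = []
    for raw in grp_text.splitlines():
        gene = raw.strip()
        # Skip blank lines and '#' comment lines.
        if not gene or gene.startswith("#"):
            continue
        # Keep the first occurrence of each gene, preserving order.
        if gene not in genes:
            genes.append(gene)
    return "\t".join([set_name, description] + genes)


example_grp = "# my set\nSEMA3C\nDIO2\nVCAN\n\nDIO2\n"
gmt_line = grp_to_gmt_line(example_grp, "MY_GENE_SET")
print(gmt_line)
```

Writing `gmt_line` plus a trailing newline to a file ending in `.gmt` should then give a one-set GMT file.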

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k

Even when I format it as GMT, I still get the same error.

ADD REPLY • link 4.8 years ago by kdc15 ▴ 40

Please paste a sample of your data. Also check for hidden encoding problems, which can creep in if you have transferred your files between Windows- and Linux-based operating systems.
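
One way to look for that kind of hidden encoding is to inspect the raw bytes of the file. The sketch below is only a rough check, with a made-up helper name (`find_hidden_characters`). It flags a UTF-8 byte order mark, Windows line endings, and non-ASCII bytes. Whether any of these is what actually upsets GSEA here is an assumption.

```python
def find_hidden_characters(data: bytes):
    """Report byte-level oddities that most editors do not display.

    Hypothetical helper for illustration only.
    """
    problems = []
    bom = b"\xef\xbb\xbf"
    if data.startswith(bom):
        problems.append("UTF-8 byte order mark at start of file")
        data = data[len(bom):]  # do not count the BOM as non-ASCII below
    if b"\r\n" in data:
        problems.append("Windows (CRLF) line endings")
    elif b"\r" in data:
        problems.append("bare carriage returns")
    if any(byte > 127 for byte in data):
        problems.append("non-ASCII bytes (e.g. non-breaking spaces)")
    return problems


# Toy input: a gene list saved with a BOM and Windows line endings.
sample = b"\xef\xbb\xbfSEMA3C\r\nDIO2\r\n"
print(find_hidden_characters(sample))
```

For a real file, the bytes can be read with `open(path, "rb").read()`, where `path` stands for wherever the GRP, GMT, or GCT file is saved.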

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k

According to the GMT requirements, I organised my genes across a single row like this:

  genename  na  SEMA3C  DIO2    VCAN    CPA4    CXCL8   CHRNA6  CXCL6   PI16    CHPF    TSPAN8  CD69    HAS2    ANOS1   IGFBP5  TAGLN etc.

My dataset in .gct format is organised as follows:

    #1.2
    27115   2
    Gene_Symbol Description control treated
    A1BG    alpha-1-B glycoprotein  0.09272033  0.24820567
    A2M alpha-2-macroglobulin   2.289191    2.28871867
    A2MP1   alpha-2-macroglobulin pseudogene 1
    NAT1    N-acetyltransferase 1 (arylamine N-acetyltransferase)   4.81155967  3.232808
    NAT2    N-acetyltransferase 2 (arylamine N-acetyltransferase)   0.03398567  0.017806
    SERPINA3    serpin family A member 3    0.01228467  0.01017033
    AADAC   arylacetamide deacetylase   0.34304667  0.285272
    AAMP    angio associated migratory cell protein 62.0847053  46.040777
    AANAT   aralkylamine N-acetyltransferase    0.01549633  0.03760633
    AARS    alanyl-tRNA synthetase  68.329862   32.3569037
    ABAT    4-aminobutyrate aminotransferase    0.12132433  1.19235033
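
A quick cross-check between the two files is to list which gene-set symbols do not appear in the first column of the dataset. The sketch below uses toy data and made-up helper names (`read_gct_symbols`, `missing_from_dataset`). That an absent symbol is what triggers the "Could not find feat index" error is only a guess, not something confirmed in this thread.

```python
def read_gct_symbols(gct_text):
    """Return the set of row identifiers (first column) from GCT content."""
    lines = gct_text.splitlines()
    # Lines 0-2 hold the version, the dimensions, and the column header.
    return {line.split("\t")[0].strip() for line in lines[3:] if line.strip()}


def missing_from_dataset(gene_set, gct_text):
    """List the genes in gene_set that have no row in the GCT content."""
    present = read_gct_symbols(gct_text)
    return [gene for gene in gene_set if gene not in present]


# Toy GCT content with two rows; the values are placeholders.
example_gct = (
    "#1.2\n"
    "2\t2\n"
    "Gene_Symbol\tDescription\tcontrol\ttreated\n"
    "A1BG\talpha-1-B glycoprotein\t0.09\t0.25\n"
    "A2M\talpha-2-macroglobulin\t2.29\t2.29\n"
)
missing = missing_from_dataset(["A2M", "SELENOH"], example_gct)
print(missing)
```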

ADD REPLY • link 4.8 years ago by kdc15 ▴ 40

I don't have my own original files with me. However, one important point is that there should definitely be tabs between each column. If they are spaces, it won't work (I believe). The separator between the 2 numbers on your second line may have to be just a single space; I am not sure. The format spec is very strict, though.
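
A rough structural check along those lines could look like the sketch below. The function `check_gct` is a made-up helper, not part of GSEA. It accepts either a tab or a space between the two numbers on the second line, since that detail is uncertain here, and it reports empty expression values only as something worth inspecting, because whether GSEA tolerates them is not established in this thread.

```python
def check_gct(gct_text):
    """Return a list of structural problems found in GCT content.

    Hypothetical helper for illustration only.
    """
    problems = []
    lines = gct_text.splitlines()
    if len(lines) < 3:
        return ["fewer than three header lines"]
    if lines[0].strip() != "#1.2":
        problems.append("first line is not '#1.2'")
    dims = lines[1].split()  # splits on tabs or spaces
    if len(dims) < 2 or not all(d.isdigit() for d in dims[:2]):
        return problems + ["second line does not hold two integers"]
    n_rows, n_samples = int(dims[0]), int(dims[1])
    expected = n_samples + 2  # identifier + description + one value per sample
    if len(lines[2].split("\t")) != expected:
        problems.append(f"header line does not have {expected} tab-separated fields")
    data_rows = [line for line in lines[3:] if line.strip()]
    if len(data_rows) != n_rows:
        problems.append(f"{len(data_rows)} data rows, but line 2 declares {n_rows}")
    for number, line in enumerate(lines[3:], start=4):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != expected:
            problems.append(
                f"line {number}: {len(fields)} tab-separated fields, expected {expected}"
            )
        elif any(not field.strip() for field in fields[2:]):
            problems.append(f"line {number}: empty expression value")
    return problems


# Toy content: one row separated by spaces, one row with empty values.
bad = (
    "#1.2\n2\t2\n"
    "Gene_Symbol\tDescription\tcontrol\ttreated\n"
    "A1BG desc 0.1 0.2\n"
    "A2MP1\tpseudogene\t\t\n"
)
print(check_gct(bad))
```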

Ping this thread again if that doesn't solve it, and I will pick it up later tonight when I am home.

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k

I had saved the file as tab-delimited and then added the .gct extension. Is it not sufficient to do it this way?

ADD REPLY • link 4.8 years ago by kdc15 ▴ 40