Question: GSEA- can't create a gene set GRP file
4 months ago, kdc15 wrote:

I am trying to run GSEA on a list of differentially expressed genes (control vs. treated), but I would like to run it against a gene set that I provide myself. I have converted my gene list to the GRP file format, and I have also run the list through HUGO and filtered it so that it contains only approved symbols. However, when I run the analysis, I immediately get the error: Could not find feat index for: -1 SELENOH

Every time I run this, the error is the same except for the name of the gene that it cannot find an index for.


Double-check that your file follows the correct GRP format by looking HERE.

If not GRP, you can also specify your gene sets with GMT format, as I show here: A: Running GSEA for DEGs
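For readers who want to rule out formatting slips, here is a minimal Python sketch that builds both files from a plain list of symbols. It assumes the layouts discussed in this thread: GRP is one gene per line (with an optional `#` comment line), and GMT is a single tab-separated row of set name, description, then genes. The set name `my_gene_set` and the helper names are hypothetical, and the gene symbols are borrowed from the example row further down the thread.

```python
def format_grp(genes, comment=None):
    """Return GRP text: one gene symbol per line, optional '#' comment first."""
    lines = [f"# {comment}"] if comment else []
    lines += [g.strip() for g in genes if g.strip()]
    return "\n".join(lines) + "\n"


def format_gmt(set_name, description, genes):
    """Return one GMT row: name, description, then genes, all tab-separated."""
    fields = [set_name, description] + [g.strip() for g in genes if g.strip()]
    return "\t".join(fields) + "\n"


def save(text, path):
    """Write text with plain LF line endings, whatever the operating system."""
    # encoding="ascii" makes the write fail loudly on any non-ASCII character.
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        fh.write(text)


genes = ["SEMA3C", "DIO2", "VCAN", "CPA4", "CXCL8"]
grp_text = format_grp(genes, comment="my_gene_set")  # hypothetical set name
gmt_text = format_gmt("my_gene_set", "na", genes)
```

Calling `save(grp_text, "my_gene_set.grp")` or `save(gmt_text, "my_gene_set.gmt")` would then write the files. Generating them from code rather than a spreadsheet export avoids stray spaces and trailing delimiters.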

written 4 months ago by Kevin Blighe

Thank you for your response. I have double-checked the GRP format, and it seems to be in order (one gene per line, HUGO-approved symbols, etc.). Will the GMT format still be applicable if I only have one gene set?

written 4 months ago by kdc15

Yes, it should work for just 1 gene set, too.

written 4 months ago by Kevin Blighe

Even when I format the gene set as GMT, I still get the same error.
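Since the error persists across both gene set formats, one possibility worth ruling out (not confirmed anywhere in this thread) is that some symbols in the gene set simply do not occur as row names in the expression file, for example because the two files use different versions of a symbol. The sketch below lists such symbols. The function names are hypothetical, and the GCT text is a toy excerpt of the dataset shown later in the thread, with the row count adjusted to match.

```python
def gct_row_names(gct_text):
    """Return the first column of every data row in GCT text."""
    lines = gct_text.splitlines()
    # Lines 1-3 hold the version, the dimensions, and the column header.
    return [line.split("\t")[0].strip() for line in lines[3:] if line.strip()]


def missing_genes(gene_set, gct_text):
    """Return gene-set symbols that are not row names in the GCT text."""
    present = set(gct_row_names(gct_text))
    return [g for g in gene_set if g not in present]


# Toy excerpt; the real file has 27115 rows.
gct_text = (
    "#1.2\n"
    "3\t2\n"
    "Gene_Symbol\tDescription\tcontrol\ttreated\n"
    "A1BG\talpha-1-B glycoprotein\t0.09272033\t0.24820567\n"
    "A2M\talpha-2-macroglobulin\t2.289191\t2.28871867\n"
    "NAT1\tN-acetyltransferase 1 (arylamine N-acetyltransferase)\t4.81155967\t3.232808\n"
)

print(missing_genes(["A2M", "SELENOH", "NAT1"], gct_text))  # ['SELENOH']
```

The comparison is exact and case-sensitive, so a symbol that differs only by a trailing space or by letter case would also be reported as missing.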

written 4 months ago by kdc15

Please paste a sample of your data. Also check for hidden encoding issues (e.g., different line endings or invisible characters), which can appear if you have transferred your files between a Windows-based and a Linux-based OS.
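A quick way to look for such hidden characters is to inspect the raw bytes of the file. The following is a minimal sketch, assuming the usual suspects are a UTF-8 byte-order mark, carriage returns, and non-ASCII bytes; the function names are hypothetical and the list of checks is not exhaustive.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 byte-order mark


def find_hidden_issues(raw):
    """Return human-readable descriptions of suspicious bytes in a file."""
    issues = []
    if raw.startswith(BOM):
        issues.append("UTF-8 byte-order mark at start of file")
    if b"\r" in raw:
        issues.append("carriage returns (Windows-style line endings)")
    body = raw[len(BOM):] if raw.startswith(BOM) else raw
    if any(byte > 127 for byte in body):
        issues.append("non-ASCII bytes (e.g. non-breaking spaces)")
    return issues


def clean(raw):
    """Drop a leading BOM and normalise all line endings to LF."""
    if raw.startswith(BOM):
        raw = raw[len(BOM):]
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


sample = BOM + b"SEMA3C\r\nDIO2\r\n"
print(find_hidden_issues(sample))
print(clean(sample))  # b'SEMA3C\nDIO2\n'
```

For a real file, the bytes can be read with `open(path, "rb").read()`. Opening in binary mode matters here, because text mode would silently translate the line endings.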

modified 4 months ago • written 4 months ago by Kevin Blighe

Following the GMT requirements, I organised my genes across a single row like this:

  genename  na  SEMA3C  DIO2    VCAN    CPA4    CXCL8   CHRNA6  CXCL6   PI16    CHPF    TSPAN8  CD69    HAS2    ANOS1   IGFBP5  TAGLN etc.

my dataset in the .gct format is organised as follows:

    #1.2            
27115   2       
Gene_Symbol Description control treated
A1BG    alpha-1-B glycoprotein  0.09272033  0.24820567
A2M alpha-2-macroglobulin   2.289191    2.28871867
A2MP1   alpha-2-macroglobulin pseudogene 1      
NAT1    N-acetyltransferase 1 (arylamine N-acetyltransferase)   4.81155967  3.232808
NAT2    N-acetyltransferase 2 (arylamine N-acetyltransferase)   0.03398567  0.017806
SERPINA3    serpin family A member 3    0.01228467  0.01017033
AADAC   arylacetamide deacetylase   0.34304667  0.285272
AAMP    angio associated migratory cell protein 62.0847053  46.040777
AANAT   aralkylamine N-acetyltransferase    0.01549633  0.03760633
AARS    alanyl-tRNA synthetase  68.329862   32.3569037
ABAT    4-aminobutyrate aminotransferase    0.12132433  1.19235033
modified 4 months ago • written 4 months ago by kdc15

I don't have my own original files with me. One important point, however, is that the columns should definitely be separated by tabs. If they are separated by spaces, it won't work (I believe). The separator between the 2 numbers on your second line may have to be just a single space; I'm not sure. The format spec is very strict, though.

Ping this thread again if that doesn't solve it, and I will pick it up later tonight when I am home.
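The structural points above can be checked mechanically. Below is a minimal sketch of a GCT checker. It assumes the GCT 1.2 layout as I understand it from the format description: `#1.2` on line 1, the row and sample counts separated by a tab on line 2, a header on line 3, then one tab-separated row per gene. The function name is hypothetical. It also reports rows with empty values, such as the A2MP1 row in the sample above, only so they can be reviewed; whether GSEA accepts them is not established in this thread.

```python
def check_gct(text):
    """Return a list of problems found in GCT 1.2 text (empty if none)."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the newline that ends the file
    if len(lines) < 3:
        return ["fewer than 3 lines; expected version, dimensions, header"]

    problems = []
    if lines[0] != "#1.2":
        problems.append(f"line 1 is {lines[0]!r}, expected '#1.2'")

    dims = lines[1].split("\t")
    if len(dims) != 2 or not all(d.isdigit() for d in dims):
        problems.append(f"line 2 is {lines[1]!r}, expected rows<TAB>samples")
        return problems  # later checks depend on the dimensions

    n_rows, n_samples = int(dims[0]), int(dims[1])
    width = n_samples + 2  # name + description + one column per sample

    n_header = len(lines[2].split("\t"))
    if n_header != width:
        problems.append(f"header has {n_header} fields, expected {width}")

    data = lines[3:]
    if len(data) != n_rows:
        problems.append(f"{len(data)} data rows, but line 2 declares {n_rows}")

    for number, line in enumerate(data, start=4):
        fields = line.split("\t")
        if len(fields) != width:
            problems.append(f"line {number}: {len(fields)} fields, expected {width}")
        elif any(value.strip() == "" for value in fields[2:]):
            problems.append(f"line {number}: empty value for {fields[0]}")
    return problems


good = (
    "#1.2\n"
    "2\t2\n"
    "Gene_Symbol\tDescription\tcontrol\ttreated\n"
    "A1BG\talpha-1-B glycoprotein\t0.09272033\t0.24820567\n"
    "A2M\talpha-2-macroglobulin\t2.289191\t2.28871867\n"
)
print(check_gct(good))  # []
```

Trailing tabs after `#1.2` (which some spreadsheet exports add) and Windows line endings both show up as a line 1 problem, because the first line is compared exactly.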

modified 4 months ago • written 4 months ago by Kevin Blighe

I had saved the file as tab-delimited and then added the .gct extension. Is it not sufficient to do it this way?

written 4 months ago by kdc15