Question: GSEA: can't create a gene set GRP file
Asked 15 months ago by kdc15:

I am trying to run GSEA on a list of differentially expressed genes (control vs. treated), but I would like to test it against a gene set that I provide myself. I have converted my gene list to GRP format, and I have also run the list through HUGO and filtered it to make sure that all entries are approved symbols. However, when I run the analysis, I immediately get the error: Could not find feat index for: -1 SELENOH

Every time I run this, the only thing that changes is the gene name for which it cannot find an index.

Tags: gsea, grp, geneset
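
One thing worth ruling out (an assumption, not something this thread establishes) is a mismatch between the symbols in the gene set and the identifiers in the expression file: the error text reads as if a lookup for SELENOH returned -1, i.e. "not found". This can happen, for example, if the gene set was updated to current HGNC symbols while the dataset still uses older ones. A minimal sketch with a hypothetical helper, `check_symbols`, that compares the two lists by exact, case-sensitive matching and also reports identifiers duplicated in the dataset:

```python
from collections import Counter


def check_symbols(gene_set, dataset_ids):
    """Compare gene-set symbols with the identifiers in the dataset's first column.

    Matching is exact and case-sensitive after stripping surrounding whitespace.
    Returns the gene-set symbols absent from the dataset and any dataset
    identifiers that occur more than once.
    """
    counts = Counter(s.strip() for s in dataset_ids)
    missing = [g.strip() for g in gene_set if g.strip() not in counts]
    duplicated = sorted(s for s, n in counts.items() if n > 1)
    return {"missing": missing, "duplicated_in_dataset": duplicated}


# Illustrative input: a few identifiers from the GCT excerpt in this thread,
# plus a gene set containing SELENOH (the symbol named in the error).
dataset_ids = ["A1BG", "A2M", "NAT1", "NAT2", "AARS"]
gene_set = ["A2M", "NAT1", "SELENOH"]
report = check_symbols(gene_set, dataset_ids)
print(report)  # {'missing': ['SELENOH'], 'duplicated_in_dataset': []}
```

In practice, `dataset_ids` would be the first column of the GCT data rows and `gene_set` the lines of the GRP file.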

Double-check that your file follows the GRP format by looking HERE.

If GRP does not work, you can also specify your gene sets in GMT format, as I show here: A: Running GSEA for DEGs

Written 15 months ago by Kevin Blighe

Thank you for your response. I have double-checked the GRP format, and it seems to be in order (one gene per line, HUGO-approved symbols, etc.). Will the GMT format work, given that I only have one gene set?

Written 15 months ago by kdc15
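
"One gene per line" is easy to get almost right: stray spaces, a Windows carriage return, or a duplicated entry are invisible in most editors. A minimal sketch with a hypothetical helper, `write_grp`, and a hypothetical file name, that normalises a symbol list before writing it with Unix line endings. Dropping duplicates and rejecting internal whitespace are this sketch's own choices, not rules quoted from the GSEA documentation.

```python
import os
import tempfile
from pathlib import Path


def write_grp(path, symbols):
    """Write one gene symbol per line and return the symbols actually written."""
    kept, seen = [], set()
    for raw in symbols:
        symbol = raw.strip()  # also removes a trailing '\r' left by Windows files
        if not symbol:
            continue  # skip blank entries
        if any(ch.isspace() for ch in symbol):
            raise ValueError(f"whitespace inside symbol: {raw!r}")
        if symbol in seen:
            continue  # keep only the first occurrence of each symbol
        seen.add(symbol)
        kept.append(symbol)
    # newline="\n" turns off newline translation, so no '\r' is written on Windows;
    # encoding="ascii" raises UnicodeEncodeError on any non-ASCII character.
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        fh.write("\n".join(kept) + "\n")
    return kept


out = os.path.join(tempfile.mkdtemp(), "my_gene_set.grp")  # hypothetical file name
kept = write_grp(out, [" SEMA3C", "DIO2\r", "", "DIO2", "VCAN"])
print(kept)  # ['SEMA3C', 'DIO2', 'VCAN']
```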

Yes, it should work for just 1 gene set, too.

Written 15 months ago by Kevin Blighe
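
A GMT file holding a single set is one tab-separated line: set name, description, then the genes. A minimal sketch with a hypothetical helper, `write_gmt`, and a hypothetical set name and file name. It uses "na" as a placeholder description (as in the row posted further down the thread) and refuses empty fields or embedded tabs and newlines.

```python
import os
import tempfile
from pathlib import Path


def write_gmt(path, gene_sets):
    """Write gene sets as GMT: one line per set, tab-separated fields
    <set name> <description> <gene 1> <gene 2> ...

    gene_sets maps a set name to a (description, symbols) pair.
    """
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        for name, (description, symbols) in gene_sets.items():
            fields = [name, description or "na", *(s.strip() for s in symbols)]
            for field in fields:
                if not field or any(ch in field for ch in "\t\r\n"):
                    raise ValueError(f"empty field or embedded tab/newline: {field!r}")
            fh.write("\t".join(fields) + "\n")


out = os.path.join(tempfile.mkdtemp(), "my_gene_set.gmt")  # hypothetical file name
write_gmt(out, {"MY_GENE_SET": ("na", ["SEMA3C", "DIO2", "VCAN"])})
print(repr(Path(out).read_text()))  # 'MY_GENE_SET\tna\tSEMA3C\tDIO2\tVCAN\n'
```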

Even after converting to GMT, I still get the same error.

Written 15 months ago by kdc15

Please paste a sample of your data. Also check for hidden characters or encoding problems (for example, Windows-style line endings), particularly if you have transferred your files between Windows- and Linux-based systems.

Written 15 months ago by Kevin Blighe (edited)
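
Hidden characters are easiest to see at the byte level. A minimal sketch with a hypothetical helper, `scan_hidden_characters`, and a hypothetical sample file. It reports a UTF-8 byte-order mark, CRLF or bare CR line endings, and lines containing non-ASCII bytes (such as a non-breaking space pasted from a web page). Whether each of these actually breaks GSEA's parser is not established here; they are simply common culprits when files move between systems.

```python
import os
import tempfile
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def scan_hidden_characters(path):
    """Report byte-level features that commonly make text files misparse."""
    data = Path(path).read_bytes()
    issues = []
    if data.startswith(UTF8_BOM):
        issues.append("UTF-8 byte-order mark at start of file")
        data = data[len(UTF8_BOM):]
    if b"\r\n" in data:
        issues.append("Windows CRLF line endings")
    if b"\r" in data.replace(b"\r\n", b""):
        issues.append("bare CR characters")
    # Iterating over a bytes object yields ints, so compare against 0x7F.
    bad_lines = [n for n, line in enumerate(data.split(b"\n"), start=1)
                 if any(byte > 0x7F for byte in line)]
    if bad_lines:
        issues.append(f"non-ASCII bytes on line(s) {bad_lines[:10]}")
    return issues


sample = os.path.join(tempfile.mkdtemp(), "genes.txt")  # hypothetical file
# BOM + CRLF endings + a UTF-8 non-breaking space (b"\xc2\xa0") after DIO2.
Path(sample).write_bytes(UTF8_BOM + b"SEMA3C\r\nDIO2\xc2\xa0\r\n")
issues = scan_hidden_characters(sample)
print(issues)
# ['UTF-8 byte-order mark at start of file', 'Windows CRLF line endings', 'non-ASCII bytes on line(s) [2]']
```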

Following the GMT requirements, I arranged my genes across a single row like this:

  genename  na  SEMA3C  DIO2    VCAN    CPA4    CXCL8   CHRNA6  CXCL6   PI16    CHPF    TSPAN8  CD69    HAS2    ANOS1   IGFBP5  TAGLN etc.

My dataset, in .gct format, is organised as follows:

    #1.2            
    27115   2       
    Gene_Symbol Description control treated
    A1BG    alpha-1-B glycoprotein  0.09272033  0.24820567
    A2M alpha-2-macroglobulin   2.289191    2.28871867
    A2MP1   alpha-2-macroglobulin pseudogene 1      
    NAT1    N-acetyltransferase 1 (arylamine N-acetyltransferase)   4.81155967  3.232808
    NAT2    N-acetyltransferase 2 (arylamine N-acetyltransferase)   0.03398567  0.017806
    SERPINA3    serpin family A member 3    0.01228467  0.01017033
    AADAC   arylacetamide deacetylase   0.34304667  0.285272
    AAMP    angio associated migratory cell protein 62.0847053  46.040777
    AANAT   aralkylamine N-acetyltransferase    0.01549633  0.03760633
    AARS    alanyl-tRNA synthetase  68.329862   32.3569037
    ABAT    4-aminobutyrate aminotransferase    0.12132433  1.19235033
Written 15 months ago by kdc15 (edited)
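
A few things in this excerpt can be checked mechanically: whether line 1 is exactly `#1.2`, whether line 2 is the row and sample counts separated by a tab, whether every row has the same number of tab-separated fields, and whether every value is numeric (the A2MP1 row above appears to have none). A rough sketch with a hypothetical helper, `validate_gct`. It is not GSEA's own parser, and flagging trailing tabs or empty values is a cautious assumption rather than a documented GSEA rule.

```python
import os
import tempfile
from collections import Counter


def validate_gct(path):
    """Return a list of possible problems in a GCT 1.2 file ([] if none found).

    A rough sketch of format checks; it is not GSEA's own parser.
    """
    # Text mode uses universal newlines, so '\r\n' and '\n' both end a line.
    with open(path, encoding="utf-8", errors="replace") as fh:
        lines = [line.rstrip("\n") for line in fh]
    if len(lines) < 3:
        return ["file has fewer than 3 lines"]
    problems = []

    # Line 1: version tag, ideally nothing else on the line.
    if lines[0].strip() != "#1.2":
        problems.append(f"line 1: expected '#1.2', found {lines[0]!r}")
    elif lines[0] != "#1.2":
        problems.append("line 1: extra spaces or tabs around '#1.2'")

    # Line 2: <number of data rows> TAB <number of sample columns>.
    dims = [d for d in lines[1].split("\t") if d.strip()]
    if len(dims) != 2 or not all(d.strip().isdigit() for d in dims):
        problems.append(f"line 2: expected '<rows>\\t<samples>', found {lines[1]!r}")
        return problems
    if lines[1].count("\t") != 1:
        problems.append("line 2: expected exactly one tab")
    n_rows, n_samples = (int(d) for d in dims)
    width = 2 + n_samples  # name + description + one column per sample

    # Line 3: column headers.
    if len(lines[2].split("\t")) != width:
        problems.append(f"line 3: expected {width} tab-separated fields")

    # Data rows: fixed width, numeric values.
    names = []
    for lineno, line in enumerate(lines[3:], start=4):
        if not line.strip():
            continue  # ignore blank lines, e.g. at the end of the file
        fields = line.split("\t")
        names.append(fields[0])
        if len(fields) != width:
            problems.append(f"line {lineno}: expected {width} fields, found {len(fields)}")
            continue
        for value in fields[2:]:
            try:
                float(value)
            except ValueError:
                problems.append(f"line {lineno}: non-numeric or empty value {value!r}")
                break

    if len(names) != n_rows:
        problems.append(f"line 2 declares {n_rows} data rows, but the file has {len(names)}")
    dupes = sorted(n for n, c in Counter(names).items() if c > 1)
    if dupes:
        problems.append(f"duplicated identifiers: {dupes[:10]}")
    return problems


# Illustrative file built from three rows of the excerpt above (A2MP1 has no values).
sample = os.path.join(tempfile.mkdtemp(), "sample.gct")  # hypothetical file
with open(sample, "w", encoding="utf-8", newline="\n") as fh:
    fh.write("#1.2\n3\t2\nGene_Symbol\tDescription\tcontrol\ttreated\n"
             "A1BG\talpha-1-B glycoprotein\t0.09272033\t0.24820567\n"
             "A2M\talpha-2-macroglobulin\t2.289191\t2.28871867\n"
             "A2MP1\talpha-2-macroglobulin pseudogene 1\t\t\n")
print(validate_gct(sample))  # ["line 6: non-numeric or empty value ''"]
```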

I don't have my original files with me. However, one important point is that there should definitely be tabs between the columns; if they are spaces, it won't work (I believe). Between the 2 numbers on your second line, there may have to be just a single space; I am not sure. The format specification is very strict, though.

Ping this thread again if that doesn't solve it, and I will pick it up later tonight when I am home.

Written 15 months ago by Kevin Blighe (edited)
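
The same tab requirement applies to the GMT row posted above, where the separators in the pasted text look like runs of spaces (the forum may also have converted tabs on display, so this cannot be confirmed from the post). A minimal sketch with a hypothetical helper, `parse_gmt_line`, that splits a GMT line on tabs and raises if the line looks space-separated or mixes tabs and spaces:

```python
def parse_gmt_line(line):
    """Split one GMT line into (set name, description, genes).

    Raises ValueError when the line does not look tab-delimited.
    """
    line = line.rstrip("\r\n")
    fields = line.split("\t")
    if len(fields) < 3:
        if len(line.split()) >= 3:
            raise ValueError("fields look space-separated; GMT expects tabs")
        raise ValueError("need a set name, a description and at least one gene")
    name, description, *genes = fields
    genes = [g for g in genes if g]  # tolerate empty trailing fields
    if not genes:
        raise ValueError("no genes after the description field")
    spaced = [g for g in genes if any(ch.isspace() for ch in g)]
    if spaced:
        raise ValueError(f"whitespace inside gene fields (mixed tabs/spaces?): {spaced[:5]}")
    return name, description, genes


parsed = parse_gmt_line("genename\tna\tSEMA3C\tDIO2\tVCAN\n")
print(parsed)  # ('genename', 'na', ['SEMA3C', 'DIO2', 'VCAN'])

try:
    parse_gmt_line("genename  na  SEMA3C  DIO2  VCAN")
    space_error = None
except ValueError as err:
    space_error = str(err)
print(space_error)  # fields look space-separated; GMT expects tabs
```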

I saved the file as tab-delimited text and then added the .gct extension. Is that not sufficient?

Written 15 months ago by kdc15
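
Saving from a spreadsheet as tab-delimited text can leave extra material behind, such as trailing tabs on the first two lines where the spreadsheet padded them to the table width. The trailing whitespace after `#1.2` in the excerpt above may be exactly that, though it cannot be confirmed from the pasted text. An alternative is to write the file programmatically so that the dimension line always matches the data. A minimal sketch with a hypothetical helper, `write_gct`, and a hypothetical file name. The header labels `NAME` and `Description` are assumed from the Broad's GCT examples, and rows with missing values are rejected rather than written.

```python
import os
import tempfile
from pathlib import Path


def write_gct(path, sample_names, rows):
    """Write a GCT 1.2 file.

    rows is a list of (name, description, values) with one value per sample.
    The dimension line is computed from the data, so it always matches.
    """
    out = ["#1.2", f"{len(rows)}\t{len(sample_names)}",
           "\t".join(["NAME", "Description", *sample_names])]
    for name, description, values in rows:
        description = description or "na"
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values, got {len(values)}")
        if any(ch in name + description for ch in "\t\r\n"):
            raise ValueError(f"{name}: embedded tab or newline")
        # float() raises (TypeError for None, ValueError for '') on missing values.
        out.append("\t".join([name, description, *(repr(float(v)) for v in values)]))
    # Build everything first, then write once with Unix line endings.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write("\n".join(out) + "\n")


out_path = os.path.join(tempfile.mkdtemp(), "dataset.gct")  # hypothetical file
write_gct(out_path, ["control", "treated"], [
    ("A1BG", "alpha-1-B glycoprotein", [0.09272033, 0.24820567]),
    ("A2M", "alpha-2-macroglobulin", [2.289191, 2.28871867]),
])
print(Path(out_path).read_text().splitlines()[:3])
# ['#1.2', '2\t2', 'NAME\tDescription\tcontrol\ttreated']
```

Rows such as A2MP1, with no values, would need to be dropped or filled before calling this.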