GSEA Preranked Query
I have RNA-seq data (n=15) paired with antibody response data (n=15) at day 10. The question I want to ask is which gene sets/pathways are most perturbed at day 10 in response to vaccination. I computed Spearman correlation values between each gene and the antibody response at day 10. I wanted to run preranked GSEA on the correlation values, but many genes end up with the same correlation value. I could expect that, because many genes are co-expressed and generally do not change their expression, and in relation to the antibody response we expect, say, a couple of hundred genes to change at day 10. So what are my options, given that preranked GSEA does not resolve ties? Additionally, I have both responders and non-responders in my data.
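As a side note on why ties show up at all: with n samples and no tied values within a gene, Spearman's rho equals 1 - 6*sum(d^2) / (n*(n^2 - 1)), and sum(d^2) is always an even integer between 0 and n*(n^2 - 1)/3. So only a limited number of distinct rho values can occur. The sketch below just computes that upper bound; the function name and the gene count of 20000 are illustrative assumptions, not taken from the data described here.

```python
def max_distinct_spearman(n):
    """Upper bound on the number of distinct Spearman rho values
    for n samples when neither variable contains tied values."""
    # sum(d^2) is largest when the two rankings are exactly reversed
    max_sum_d2 = n * (n * n - 1) // 3
    # sum(d^2) is always even, so only 0, 2, 4, ..., max_sum_d2 can occur
    return max_sum_d2 // 2 + 1


n_samples = 15
n_genes = 20000  # hypothetical number of genes in the ranked list

bound = max_distinct_spearman(n_samples)
print(bound)            # 561
print(n_genes > bound)  # True: some genes must share a rho value
```

If this reasoning holds for your data, some tied scores are unavoidable with 15 samples, independent of any filtering.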

Why do you think it makes sense to do GSEA with correlation values?

Two reasons: 1: I want to identify the genes that could predict the antibody response later in the analysis.

2: I have seen people approach this problem this way in Cell and Nature papers.

It would be a privilege to get your answer on this, in case I am not thinking it through correctly, as I see you are a computational immunologist as well. I also have cases where I have only baseline RNA-seq data, with antibody data at baseline and at 28 days, and I want to see which gene sets are enriched if they correlate well with the antibody data. Thanks

I am surprised you have that many values that are the same. Are you removing genes with zero counts and zero for whatever your antibody response metric is?

So here's the thing:

1: I did not prefilter my data to remove low-expressed genes.

2: I have two types of vaccine response data: one for pneumococcal vaccines, where I have 11 serotypes, and one for influenza vaccines, where I have at least 3 serotypes. For the pneumococcal vaccines I mostly have good response data with good numbers. For the influenza vaccines, one serotype would show a good response and the rest would show 0s.

We usually use the sum of log2 expression or the sum of log2 fold changes for the antibody data. Do you think both of these are a problem: not filtering low-expressed genes, and using summed antibody values that contain 0s?

Yes, I imagine high frequencies of zeros in either dataset will confound the ranking.
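To make that concrete, here is a minimal sketch in plain Python of how one might drop mostly-zero genes before ranking and then count how many genes still share a correlation value. The helper names, the 50% non-zero threshold, and the toy data are all assumptions for illustration; in practice a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written correlation.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns None if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_genes(counts, response, min_nonzero=0.5):
    """Return {gene: rho}, skipping genes detected in too few samples.

    min_nonzero is an assumed threshold: the minimum fraction of
    samples in which a gene must have a non-zero count.
    """
    ranked = {}
    for gene, values in counts.items():
        frac_nonzero = sum(v > 0 for v in values) / len(values)
        if frac_nonzero < min_nonzero:
            continue
        rho = spearman(values, response)
        if rho is not None:  # constant genes have no defined correlation
            ranked[gene] = rho
    return ranked


def tied_scores(ranked, digits=6):
    """Map each rounded rho shared by more than one gene to its gene count."""
    tally = Counter(round(rho, digits) for rho in ranked.values())
    return {score: n for score, n in tally.items() if n > 1}


# Toy data (made up): six samples, five genes.
response = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
counts = {
    "geneA": [0, 0, 0, 0, 0, 3],       # detected in one sample only
    "geneB": [0, 0, 0, 0, 0, 7],       # same zero pattern as geneA
    "geneC": [5, 9, 12, 20, 31, 40],
    "geneD": [40, 31, 25, 20, 9, 5],
    "geneE": [0, 0, 0, 0, 0, 0],       # never detected
}

unfiltered = rank_genes(counts, response, min_nonzero=0.0)
filtered = rank_genes(counts, response, min_nonzero=0.5)
print(tied_scores(unfiltered))  # geneA and geneB share one rho value
print(tied_scores(filtered))    # no ties left in this toy example
```

In the toy data, the two genes with the same zero pattern get identical rho values, and they disappear once the filter is applied. The same check on the antibody side (how many samples have a summed response of 0) is worth doing before computing the correlations.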

