Dear community, I have a knockout mouse and performed total RNA-seq analysis comparing knockout with wild-type mice. I now have a list of about 13,000 genes. There is an inhibitor of this protein that has been used in patients. The authors of that study also performed RNA-seq and published a list of genes that are differentially expressed under treatment with the inhibitor. To compare the published data with my transcriptomic data, I decided to use GSEA; I have seen several publications performing this kind of analysis. So I converted the published human genes to their mouse orthologs, ranked my 13,000 genes, and ran a preranked GSEA. Both gene sets were enriched, with NES 1.4 for the upregulated genes and NES -1.37 for the downregulated genes, both with FDR < 0.05, so I would conclude that the transcriptomic changes in humans under inhibitor treatment and in knockout mice are comparable. I have now received a lot of criticism of this analysis, saying that I cannot use it for this purpose. My question: is it OK to perform this kind of analysis? As I understand the original GSEA paper, the method was developed for exactly this kind of question. I also searched this forum, and GSEA was recommended for such analyses here as well.
Sounds like a perfectly reasonable analysis to me. Who is objecting? Do they give a basis? Are you sure that they are objecting to the concept of the analysis and not some facet of how it was implemented?
For example, one could imagine that bias might be introduced by the human-mouse ortholog conversion. Presumably, when you do your ranking, you should only rank those genes that are the ortholog of a human gene.
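A minimal sketch of that restriction, in plain Python. The function names and the toy data are hypothetical, and it assumes a one-to-one human-to-mouse ortholog table is already available as a dict (real tables also contain one-to-many mappings, which would need a decision of their own):

```python
def convert_gene_set(human_genes, human_to_mouse):
    """Map a human gene set to mouse symbols, dropping genes without an ortholog."""
    return {human_to_mouse[g] for g in human_genes if g in human_to_mouse}


def restrict_to_orthologs(mouse_scores, human_to_mouse):
    """Keep only the mouse genes that are the ortholog of some human gene."""
    mappable = set(human_to_mouse.values())
    return {gene: score for gene, score in mouse_scores.items() if gene in mappable}


# Toy data for illustration only (not from the original analysis).
human_to_mouse = {"TP53": "Trp53", "MYC": "Myc", "EGFR": "Egfr"}
mouse_scores = {"Trp53": 2.1, "Myc": -0.7, "Egfr": 0.3, "MouseOnlyGene": 4.0}
published_up = ["TP53", "EGFR", "HumanOnlyGene"]

gene_set = convert_gene_set(published_up, human_to_mouse)
background = restrict_to_orthologs(mouse_scores, human_to_mouse)
```

Here `background` is the list you would rank, and `gene_set` is what you would test against it, so both live in the same set of mappable genes.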
i.sudbery, thanks a lot for your reply. I just needed some support from other bioinformaticians. To me it also sounds like a perfectly good analysis. I ranked the whole gene list based on signed fold change * -log10pvalue
Our new bioinformatician just says it's wrong... but gives no basis, nothing. I even tried to help her and asked her to read the original GSEA paper, but...
Do you mean "sign of fold change" or "signed fold change"? I ask because signed fold change * -log10pvalue would be a strange metric to rank on, whereas "sign of fold change" makes some sense.
That is, if a gene had a log-fold-change of -2 and a -log10pvalue of 3, then using -3 as the score makes some sense, but I'm not sure that -6 (-2*3) does.
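To make the difference between the two scores concrete, here is a small Python sketch. The function names and the `min_p` floor (added so that a p-value of 0 does not break the logarithm) are my own assumptions, not part of any GSEA tool:

```python
import math


def sign_score(log2_fc, pvalue, min_p=1e-300):
    """sign(log2FC) * -log10(p): direction from the fold change, size from the p-value."""
    sign = (log2_fc > 0) - (log2_fc < 0)  # -1, 0 or +1
    return sign * -math.log10(max(pvalue, min_p))


def signed_fc_score(log2_fc, pvalue, min_p=1e-300):
    """log2FC * -log10(p): the fold change is counted in the magnitude as well."""
    return log2_fc * -math.log10(max(pvalue, min_p))


# The gene from the example above: log-fold-change -2, -log10(p) = 3.
a = sign_score(-2.0, 1e-3)       # approximately -3
b = signed_fc_score(-2.0, 1e-3)  # approximately -6
```

With the first score the fold change only decides which end of the ranking a gene goes to; with the second it also scales how far towards that end the gene is pushed.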
That said, in your situation, I might be tempted to rank straightforwardly on log2 fold change. Yes, this ignores the variance, but over an entire experiment you'd expect as many genes to be underestimated as overestimated, so it should even out over the entire set.
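Ranking on log2 fold change alone could look something like this in Python. The function name, gene names and values are made up, and the two-column tab-separated layout is what I understand the .rnk input of GSEA Preranked to be, so check it against the GSEA documentation:

```python
def make_rnk_lines(log2_fc_by_gene):
    """Sort genes by log2 fold change (most upregulated first) and format as gene<TAB>score."""
    ranked = sorted(log2_fc_by_gene.items(), key=lambda item: item[1], reverse=True)
    return [f"{gene}\t{score}" for gene, score in ranked]


# Toy values for illustration only.
log2_fc_by_gene = {"Trp53": 1.8, "Myc": -2.4, "Egfr": 0.2}
rnk_lines = make_rnk_lines(log2_fc_by_gene)
```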
Yes, you are right! It was 'sign of FC'.