I had thought I understood the meaning of the positive and negative normalized enrichment scores (NES) produced by GSEA, but it just occurred to me I don't know what to expect from a gene set that is enriched both at high and low ranks. For example, some gene sets (e.g., GO biological processes) contain genes that are co-regulated under different conditions, but actually move in opposite directions (producing a mix of positive and negative fold-changes), such that these genes fall at the very top and bottom of the rank-ordered list that GSEA is using to compute NES. So, do these high- and low-ranking genes just cancel each other out leading to an NES close to zero or does GSEA just report the larger absolute enrichment between the positive and the negative NES?
The short answer to your question is that if a gene set contains a roughly equal mix of up- and down-regulated genes, the two ends work against each other and your NES (Normalized Enrichment Score) should be close to zero.
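The medium answer is that the NES is calculated from the ES (Enrichment Score), which is a running sum computed over the ranked list of all genes. As you walk from the top of the ranked list to the bottom, the running sum changes at every position: each gene that is in your gene set adds to it, and each gene that is not in your gene set subtracts from it. In the standard (weighted) GSEA statistic, the amount added for a gene in the set is proportional to the magnitude of its ranking metric, so genes at the extremes of the list count for more than genes near the middle. The ES is the maximum deviation of this running sum from zero, and it keeps the sign of that deviation, so GSEA reports a single score per gene set, not separate positive and negative ones. Tightly clustered genes push the running sum up quickly before the subtractions can pull it back down. So if you have a gene set of 100 genes, and 50 are right at the top of the ranked list (the 50 most up-regulated genes) while 50 are spread throughout the bottom of the list (all down-regulated but with a wide range of log2FC values), then you may still end up with a strongly positive ES.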
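To make the running sum concrete, here is a minimal sketch in Python of a weighted, GSEA-style ES. It is not the GSEA implementation itself: the function name `enrichment_score`, the toy ranking metric (1000 genes with values running linearly from 3 to -3) and the three gene sets are all made up for illustration, and the weighting exponent `p=1.0` is an assumption about the usual default. It also only computes the raw ES; turning that into an NES requires normalizing against permutation-based null scores, which is not shown here.

```python
import numpy as np

def enrichment_score(metric, in_set, p=1.0):
    """Raw GSEA-style enrichment score (illustrative sketch, not the official code).

    metric : ranking metric for every gene, already sorted from high to low
    in_set : boolean mask, True where the gene belongs to the gene set
    p      : weighting exponent (assumed default of 1.0)
    """
    metric = np.asarray(metric, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)

    # Genes in the set add a step proportional to |metric|**p (steps sum to 1).
    hit_weights = np.where(in_set, np.abs(metric) ** p, 0.0)
    hit_step = hit_weights / hit_weights.sum()

    # Genes outside the set subtract a constant step (steps sum to 1).
    miss_step = (~in_set) / (~in_set).sum()

    running = np.cumsum(hit_step - miss_step)

    # ES is the maximum deviation from zero, keeping its sign.
    es = running[np.argmax(np.abs(running))]
    return es, running

# Hypothetical ranked list: 1000 genes, metric from +3 (top) to -3 (bottom).
n_genes = 1000
metric = np.linspace(3, -3, n_genes)

def make_mask(indices):
    mask = np.zeros(n_genes, dtype=bool)
    mask[indices] = True
    return mask

# Set A: 50 genes, all at the very top.
top_only = make_mask(np.arange(50))
# Set B: 50 at the very top and 50 at the very bottom.
symmetric = make_mask(np.r_[np.arange(50), np.arange(950, 1000)])
# Set C: 50 at the very top and 50 spread evenly through the bottom half.
top_plus_spread = make_mask(np.r_[np.arange(50), np.linspace(500, 999, 50).astype(int)])

es_top, _ = enrichment_score(metric, top_only)
es_sym, _ = enrichment_score(metric, symmetric)
es_spread, _ = enrichment_score(metric, top_plus_spread)

print(f"top only:        ES = {es_top:+.3f}")
print(f"symmetric:       ES = {es_sym:+.3f}")
print(f"top plus spread: ES = {es_spread:+.3f}")
```

In this toy setup the set with clustered top genes plus scattered bottom genes keeps a clearly positive ES, while the perfectly symmetric set ends up with a raw ES of smaller magnitude whose sign is essentially arbitrary, because the peak at the top and the trough before the bottom cluster are the same size. Plotting `running` against rank reproduces the familiar GSEA enrichment curve and shows where each peak comes from.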