Hello everybody,
I am new to GSEA and I use DAVID for the analysis, so please excuse me if these are basic questions. I am currently analyzing WES data, and for that I compare my gene lists to different custom backgrounds. When I looked at the results of the Functional Annotation Chart, I noticed something in the Count, LT, PH, and PT columns that I cannot explain.
It seems that a gene in my gene list is only counted in the "Count" column if it also appears in the background. Thus, the more background genes I include, the bigger the "Count" gets and the more significant the enrichment is. Is this intentional? And if yes, why? I thought that the background shouldn't play a role in determining which genes of my gene list are in a particular pathway or GO term.
Furthermore, I noticed that the PT (population total) and LT (list total) are different for each term. I also have no idea why this is the case, since the number of genes in my gene list and in my background shouldn't change.
If somebody could help me understand these observations, I'd be very grateful!
Cheers!
Thank you for your answer!
This makes sense. But in this case I have a follow-up question: what would be the best way to compare two gene lists? E.g. gene list 1 is treated, gene list 2 is untreated, and I want to see whether their enrichments differ (both gene lists are unranked). Right now, I am using gene list 1 as the input and gene list 2 as the background, but based on your answer this would be the wrong approach, if I am not mistaken.
I'm afraid I can't follow you here. Why would the background be different for the same analysis? The only difference is the GO category.
The easy way to compare two gene lists is to test list1 against union(list1, list2), i.e. to use both lists together as the background. The Fisher's exact test is then run on the 2x2 table ((list1 & GO, list1 - GO), (all - list1 & GO, all - list1 - GO)), where all = union(list1, list2). Just watch out for overlap between the lists and for how to interpret the results in that case.
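To make the 2x2 table concrete, here is a minimal Python sketch of the list1 vs. union(list1, list2) comparison for a single term. The function names (`enrichment_table`, `fisher_one_sided`) and the toy genes are made up for illustration, and I am assuming a one-sided test for over-representation. DAVID reports a modified Fisher's exact p-value (the EASE score), so its numbers will not match this exactly.

```python
from math import comb

def enrichment_table(list1, list2, term_genes):
    """Build the 2x2 table for list1 vs. union(list1, list2) for one term."""
    list1, list2, term_genes = set(list1), set(list2), set(term_genes)
    background = list1 | list2      # "all" = union of both lists
    rest = background - list1       # genes that are only in list2
    a = len(list1 & term_genes)     # list1, in term
    b = len(list1 - term_genes)     # list1, not in term
    c = len(rest & term_genes)      # rest of background, in term
    d = len(rest - term_genes)      # rest of background, not in term
    return a, b, c, d

def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null (over-representation)."""
    n_total = a + b + c + d         # background size
    n_term = a + c                  # term genes in the background
    n_list = a + b                  # size of list1
    upper = min(n_term, n_list)
    tail = sum(comb(n_term, k) * comb(n_total - n_term, n_list - k)
               for k in range(a, upper + 1))
    return tail / comb(n_total, n_list)

# Toy data (hypothetical): C and D occur in both lists.
list1 = {"A", "B", "C", "D"}
list2 = {"C", "D", "E", "F", "G", "H"}
term = {"A", "B", "C", "E"}

a, b, c, d = enrichment_table(list1, list2, term)
p = fisher_one_sided(a, b, c, d)
print((a, b, c, d), p)
```

Note that the shared genes C and D end up on the list1 side of the table only. That is the overlap issue: a term that is equally common in both lists can look less enriched (or more) depending on how many shared genes there are.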
As for the second point, GO mapping was done on all the genes in the genome, but another annotation source may have been mapped for only a subset of the genes. Background genes that have no mapping in that source are removed from the background for that specific test, which is why PT can differ between annotation categories.
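Here is a small Python sketch of that filtering, to show how the totals can change from one annotation category to the next. It assumes that genes without any annotation in a category are dropped from both the list and the background, and that list genes are only counted if they are in the background. That matches what is described in this thread, but it is not taken from DAVID's code. `chart_counts` and the toy annotations are hypothetical.

```python
def chart_counts(gene_list, background, annotation, term):
    """Count/LT/PH/PT for one term.

    `annotation` maps gene -> set of terms for ONE annotation category.
    """
    # Genes with at least one term in this category
    annotated = {g for g, terms in annotation.items() if terms}
    pop = set(background) & annotated   # usable background for this category
    lst = set(gene_list) & pop          # list genes must also be in the background
    in_term = {g for g in annotated if term in annotation[g]}
    return {
        "Count": len(lst & in_term),    # list hits
        "LT": len(lst),                 # list total
        "PH": len(pop & in_term),       # population hits
        "PT": len(pop),                 # population total
    }

background = ["g1", "g2", "g3", "g4", "g5", "g6"]
gene_list = ["g1", "g2", "g3"]

# Toy category in which every gene is annotated
go = {"g1": {"T1"}, "g2": {"T1"}, "g3": {"T2"},
      "g4": {"T1"}, "g5": {"T2"}, "g6": {"T2"}}
# Toy category in which only half of the genes are annotated
kegg = {"g1": {"P1"}, "g4": {"P1"}, "g5": {"P2"}}

go_row = chart_counts(gene_list, background, go, "T1")
kegg_row = chart_counts(gene_list, background, kegg, "P1")
print(go_row)
print(kegg_row)
```

With the same gene list and background, LT and PT come out as 3 and 6 for the first category but 1 and 3 for the second, simply because fewer genes carry an annotation there.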