Question

Count, LT, PH, PT Column in Functional Annotation Chart of DAVID

0

3.3 years ago

nhaus ▴ 300

Hello everybody,

I am new to GSEA analysis and I use DAVID for it, so please excuse me if these are basic questions. I am currently analyzing WES data, and for that I compare my gene lists to different custom backgrounds. When I looked at the results of the Functional Annotation Chart, I noticed something in the Count, LT, PH, and PT columns that I cannot explain.

It seems that a gene in my gene list is only counted in the "Count" column if it also appears in the background. Thus, the more background genes I include, the bigger the "Count" gets and the more significant the enrichment is. Is this intentional? And if so, why? I thought that the background shouldn't play a role in determining which genes of my gene list are in a particular pathway or GO term.

Furthermore, I noticed that PT (population total) and LT (list total) are different for each term. I also have no idea why this is the case, since the number of genes in my gene list and in my background shouldn't change.

If somebody could help me understand these observations, I'd be very grateful!

Cheers!

DAVID GSEA • 975 views

ADD COMMENT • link updated 3.3 years ago by Asaf 10k • written 3.3 years ago by nhaus ▴ 300

score 2 · Accepted Answer · 2021-01-14

2

3.3 years ago

Asaf 10k

The background is the set of all the genes in the genome. If you enter genes that are not in the genome, DAVID ignores them and drops them from the list.
The effective background can differ from one analysis to the next. If the gene <-> GO term mapping was built on a set of genes different from the background, the background actually used is the intersection of the "background" list and the set of genes covered by that mapping. This is why PT can change, and LT too if not all the genes in the input list are mapped.
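
To make the four columns concrete, here is a minimal Python sketch of the behavior described above. It is not DAVID's actual implementation: the function `chart_row`, the gene IDs, and the two annotation categories are all hypothetical, and it assumes each column is a plain set intersection as described in this answer.

```python
def chart_row(gene_list, background, category, term):
    """Return Count, LT, PH, PT for one term (illustrative only).

    category is a hypothetical dict: term name -> set of annotated genes.
    """
    # Genes that have at least one annotation in this category.
    annotated = set().union(*category.values())
    # Effective background: the supplied background restricted to annotated genes.
    population = background & annotated
    # List genes outside the effective background are dropped.
    usable_list = gene_list & population
    term_genes = category[term]
    return {
        "Count": len(usable_list & term_genes),  # list genes in the term
        "LT": len(usable_list),                  # list total
        "PH": len(population & term_genes),      # population hits
        "PT": len(population),                   # population total
    }


# Toy data (all names invented for the example).
background = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
gene_list = {"g1", "g2", "g3", "g9"}  # g9 is not in the background

go_bp = {
    "term_A": {"g1", "g2", "g5", "g9"},
    "term_B": {"g3", "g6", "g7", "g8"},
}
pathways = {"path_X": {"g1", "g5", "g6"}}  # covers far fewer genes

row_a = chart_row(gene_list, background, go_bp, "term_A")
row_x = chart_row(gene_list, background, pathways, "path_X")
print(row_a)
print(row_x)
```

In this toy setup `g9` never contributes to Count because it is missing from the background, and LT and PT shrink for `pathways` because fewer genes carry any annotation in that category, even though the input list and background are unchanged.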

0

Thank you for your answer!

> The background is the set of all the genes in the genome. If you enter genes that are not in the genome, DAVID ignores them and drops them from the list.

This makes sense. But in this case I have a follow-up question: what would be the best way to compare two gene lists? E.g. gene list 1 is treated, gene list 2 is untreated, and I want to see whether their enrichments differ (both gene lists are unranked). So far I have been using gene list 1 as the input and gene list 2 as the background, but based on your answer this would be the wrong approach, if I am not mistaken.

> The effective background can differ from one analysis to the next. If the gene <-> GO term mapping was built on a set of genes different from the background, the background actually used is the intersection of the "background" list and the set of genes covered by that mapping. This is why PT can change, and LT too if not all the genes in the input list are mapped.

I'm afraid I can't follow you here. Why would the background be different for the same analysis? The only difference is the GO category.

ADD REPLY • link 3.3 years ago by nhaus ▴ 300

1

An easy way to compare two gene lists is to test list1 against union(list1, list2), so that the background contains both lists. The Fisher's exact test is then run on the 2x2 table ((list1 & GO, list1 - GO), ((all - list1) & GO, (all - list1) - GO)), where all = union(list1, list2) and GO is the set of genes annotated with the term. Just watch out for overlap between the lists and how it affects the interpretation of the results.
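
As a rough illustration of that test, the sketch below builds the 2x2 table for list1 against the union of both lists and computes a one-sided Fisher's exact p-value (the hypergeometric upper tail) with the standard library only. The function name `enrichment_p_value` and the gene IDs are made up for the example, and this is plain Fisher's exact test, not necessarily the exact statistic DAVID reports.

```python
from math import comb


def enrichment_p_value(list1, list2, term_genes):
    """One-sided Fisher's exact test for list1 vs union(list1, list2)."""
    universe = list1 | list2   # background = both lists
    rest = universe - list1    # background genes not in list1

    a = len(list1 & term_genes)  # in list1, in term
    b = len(list1 - term_genes)  # in list1, not in term
    c = len(rest & term_genes)   # not in list1, in term
    d = len(rest - term_genes)   # not in list1, not in term

    n = a + b + c + d
    hits = a + c    # term genes in the universe
    draws = a + b   # size of list1
    # P(X >= a) for a hypergeometric draw of `draws` genes from the universe.
    tail = sum(
        comb(hits, k) * comb(n - hits, draws - k)
        for k in range(a, min(hits, draws) + 1)
    )
    return (a, b, c, d), tail / comb(n, draws)


# Toy data: g3 appears in both lists.
list1 = {"g1", "g2", "g3", "g4"}
list2 = {"g3", "g5", "g6", "g7", "g8"}
term = {"g1", "g2", "g3", "g5"}

table, p = enrichment_p_value(list1, list2, term)
print(table, p)
```

Note that the shared gene `g3` is counted only on the list1 side of the table, which is one way the overlap between the lists can influence the result.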

As for the second point: the GO mapping was done on all the genes in the genome, but another annotation mapping may have been done for only a subset of the genes. Genes in the background that weren't mapped are then removed from the background for that specific test.

ADD REPLY • link 3.3 years ago by Asaf 10k