Question: Count, LT, PH, PT Column in Functional Annotation Chart of DAVID
4 days ago by
nickhir20
german research cancer center (DKFZ)
nickhir20 wrote:

Hello everybody,

I am new to GSEA and I use DAVID for the analysis, so please excuse me if these are basic questions. I am currently analyzing WES data, and for that I compare my gene lists to different custom backgrounds. When I looked at the results of the Functional Annotation Chart, I noticed something in the Count, LT, PH and PT columns that I cannot explain.

It seems like a gene in my gene list is only counted in the "Count" column if it also appears in the background. Thus, the more background genes I include, the bigger the "Count" gets and the more significant the enrichment is. Is this intentional? And if so, why? I thought that the background shouldn't play a role in determining which genes of my gene list are in a particular pathway or GO term.

Furthermore, I noticed that PT (population total) and LT (list total) are different for each term. I have no idea why this is the case either, since the number of genes in my gene list and in my background shouldn't change.

If somebody could help me understand these observations, I'd be very grateful!

Cheers!

gsea david • 56 views
4 days ago by
Asaf8.5k
Israel
Asaf8.5k wrote:
  1. The background is the group of all the genes in the genome. If you enter genes that are not in the genome, DAVID will ignore them and drop them from the list.
  2. For each analysis the background might be different. If the gene <-> GO term mapping was done on a set of genes different from the background, then the background collection of genes will be the intersection of the "background" list and the list of genes relevant for the analysis. This is why PT can change, and LT as well if not all the genes in the input list are mapped.
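
The two points above can be sketched in a few lines of Python. This is only an illustration of the intersection logic as described here, not DAVID's actual code; the function `chart_columns`, the gene names and the two annotation dictionaries are all made up for the example.

```python
def chart_columns(gene_list, background, annotation, term):
    """Return (Count, LT, PH, PT) for one term of one annotation category.

    annotation: dict mapping term -> set of genes annotated with it
    (a hypothetical stand-in for one category, e.g. one GO branch).
    """
    # Genes that carry at least one annotation in this category.
    annotated = set().union(*annotation.values())
    # Background genes without any annotation in this category are dropped.
    population = set(background) & annotated
    # List genes are kept only if they survive in the population.
    kept_list = set(gene_list) & population
    term_genes = annotation[term]
    count = len(kept_list & term_genes)   # Count: list hits
    lt = len(kept_list)                   # LT: list total
    ph = len(population & term_genes)     # PH: population hits
    pt = len(population)                  # PT: population total
    return count, lt, ph, pt


gene_list = {"A", "B", "C", "D"}
background = {"A", "B", "E", "F", "G"}    # C and D are not in the background
go_bp = {"term1": {"A", "B", "C", "E"}, "term2": {"F"}}
pathways = {"pathway1": {"A", "E"}}       # this category covers fewer genes

print(chart_columns(gene_list, background, go_bp, "term1"))
print(chart_columns(gene_list, background, pathways, "pathway1"))
```

In this toy example gene C is annotated with term1 but is not counted because it is missing from the background, and both LT and PT shrink for the pathway category because fewer genes are annotated there.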

Thank you for your answer!

"The background is the group of all the genes in the genome. If you enter genes that are not in the genome, DAVID will ignore them and drop them from the list."

This makes sense. But in this case I have a follow-up question: what would be the best way to compare two gene lists? E.g. gene list 1 is treated, gene list 2 is untreated, and I want to see if there are different enrichments (both gene lists are unranked). Right now, I am using gene list 1 as the input and gene list 2 as the background, but based on your answer this would be the wrong approach, if I am not mistaken.

"For each analysis the background might be different. If the gene <-> GO term mapping was done on a set of genes different from the background, then the background collection of genes will be the intersection of the "background" list and the list of genes relevant for the analysis. This is why PT can change, and LT as well if not all the genes in the input list are mapped."

I'm afraid I can't follow you here. Why would the background be different for the same analysis? The only difference is the GO category.

Reply written 4 days ago by nickhir20

The easy way to compare two gene lists is to test list1 against union(list1, list2), i.e. the background should contain both lists. With all = union(list1, list2), the 2x2 table for Fisher's exact test is ((list1 & GO, list1 - GO), ((all - list1) & GO, (all - list1) - GO)). Just watch out for overlap between the lists and think about how to interpret the results in that case.
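
As a rough illustration of that table, here is a minimal pure-Python sketch of a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution). The function name `fisher_enrichment` and the gene names are hypothetical, and DAVID's own statistic may differ from a plain Fisher test.

```python
from math import comb


def fisher_enrichment(list1, list2, term_genes):
    """One-sided Fisher's exact test for over-representation of a term in
    list1, using union(list1, list2) as the background."""
    list1, list2, term_genes = set(list1), set(list2), set(term_genes)
    background = list1 | list2
    rest = background - list1            # genes found only in list2
    a = len(list1 & term_genes)          # list1 & GO
    b = len(list1 - term_genes)          # list1 - GO
    c = len(rest & term_genes)           # (all - list1) & GO
    d = len(rest - term_genes)           # (all - list1) - GO
    n = a + b + c + d                    # background size
    hits = a + c                         # term genes in the background
    drawn = a + b                        # size of list1
    # P(X >= a) when drawing `drawn` genes from the background at random.
    p = sum(
        comb(hits, k) * comb(n - hits, drawn - k)
        for k in range(a, min(hits, drawn) + 1)
    ) / comb(n, drawn)
    return (a, b, c, d), p


treated = {"g1", "g2", "g3", "g4"}
untreated = {"g4", "g5", "g6", "g7", "g8"}   # g4 is shared by both lists
go_term = {"g1", "g2", "g3", "g5"}

table, p_value = fisher_enrichment(treated, untreated, go_term)
print(table, p_value)
```

Note that a gene present in both lists (g4 here) only ever lands in the list1 row of the table, which is the overlap caveat mentioned above.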

As for the second point: the GO mapping was done on all the genes in the genome, but there might be a case where another mapping was done only for a subset of the genes. The background genes that weren't mapped are then removed from the background for that specific test.

Reply written 4 days ago by Asaf8.5k