Question: Determine Whether A Gene Product Is A Transcription Factor
17
8.2 years ago by
Mike Dewar1.5k
Columbia University, NYC, USA
Mike Dewar1.5k wrote:

I fear this question may be terribly basic. I've been asked for a list of transcription factors that are differentially expressed in an experiment. Finding differentially expressed genes is fine, but deciding which are TFs is proving a tad elusive.

I generated my list by looking up the GO biological process "regulation of transcription" for each gene using biomart. If this phrase appeared anywhere in a gene's annotations I kept the gene, and if it didn't I filtered it out.

I got back an email saying "no way are some of these things TFs", which was frustrating. So now I'm looking for the GO molecular function "transcription factor activity" but suddenly have two questions:

  1. Is this any more likely to correspond to what a biologist is looking for as a TF?
  2. If a gene has a term that is more specific than "transcription factor activity", is there any way to see if its parent term is "transcription factor activity"?

If these questions, which I know are really basic, can be answered by a handy function in R like `is.TF(genesymbol)`, that would be awesome.

gene transcription • 8.5k views
ADD COMMENTlink modified 7.8 years ago by Obi Griffith17k • written 8.2 years ago by Mike Dewar1.5k
1

Mike: You may remove the beginner tag; IMHO this is not a beginner-level question. It is a real use case for an integrated bioinformatics data-mining approach.

ADD REPLYlink written 8.2 years ago by Khader Shameer17k

I would change the title of this question to "Determine Whether A Gene is A Transcription Factor". Do you agree?

ADD REPLYlink written 8.2 years ago by Giovanni M Dall'Olio26k

@giovanni: you are a moderator, so IMHO you can go ahead and make questions clearer. See the "Other people can edit my stuff?!" section of the SO FAQ: http://stackoverflow.com/faq

ADD REPLYlink written 8.2 years ago by Michael Kuhn4.9k

Hi Michael, thank you, but I was just asking for confirmation, since I wasn't sure I had understood the question correctly.

ADD REPLYlink written 8.2 years ago by Giovanni M Dall'Olio26k

How did you produce your list? I've just looked up Stat1 in EnsEMBL, in biomart, at MGI and on the GO website. All of them have it annotated as a transcription factor.

ADD REPLYlink written 8.2 years ago by iw9oel_ad6.0k

@giovanni - thanks for the edit! Always good to have the question cleaned up!

ADD REPLYlink written 8.2 years ago by Mike Dewar1.5k

@Keith - you're right. I think I'm going to remove the list as it's now distracting rather than clarifying

ADD REPLYlink written 8.2 years ago by Mike Dewar1.5k
4
8.2 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

To improve your results, I would keep only the genes that are also associated with the term 'DNA binding activity', because all transcription factors bind DNA by definition. In fact, a gene that is associated with "Regulation of transcription factor activity" is not necessarily a transcription factor itself: it may interact with other TFs and regulate their activity without being a TF.

The problem that you have pointed out with STAT1 is due to the annotation in Gene Ontology, and it is not something you can solve yourself. There is an old topic where we discussed [how much one can trust the GeneOntology's annotations]. The annotations in Gene Ontology are good but incomplete: there are a lot of false negatives and some false positives. The only thing you can do when you find a gene that should be associated with a term but is not is to go to GO's bug tracker on SourceForge and report the case to the maintainers. They answer very quickly, and in one or two days (maybe more, since we are in August) they will explain to you why STAT1 is not associated with that term.

ADD COMMENTlink modified 5 weeks ago by RamRS18k • written 8.2 years ago by Giovanni M Dall'Olio26k

Mouse Stat1 is annotated directly to GO:0003700 (transcription factor activity) at both MGI and EnsEMBL (the source of the biomart annotation). I don't think the problem is with the GO.

ADD REPLYlink written 8.2 years ago by iw9oel_ad6.0k

@Keith yep - my mistake!

ADD REPLYlink written 8.2 years ago by Mike Dewar1.5k

@giovanni - as Keith has pointed out the thing about STAT1 is a mistake on my part, not GO's. Though I think your point is still valid! So are you saying that I should look for "DNA binding activity" AND "Regulation of transcription factor activity"?

ADD REPLYlink written 8.2 years ago by Mike Dewar1.5k

It is always good to have a list of genes to use as controls or test cases for your analysis. You should identify some genes that you are sure are TFs and some that you are sure are not, and use them to evaluate the effectiveness of your pipeline. In any case, yes, I think you could try "DNA binding activity" AND "Regulation of transcription factor activity", and maybe mark as 'possible TF' the genes that are associated with only one of these terms.
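A minimal Python sketch of this control-based check, assuming the annotations have already been fetched into a gene-to-terms dictionary. The gene names and the `classify` and `evaluate` helpers are hypothetical and only illustrate the idea:

```python
TERM_A = "DNA binding activity"
TERM_B = "Regulation of transcription factor activity"


def classify(terms):
    """Label a gene by how many of the two terms it is annotated with."""
    hits = sum(1 for term in (TERM_A, TERM_B) if term in terms)
    return {2: "TF", 1: "possible TF", 0: "not TF"}[hits]


def evaluate(annotations, positives, negatives):
    """Return known TFs that were not called 'TF' and known non-TFs that were."""
    missed = sorted(
        g for g in positives if classify(annotations.get(g, set())) != "TF"
    )
    false_hits = sorted(
        g for g in negatives if classify(annotations.get(g, set())) == "TF"
    )
    return missed, false_hits


# Hypothetical control genes with made-up annotations.
annotations = {
    "ctrl_tf": {TERM_A, TERM_B},
    "ctrl_missed_tf": {TERM_B},
    "ctrl_non_tf": {TERM_A},
}
missed, false_hits = evaluate(
    annotations,
    positives={"ctrl_tf", "ctrl_missed_tf"},
    negatives={"ctrl_non_tf"},
)
print("known TFs not recovered:", missed)
print("known non-TFs called TF:", false_hits)
```

The two lists give a quick view of where the filter is too strict or too loose before it is applied to the full set of differentially expressed genes.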

ADD REPLYlink written 8.2 years ago by Giovanni M Dall'Olio26k
3
8.2 years ago by
brentp22k
Salt Lake City, UT
brentp22k wrote:

I would not rely on GO annotations. Take a look, for example, at this paper in Nature Reviews Genetics (2009), found with a quick Google search, which says:

Further analysis using the GO database (Fig. 1b) showed that most human TFs are unannotated, indicating that they remain uncharacterized

They also provide a list of semi-curated loci that encode TFs. I don't work with human, but there may be other sources worth looking into.

ADD COMMENTlink modified 5 weeks ago by RamRS18k • written 8.2 years ago by brentp22k

Blimey. So my immediate problem isn't that there are lots of TFs that I don't catch because they're unannotated, it's picking those that are annotated from all non-TF genes. This lack of annotation will undoubtedly become important though....

ADD REPLYlink written 8.2 years ago by Mike Dewar1.5k

Brent: Thanks a lot for sharing this interesting paper. Mike: You may have noticed that this article defines TFs based on InterPro domains. I strongly recommend getting a library of domains for mouse, running an hmmpfam/interproscan search on your sequences, and treating GO as an additional level of annotation.

ADD REPLYlink written 8.2 years ago by Khader Shameer17k
3
8.2 years ago by
Manhattan, NY
Khader Shameer17k wrote:

I am afraid there is no single tool or method that can tell you whether a given gene is a TF or not. One option is to consult a database of transcription factors, for example DBD (Mouse TFs are available here), or to use a learning algorithm that can predict it from sequence (I am not sure whether such an algorithm exists).

If the genes are few in number, I would recommend a two-step integrated search with Pfam domain architecture and GO rather than relying only on GO terms. First, I would look at the protein domain architecture of the hits. For example, for STAT1 (Wikipedia, Uniprot), I can get the Pfam page here. This protein contains distinct domains that belong to protein domain families of transcription factors (STAT_int, STAT_alpha and STAT_bind). As Pfam domains are assigned based on sequence properties, these predictions are reliable. You may also consult the GeneRIFs of your genes (GeneRIF for STAT1), which are short statements about gene function drawn from the literature.

  1. Get the gene ID
  2. Map to Uniprot
  3. Search in Pfam
  4. Get the Pfam-based protein domain architecture
  5. Check if any of the Pfam-A domains assigned to the sequence is part of a family/families of transcription factors

As DBD was last updated in 2008, I would recommend this approach to get up-to-date results. Using Perl (or any language of your choice) you can automate this as an entire workflow via ID mapping.
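The workflow above could be sketched roughly as follows in Python. This is only an illustration: the two lookup tables are hypothetical stand-ins for the ID-mapping and Pfam search steps (no real service is queried), the gene and accession names are made up, and the TF domain list contains only the three STAT families named above rather than a curated catalogue.

```python
# Pfam families mentioned above for STAT1. A real analysis would need a
# much longer, separately compiled list of TF-associated families.
TF_DOMAINS = {"STAT_int", "STAT_alpha", "STAT_bind"}

# Hypothetical lookup tables standing in for the mapping and search steps.
GENE_TO_UNIPROT = {"gene1": "ACC0001", "gene2": "ACC0002"}
UNIPROT_TO_PFAM = {
    "ACC0001": ["STAT_int", "STAT_alpha", "STAT_bind"],
    "ACC0002": ["Some_other_domain"],
}


def tf_domains_for_gene(gene_id):
    """Return the TF-associated Pfam domains found for a gene.

    Returns None when the gene cannot be mapped to an accession, and an
    empty list when it maps but carries no domain from TF_DOMAINS.
    """
    accession = GENE_TO_UNIPROT.get(gene_id)
    if accession is None:
        return None
    domains = UNIPROT_TO_PFAM.get(accession, [])
    return sorted(set(domains) & TF_DOMAINS)


for gene in ("gene1", "gene2", "gene3"):
    print(gene, tf_domains_for_gene(gene))
```

Keeping "could not be mapped" (None) separate from "mapped but no TF domain" (empty list) makes it easier to see which genes need manual follow-up.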

Giovanni provided a good overview of why a gene product with the annotation "Regulation of transcription factor activity" is not necessarily a TF. He also pointed out some of the current issues with GO, but IMHO, despite these problems, GO is the best resource for understanding the function of a group of genes. The GO approach is much better than reading gene descriptions in manuscripts and deciding on the potential function of the genes from those.

ADD COMMENTlink modified 5 weeks ago by RamRS18k • written 8.2 years ago by Khader Shameer17k
3
8.2 years ago by
Stefano Berri4.0k
Cambridge, UK
Stefano Berri4.0k wrote:

Try Biobase TRANSFAC

Their database is manually curated. Meaning that there are people that read papers and enter information related to transcription factors...

Biobase is a company, so you probably need to pay a subscription to actually access the information, but you should be able to understand if you can have such a list before paying. I guess if you write them, they will tell you how and what to do and costs. Of course it is not you who have to pay ;)

Otherwise, ask the "biologist" how she/he would decide whether a gene is a transcription factor. If she/he realizes it is not possible (because the information is not out there), she/he will be happy to accept a second-best approximation. Otherwise she/he will point you to the right solution (or what she/he judges to be right).

ADD COMMENTlink modified 5 weeks ago by RamRS18k • written 8.2 years ago by Stefano Berri4.0k

I would have considered Biobase for such an analysis if they had a full, free academic version. AFAIK, their academic version is pretty old, which is a serious limitation for this type of analysis. Functional annotation in biology changes quickly these days.

ADD REPLYlink written 8.2 years ago by Khader Shameer17k

I second Khader's opinion. The free academic version of Transfac is too old to be of much value these days.

ADD REPLYlink written 8.2 years ago by Lars Juhl Jensen11k
3
8.2 years ago by
Copenhagen, Denmark
Lars Juhl Jensen11k wrote:

I would probably rely on a combination of two GO terms to extract a reasonably confident set of transcription factors, namely GO:0003677 "DNA binding" and GO:0006355 "regulation of transcription, DNA-dependent".

After doing back-tracking of the explicitly annotated terms in the GO directed acyclic graph (DAG), I find about 2500 genes annotated with each of the two terms and about 1700 annotated with both terms. That should be a pretty good starting point.
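As a rough illustration of this back-tracking step, here is a small Python sketch. The parent edges, the `TOY:` term names and the gene annotations are all made up for the example; only GO:0003677 and GO:0006355 are the real identifiers mentioned above. A real run would load the edges from the ontology file and the annotations from an annotation source.

```python
# Toy child -> parents edges (illustrative only, not the real ontology).
PARENTS = {
    "TOY:specific_binding": {"GO:0003677"},
    "TOY:very_specific_binding": {"TOY:specific_binding"},
    "TOY:positive_regulation": {"GO:0006355"},
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term reachable via parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def propagate(annotations, parents=PARENTS):
    """Expand each gene's explicit terms to include all ancestor terms."""
    expanded = {}
    for gene, terms in annotations.items():
        full = set()
        for term in terms:
            full |= ancestors(term, parents)
        expanded[gene] = full
    return expanded


# Hypothetical explicit annotations.
ANNOTATIONS = {
    "geneA": {"TOY:very_specific_binding", "TOY:positive_regulation"},
    "geneB": {"GO:0003677"},
    "geneC": {"TOY:positive_regulation"},
}

expanded = propagate(ANNOTATIONS)
required = {"GO:0003677", "GO:0006355"}
candidates = sorted(g for g, terms in expanded.items() if required <= terms)
print(candidates)
```

The same `ancestors` lookup also covers the case where a gene carries only a more specific term: the gene counts as annotated with a parent term if that parent appears among the ancestors of any of its explicit terms.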

If I wanted to improve it further, I would follow it up with a domain analysis using SMART or Pfam. I would then compile a list of domains typically found in TFs, and use that to identify likely false positives on the list as well as possible false negatives in the rest of the genome. I am not sure it would be worth the extra effort, though.

ADD COMMENTlink written 8.2 years ago by Lars Juhl Jensen11k
2

That is precisely why I combine the two terms - what I'm saying is that something is a TF if it is both DNA binding AND involved in regulation of transcription. Neither term alone is sufficient.

ADD REPLYlink written 8.2 years ago by Lars Juhl Jensen11k

IMHO, using the GO term "DNA binding" may not give exact results. DNA binding is not exclusive to TFs: it can also include histones and several enzymes that may not play a role in transcription as such (ref. http://en.wikipedia.org/wiki/DNA-binding_protein). Please let me know if I am missing something.

ADD REPLYlink modified 5 weeks ago by RamRS18k • written 8.2 years ago by Khader Shameer17k

Lars, Thanks for sharing your thoughts.

ADD REPLYlink written 8.2 years ago by Khader Shameer17k
3
8.2 years ago by
hurfdurf460
United States
hurfdurf460 wrote:

Another reference that might be of use is ORegAnno. The databases are free and downloadable, with lots of cross-references. The last updates were in late 2009 according to their front page, though. I haven't checked the mailing list to see how active it is, but I believe their goal was to be an LGPL open version of TRANSFAC.

ADD COMMENTlink written 8.2 years ago by hurfdurf460
1

PAZAR integrates data from ORegANNO.

ADD REPLYlink written 8.2 years ago by Khader Shameer17k
3
6.6 years ago by
Obi Griffith17k
Washington University, St Louis, USA
Obi Griffith17k wrote:

An important resource that seems to be missing from the answers so far is TFCat. It comes from the same group responsible for PAZAR. From their front page: "TFCat is a curated catalog of mouse and human transcription factors (TF) based on a reliable core collection of annotations obtained by expert review of the scientific literature." Using their Data Download tool you can quickly get a list of high-quality annotated TFs with corresponding PubMed IDs.

ADD COMMENTlink modified 5 weeks ago by RamRS18k • written 6.6 years ago by Obi Griffith17k
2
8.2 years ago by
Gareth Palidwor1.6k
Ottawa
Gareth Palidwor1.6k wrote:

I've had good results using proteins annotated with GO:0003677 (DNA binding) and GO:0003700 (Transcription Factor activity). Looking at domains can be helpful but may not provide any additional information as GO annotations may be derived from the domains.

ADD COMMENTlink written 8.2 years ago by Gareth Palidwor1.6k
1

I would kindly disagree with using the GO term "DNA binding" if Mike is only interested in TFs. DNA binding is not exclusive to TFs: it can include histones, several enzymes, etc.

ADD REPLYlink modified 5 weeks ago by RamRS18k • written 8.2 years ago by Khader Shameer17k

That's why I used DNA binding and TF activity together.

ADD REPLYlink written 8.2 years ago by Gareth Palidwor1.6k
2
8.2 years ago by
Fred Fleche4.2k
Paris, France
Fred Fleche4.2k wrote:

link to PAZAR

ADD COMMENTlink modified 5 weeks ago by RamRS18k • written 8.2 years ago by Fred Fleche4.2k