Question: Transcription Factor Binding Site Prediction
Panagiotis Alexiou (Athens, Greece) wrote 9.0 years ago:

I want to use TF matrices to predict TF binding sites (TFBS) in regions of interest.

This is my plan:

a. Download TF matrices

I have seen TRANSFAC and JASPAR mentioned in relation to TF matrices. I found some text files in the JASPAR database that look like what I need, and I will probably use them. Does anybody know whether these differ from the TRANSFAC matrices? Are there any other resources for matrices?

b. Predict TFBS in sequence of interest

For each TF matrix, predict where TFBS could be found in the sequences of interest. I have looked at the TFBS module for Perl. I don't want to suggest that what it does is wrong, but the way it searches for TFBS is not clear to me, so I wouldn't want to use it in a serious analysis.
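For readers unsure what a matrix scan does under the hood, here is a minimal sketch of the usual approach, not the method of the Perl TFBS module or of any tool named in this thread. It converts a count matrix to log-odds scores and slides it along the sequence. The pseudocount, the uniform background, the threshold, and the toy matrix are all assumptions chosen for illustration, and only the forward strand is scanned.

```python
import math


def counts_to_log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert a count matrix {base: [counts per position]} to log-odds scores."""
    length = len(counts["A"])
    pwm = []
    for i in range(length):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Yield (start, score) for every forward-strand window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            yield start, score


# Toy count matrix, made up for illustration (not a real JASPAR entry).
toy_counts = {
    "A": [0, 20, 0, 0],
    "C": [20, 0, 0, 0],
    "G": [0, 0, 20, 0],
    "T": [0, 0, 0, 20],
}
toy_pwm = counts_to_log_odds(toy_counts)
hits = list(scan("ttCAGTaaCAGTcc", toy_pwm, threshold=5.0))
for start, score in hits:
    print(start, round(score, 2))
```

A real analysis would also scan the reverse complement and pick the threshold from a score distribution or P-value rather than a fixed number.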


My questions:

  1. Are there any easy ways to bulk download TF matrices for all known TFs? (vertebrate, fly, nematode - separate for each species)

  2. Is there a fast and usable TFBS prediction program?

  • has to run from the command line
  • has to be fast (I have quite a few sequences)

Since I am completely at a loss and TF prediction is not exactly my area of expertise, I don't know whether what I'm asking is irrelevant, already solved 100 times, etc. Feel free to just point me to some relevant reviews and/or your favourite programs. It seems that all the resources I find are from the early 2000s and many are no longer functional.

Will (United States) wrote 9.0 years ago:

I have a gist for exactly this. You can clone or download it: http://gist.github.com/764262

It uses the MOODS package (paper here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778336/) to process JASPAR style TFBS and any normal seq-interval format ... but with ~5 minutes of work you could switch it over to use fasta-files.
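As a rough idea of what switching to FASTA input could involve, here is a small dependency-free FASTA reader. It is only a sketch: it is not taken from the gist, it does not call MOODS, and the record names in the example are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record FASTA input held in memory for the example.
example = io.StringIO(">promoter_1 hypothetical\nACGT\nACGT\n>promoter_2\nTTGACA\n")
records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq))
```

With a real file you would pass `open("promoters.fa")` (a placeholder file name) instead of the `StringIO` object, and hand each sequence to whatever scanner you use.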

It runs blisteringly fast ... I can usually annotate all upstream-promoters of a genome within ~10 minutes.

Feel free to fork the repository and make any changes ... I always welcome pull-requests.

Hope that helps,

Will


Great package! I had been looking for something like this for some time.

written 9.0 years ago by Farhat

Nice to hear ... let me know if it's useful.

written 9.0 years ago by Will

This seems interesting. I'm not that good at Python, but maybe I could use it.

written 9.0 years ago by Panagiotis Alexiou

Hi, I have the same problem as you. I would like to know whether you managed to solve it with the gist. I don't know how to run it, and it doesn't have a user guide. Thanks a lot in advance.

written 8.7 years ago by Mohammad Reza Bakhtiarizadeh

Is it accurate enough to use human TFBS matrices to predict TFBS in other vertebrates? Is there a relevant paper you can point me to? Is there also an up-to-date database of TFBS matrices? Thanks a lot.

written 6.0 years ago by bioLife

Please check this paper and the related database CISBP: Weirauch, M. T., et al. (2014). "Determination and inference of eukaryotic transcription factor sequence specificity." Cell 158(6): 1431-1443.

written 4.3 years ago by pengchy

Carl (DKFZ & Univ. Heidelberg, Heidelberg, Germany) wrote 8.9 years ago:

Hi

A nice source of PWMs is UniProbe: these are PWMs obtained using protein binding microarrays (check out Bulyk's lab page here). You can download a large number of PWMs freely, for all sorts of organisms (mouse, yeast, nematode, etc.). The format is a little weird, but you can convert it to standard TRANSFAC format (accepted by most tools) using RSA-tools convert-matrix: select tab as the input format and transfac as the output. I also suggest using RSA-tools matrix-scan.

parra.gonzalo wrote 6.0 years ago:

Try the INSECT server. It will help you with the TFBS search: you can add your own TFBS and run the search either on FASTA files or on genes from ENSEMBL, given their IDs.

This is the publication http://www.ncbi.nlm.nih.gov/pubmed/24008418

Darked89 (Barcelona, Spain) wrote 9.0 years ago:

You may check this page

Ian (University of Manchester, UK) wrote 9.0 years ago:

I am not a great fan of using matrices (I prefer using IUPAC patterns) for representing TFBS as it is difficult to know at what cut off a match is 'good' or not. When forced to do so I have used 'matrix-scan' at RSA Tools. It does at least allow the use of P-value thresholds.
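To illustrate the IUPAC-pattern alternative mentioned above, here is a minimal sketch that translates an IUPAC consensus into a regular expression and reports every match, including overlapping ones. The E-box consensus CANNTG and the test sequence are just examples; unlike a matrix scan there is no score or cutoff, a site either matches or it does not.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Compile an IUPAC pattern; the lookahead lets overlapping sites be found."""
    body = "".join(IUPAC[code] for code in pattern.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, pattern):
    """Return (start, matched_site) for each forward-strand match."""
    regex = iupac_to_regex(pattern)
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


sites = find_sites("aaCACGTGttCATGTGcc", "CANNTG")
print(sites)
```

Only the forward strand is searched here; for a non-palindromic pattern you would also search the reverse complement.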

Matrices can be directly downloaded in bulk from JASPAR; I downloaded the 'Archive.zip' and extracted the non-redundant matrices for vertebrates. I converted the JASPAR format to TRANSFAC format, as I know matrix-scan handles this well.
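A conversion like this can be sketched in a few lines. The example below assumes the bracketed JASPAR text layout (a `>ID name` header followed by one row per base) and writes a simplified TRANSFAC-like table with one row per position. The files in 'Archive.zip' may be laid out differently, and the exact fields a given tool expects should be checked against its documentation. The matrix `TOY1` is made up.

```python
import re


def parse_jaspar(text):
    """Parse bracketed JASPAR-style text into {matrix_id: {base: [counts]}}."""
    matrices, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # keep the ID, drop the TF name
            matrices[current] = {}
        elif current is not None:
            base = line[0].upper()
            numbers = re.findall(r"\d+(?:\.\d+)?", line[1:])
            matrices[current][base] = [float(x) for x in numbers]
    return matrices


def to_transfac_like(matrix_id, counts):
    """Render one matrix with one row per position and columns A, C, G, T."""
    lines = ["ID  " + matrix_id, "P0      A      C      G      T"]
    for i in range(len(counts["A"])):
        row = "".join("%7g" % counts[b][i] for b in "ACGT")
        lines.append("%02d%s" % (i + 1, row))
    lines.extend(["XX", "//"])
    return "\n".join(lines)


# Made-up matrix in the assumed bracketed layout.
toy_text = """>TOY1 example
A  [ 4 19  0 ]
C  [16  0 20 ]
G  [ 0  1  0 ]
T  [ 0  0  0 ]
"""
parsed = parse_jaspar(toy_text)
rendered = to_transfac_like("TOY1", parsed["TOY1"])
print(rendered)
```

For anything beyond a quick check, a maintained converter such as RSA-tools convert-matrix, mentioned elsewhere in this thread, is the safer choice.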
