Scanning a genomic region for TFBS
8 months ago
Mohammed ▴ 10

Hello,

I have a single genomic region of interest, say the 100 kb upstream of a certain gene, that I want to scan for potential TF binding sites. What tools can do that? Most of the tools require multiple genomic regions rather than a single region.

Thanks.

Transcription Factor TF TFBS • 863 views

There is a software tool called ROSE to find enhancers.


If you use R, you could probably use TFBSTools. If I recall correctly from when I used it, you need a PWM for the TF and the sequence of interest. It will scan the sequence, identify matches, and assign a p-value to each "hit".


Alternatively, there is motifmatchr (https://bioconductor.org/packages/release/bioc/html/motifmatchr.html), which I found less tedious than TFBSTools.

8 months ago
ATpoint 82k

I would run FIMO: https://meme-suite.org/meme/doc/fimo.html

You need the region as a FASTA file (bedtools getfasta) and motifs, for example from HOCOMOCO or JASPAR. FIMO reports which motifs have statistically significant matches in that region.
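A rough sketch of that workflow for a single region, in case it helps. The chromosome, TSS position, and the file names genome.fa and motifs.meme are placeholders I made up, not values from this thread. The extraction and scan steps only run if the tools and input files are actually present.

```shell
#!/bin/sh
# Placeholder coordinates (assumptions): a forward-strand gene with its TSS
# at 0-based position 1250000 on chr1.
chrom="chr1"
tss=1250000
window=100000

# BED is 0-based and half-open; clamp the start at 0 in case the gene
# sits closer than 100 kb to the start of the chromosome.
start=$((tss - window))
if [ "$start" -lt 0 ]; then
    start=0
fi
printf '%s\t%s\t%s\t%s\n' "$chrom" "$start" "$tss" "upstream_100kb" > region.bed

# genome.fa (reference FASTA) and motifs.meme (MEME-format motif file,
# e.g. downloaded from JASPAR) are assumed file names.
if command -v bedtools >/dev/null 2>&1 && [ -f genome.fa ]; then
    bedtools getfasta -fi genome.fa -bed region.bed -fo region.fa
fi
if command -v fimo >/dev/null 2>&1 && [ -f region.fa ] && [ -f motifs.meme ]; then
    # Results are written into the fimo_out directory.
    fimo --oc fimo_out --thresh 1e-4 motifs.meme region.fa
fi
```

For a reverse-strand gene the upstream window would instead run from the TSS to TSS + 100000.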

8 months ago

I have an answer posted on the Bioinformatics SE that suggests how to use FIMO to scan for TFBS over a genome:

https://bioinformatics.stackexchange.com/a/2491/776

This example uses UCSC to retrieve sequence information, JASPAR for MEME-formatted motif models, and BEDOPS starch to compress the results (which will be sizeable).

Once you have your whole-genome set of TFBSs, you can use set operations to look for binding sites within regions of interest, e.g. proximal promoters that could be defined as a given window upstream of each stranded gene's TSS.

Say you have generated your TFBSs in a sorted BED5 or BED5+ file called TFBSs.bed, and your gene TSSs are in TSSs.for.bed and TSSs.rev.bed, separated by strand.
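If your TSSs start out in one combined file, a minimal sketch of the strand split could look like the following. It assumes a BED6 file with the strand in column 6; the file name TSSs.all.bed and the toy coordinates are made up for illustration. BEDOPS expects sorted input, and its own sort-bed tool is the usual choice for that; the plain sort call here is a stand-in using only standard tools.

```shell
#!/bin/sh
# Toy BED6 input (made-up coordinates), strand in column 6.
printf 'chr1\t900000\t900001\tgeneB\t0\t-\n'  > TSSs.all.bed
printf 'chr1\t500000\t500001\tgeneA\t0\t+\n' >> TSSs.all.bed
printf 'chr2\t300000\t300001\tgeneC\t0\t+\n' >> TSSs.all.bed

# Split by strand, then sort by chromosome name and numeric start/end.
awk -F'\t' '$6 == "+"' TSSs.all.bed \
    | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > TSSs.for.bed
awk -F'\t' '$6 == "-"' TSSs.all.bed \
    | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > TSSs.rev.bed
```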

You can then use BEDOPS bedops with bedmap to find TFBS in proximal promoters — e.g. for forward-stranded TSSs:

bedops --everything --range -100000:0 TSSs.for.bed \
    | bedmap --echo --echo-map-id-uniq --delim '\t' - TFBSs.bed \
    > answer.for.bed

The file answer.for.bed will contain your forward-stranded TSS windows (proximal promoters), each followed by a list of the unique names of the motif models whose TFBSs overlap that window.
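To work with that output downstream, it can help to flatten it into one gene/motif pair per line. The sketch below assumes the TSS file is BED6 (gene name in column 4), that bedmap joins multiple IDs with its default semicolon delimiter, and that windows with no overlapping TFBS end in an empty field. The toy file and the motif names in it are invented for illustration.

```shell
#!/bin/sh
# Toy stand-in for answer.for.bed: BED6 window plus a last column of motif IDs.
printf 'chr1\t400000\t500001\tgeneA\t0\t+\tmotifA;motifB\n'  > toy.answer.for.bed
printf 'chr2\t200000\t300001\tgeneC\t0\t+\t\n'              >> toy.answer.for.bed

# Emit "gene <tab> motif" for every ID in the last column; windows without
# any overlapping TFBS (empty last column) are skipped.
awk -F'\t' -v OFS='\t' '
    $NF != "" {
        n = split($NF, ids, ";")
        for (i = 1; i <= n; i++) print $4, ids[i]
    }' toy.answer.for.bed > toy.gene_motif.tsv
```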

For reverse-stranded gene TSSs, you just change the --range argument:

bedops --everything --range 0:100000 TSSs.rev.bed \
    | bedmap --echo --echo-map-id-uniq --delim '\t' - TFBSs.bed \
    > answer.rev.bed

If you want everything in one file at the end, in sorted order:

bedops --everything answer.for.bed answer.rev.bed > answer.bed

Note that the above example uses premade motif models from JASPAR, so it only finds matches to known motifs. This is different from discovering putative or predicted binding motifs in your promoters de novo. For that, you could use MEME instead of FIMO. There is some discussion of the difference, with comments, here: https://bioinformatics.stackexchange.com/a/8692/776
