Question: TFBS enrichment analysis
mannoulag1 wrote (4 weeks ago):

Hi, I have a list of Arabidopsis gene symbols. How can I do an enrichment analysis to identify their transcription factors? Is it possible to do this with the TFBSTools package? Thanks.

What do you mean by "their transcription factors"? TFs that bind to their promoters? Regardless, HOMER or AME from the MEME Suite are probably your best bets. From a quick glance, the TFBSTools package doesn't seem to do motif enrichment analysis.

Posted by jared.andrews07, 4 weeks ago.

Thank you very much, but I am looking for an R package.

Posted by mannoulag1, 29 days ago.

PWMEnrich may work for you then, though I've never used it and can't vouch for its results or ease of use.

Posted by jared.andrews07, 29 days ago.

Thank you, I will try it.

Posted by mannoulag1, 29 days ago.
Alex Reynolds (Seattle, WA USA) wrote (4 weeks ago):

One possible approach is described below. It involves some manual work, but gives you more control over inputs, outputs, and parameters:

  1. Get the annotations of your genes. Ideally, their coordinates should match the assembly version you use for the FIMO scan described below.

    These annotations should be in a sorted BED6+ file, with the gene ID in the fourth column and the strand in the sixth column. If your annotations are in GTF or GFF format, convert them with gtf2bed, gff2bed, or another tool that outputs sorted BED.
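
    If your annotation file covers the whole genome, you first need to pull out the rows for your genes of interest. A minimal sketch, assuming a hypothetical file with one gene ID per line that matches column 4 of the BED file exactly (Arabidopsis annotations are usually keyed by AGI locus codes such as AT1G01010, so gene symbols may need mapping to those first). The function name select_genes is hypothetical, and LC_ALL=C sort -k1,1 -k2,2n -k3,3n is only a stand-in for sort-bed; use sort-bed if BEDOPS is installed.

```shell
# select_genes IDS_FILE ALL_GENES_BED   (hypothetical helper, for illustration)
# Keep BED rows whose ID (column 4) is listed in IDS_FILE (one ID per line),
# then sort by chromosome (lexicographic) and start/stop (numeric).
select_genes() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { sub(/\r$/, ""); ids[$1]; next }   # first file: IDs to keep (strip Windows CR)
       ($4 in ids)                                   # second file: print rows with a listed ID
      ' "$1" "$2" \
    | LC_ALL=C sort -k1,1 -k2,2n -k3,3n
}
```

    Usage: select_genes gene_ids.txt all_genes.bed > annotations.bed (both input file names are hypothetical).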

  2. Pad each annotation's TSS by 1000 bases upstream and 200 bases downstream, taking strand into account, e.g. with bedops:

    $ awk 'BEGIN { FS = OFS = "\t" } ($6 == "+") { print $1, $2, ($2+1), $4 }' annotations.bed | bedops --range -1000:200 --everything - > tss.pad.for.bed
    $ awk 'BEGIN { FS = OFS = "\t" } ($6 == "-") { print $1, ($3-1), $3, $4 }' annotations.bed | sort-bed - | bedops --range -200:1000 --everything - > tss.pad.rev.bed
    $ bedops --everything tss.pad.for.bed tss.pad.rev.bed > promoters.bed

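    You might change these bounds depending on how you define a promoter, or another regulatory region where TFs bind and regulate gene activity. Each TSS is first reduced to a single-base element; because upstream lies at higher coordinates on the reverse strand, the padding bounds are swapped (-200:1000) for "-" strand genes.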
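
    If BEDOPS is not available, the same strand-aware padding can be sketched in plain awk. This assumes a sorted BED6+ file as in step 1 and treats the TSS as the first transcribed base: [start, start+1) on "+" and [end-1, end) on "-". The function name promoters is hypothetical; window starts are clipped at 0, but ends are not clipped to chromosome lengths.

```shell
# promoters BED UP DOWN   (hypothetical helper, for illustration)
# Strand-aware window of UP bases upstream and DOWN bases downstream of each
# TSS. Input: BED6+ with ID in column 4 and strand in column 6.
# Output: 4-column BED (chrom, start, end, ID), sorted.
promoters() {
  awk -v up="$2" -v down="$3" 'BEGIN { FS = OFS = "\t" }
    $6 == "+" { s = $2 - up;       e = $2 + 1 + down }   # TSS base is [start, start+1)
    $6 == "-" { s = $3 - 1 - down; e = $3 + up }         # TSS base is [end-1, end)
    $6 == "+" || $6 == "-" { if (s < 0) s = 0; print $1, s, e, $4 }
  ' "$1" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n
}
```

    Usage: promoters annotations.bed 1000 200 > promoters.bed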

  3. Run a FIMO scan at a p-value threshold of 1e-4 (or another threshold) against your plant TF motif database(s) of choice; TRANSFAC, JASPAR, AthaMap, and CIS-BP are possibilities.

    I have an answer on the Bioinformatics SE site that explains how to run a FIMO scan on hg19 (human) against the non-redundant JASPAR vertebrate TF model database:

    If you use FIMO, you would repeat this, or something similar, with your Arabidopsis assembly and published Arabidopsis TF model databases.

    The result will be a collection of TF binding sites (TFBSs) over your chosen Arabidopsis assembly, converted to BED format (fimo.bed). Make sure this file is sorted with sort-bed, as described in the SE answer.
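
    FIMO itself writes tab-delimited text (e.g. with fimo --text, or the fimo.tsv file in its output directory), not BED. The SE answer covers the conversion; as a stand-alone sketch, the hypothetical function below converts that text to sorted BED6. It assumes the MEME Suite 5.x column order, so check the header of your own output, since older versions differ.

```shell
# fimo_to_bed FIMO_TSV   (hypothetical helper, for illustration)
# ASSUMED column order (MEME Suite 5.x): motif_id, motif_alt_id, sequence_name,
# start, stop, strand, score, p-value, q-value, matched_sequence.
# FIMO coordinates are 1-based and inclusive, so start is shifted down by one
# to get 0-based, half-open BED.
fimo_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ || NF < 6 || $1 == "motif_id" { next }   # skip comments, blank lines, header
    { print $3, $4 - 1, $5, $1, $8, $6 }          # chrom, start, end, motif, p-value, strand
  ' "$1" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n
}
```

    Usage: fimo_to_bed fimo.tsv > fimo.bed. Print $2 instead of $1 in the fourth column if you prefer the motif's alternate name, which in JASPAR files is usually the TF name.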

  4. Look for overlaps of, say, three or more bases between the padded TSSs (promoters.bed) and the TFBSs found by FIMO (fimo.bed):

    $ bedmap --echo --echo-map-id-uniq --delim '\t' --bp-ovr 3 promoters.bed fimo.bed > answer.bed

    Or if you want the full TFBS annotation, and not just the TF model names:

    $ bedmap --echo --echo-map --delim '\t' --bp-ovr 3 promoters.bed fimo.bed > answer.bed
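
    For the enrichment test in step 5 you need, for each TF, the number of genes whose promoter has at least one hit. A minimal sketch, assuming answer.bed came from the first bedmap command (--echo-map-id-uniq) run on the four-column promoters.bed, so column 5 holds the motif IDs separated by bedmap's default ";" delimiter and is empty when nothing overlaps. The function name and the "__total__" summary line are hypothetical conventions.

```shell
# tf_gene_counts ANSWER_BED   (hypothetical helper, for illustration)
# Prints "TF<TAB>number of distinct genes with >= 1 hit for that TF", plus a
# "__total__<TAB>number of distinct genes examined" line.
tf_gene_counts() {
  awk 'BEGIN { FS = OFS = "\t" }
    {
      genes[$4]                                   # record every gene, with or without hits
      n = split($5, tfs, ";")
      for (i = 1; i <= n; i++)
        if (tfs[i] != "" && !(($4, tfs[i]) in seen)) {
          seen[$4, tfs[i]]                        # count each gene at most once per TF
          count[tfs[i]]++
        }
    }
    END {
      total = 0
      for (g in genes) total++
      for (tf in count) print tf, count[tf]
      print "__total__", total
    }' "$1" | LC_ALL=C sort -k1,1
}
```

    Usage: tf_gene_counts answer.bed > foreground.counts.txt, then the same on the answer file for the background promoters.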

  5. Repeat steps 1-4 for a background ("random") selection of genes drawn from the whole genome. You could use shuf -n, sample, or a similar tool to get a random sample of genes from a text-formatted annotation file, then convert them to background promoters.
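
    A minimal sketch of drawing the background, assuming a hypothetical whole-genome BED6+ gene file, a hypothetical file of your gene IDs (one per line), and GNU shuf (on macOS, Homebrew's coreutils installs it as gshuf). It excludes your genes of interest so the two sets do not overlap; that is one reasonable design choice, not the only one.

```shell
# sample_background IDS_FILE ALL_GENES_BED N   (hypothetical helper, for illustration)
# Draw N random rows of ALL_GENES_BED whose ID (column 4) is NOT in IDS_FILE.
sample_background() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { sub(/\r$/, ""); ids[$1]; next }   # genes of interest
       !($4 in ids)                                  # keep all other genes
      ' "$1" "$2" \
    | shuf -n "$3" \
    | LC_ALL=C sort -k1,1 -k2,2n -k3,3n
}
```

    Usage: sample_background gene_ids.txt all_genes.bed 1000 > background.bed. Since the FIMO scan in step 3 covers the whole assembly, fimo.bed can be reused; only the promoter padding (step 2) and the bedmap overlap (step 4) need repeating for background.bed.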

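    Once you have TF model names for both your genes of interest and the background genes, you could use a hypergeometric test to determine whether any particular TFs are enriched in your genes of interest. For each TF, the test asks whether more of your genes carry at least one binding site for it than would be expected by chance, given how often that TF hits the background promoters.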

    The following answer may help describe the use of this test in a more concrete way, for a similar scenario: A: Calculate if the co-occurring of two TFBSs is higher than one would expect by ch
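
The hypergeometric test can be sketched in awk as an upper-tail probability P(X >= k). This assumes the population is your genes of interest plus the background genes: N is the total number of genes, K the number of those (foreground and background) with at least one hit for a given TF, n the number of genes of interest, and k the number of genes of interest with a hit. The function name is hypothetical.

```shell
# hypergeom_upper k n K N   (hypothetical helper, for illustration)
# P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).
hypergeom_upper() {
  awk -v k="$1" -v n="$2" -v K="$3" -v N="$4" '
    # log of the binomial coefficient C(a, b), assuming 0 <= b <= a
    function lchoose(a, b,    i, s) {
      s = 0
      for (i = 1; i <= b; i++) s += log(a - b + i) - log(i)
      return s
    }
    BEGIN {
      k += 0; n += 0; K += 0; N += 0          # force numeric comparisons
      lo = n - (N - K); if (lo < 0) lo = 0    # smallest possible overlap
      if (k > lo) lo = k                      # sum only the tail X >= k
      hi = (K < n) ? K : n                    # largest possible overlap
      p = 0
      for (x = lo; x <= hi; x++)
        p += exp(lchoose(K, x) + lchoose(N - K, n - x) - lchoose(N, n))
      printf "%.6g\n", p
    }'
}
```

For example, with made-up numbers (a TF hitting 30 of 100 genes of interest and 120 of 900 background genes), you would run hypergeom_upper 30 100 150 1000. In R, the same tail probability is phyper(k - 1, K, N - K, n, lower.tail = FALSE). Because one test is run per TF, the resulting p-values would usually be corrected for multiple testing, e.g. with p.adjust(p, method = "BH") in R.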

Thank you for your help, Alex, but I am using R 3.5.1 on Windows. Can you help me, please?

Posted by mannoulag1, 29 days ago.

I apologize, but I don't have any R-based way to do this, except for the hypergeometric test portion of the answer.

Posted by Alex Reynolds, 29 days ago.

OK, thank you very much, Alex.

Posted by mannoulag1, 29 days ago.