Question: How to find peaks which have specific TF motif sequence
8 months ago by
km19860 wrote:

I am currently performing motif analysis with HOMER.

I want to extract the peaks which have a specific transcription factor motif from peak files, but I cannot figure out how to do it by reading tutorials.

For example, when I run a HOMER motif analysis on a peak file with the "" command and PU.1 motifs come out as enriched, I want to know which peaks in that file contain the PU.1 motif.

I would appreciate it if you could give me some advice.

Thanks in advance.

chip-seq motif analysis homer • 613 views

Hi, have you figured it out? I am considering doing the same thing with HOMER.

written 5 months ago by JC20

Something wrong with the below answer?

written 5 months ago by ATpoint31k

No, thanks for the suggestions about FIMO.

I am using HOMER for the motif analysis and I got good results so I want to get the locations of the enriched motifs.

The motif locations can be found using HOMER, as described in its guide:

Finding Instance of Specific Motifs

By default, HOMER does not return the locations of each motif found in the motif discovery process. To recover the motif locations, you must first select the motifs you're interested in by getting the "motif file" output by HOMER. You can combine multiple motifs in single file if you like to form a "motif library". To identify motif locations, you have two options:

  1. Run with the "-find <motif file>" option. This will output a tab-delimited text file with each line containing an instance of the motif in the target peaks. The output is sent to stdout.

For example: ERalpha.peaks hg18 MotifOutputDirectory/ -find motif1.motif > outputfile.txt

  2. Run with the "-m <motif file>" option (see the annotation section for more info). Chuck prefers doing it this way. This will output a tab-delimited text file with each line containing a peak/region and a column containing instances of each motif separated by commas. The output is sent to stdout.

For example: ERalpha.peaks hg18 -m motif1.motif > outputfile.txt
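A minimal sketch of how the "-find" output could be reduced to the list of peaks that contain the motif. Two things here are assumptions rather than part of the quoted guide: that the command names missing from the examples above are HOMER's findMotifsGenome.pl (for "-find") and annotatePeaks.pl (for "-m"), and that the first column of the "-find" output holds the peak ID under a "PositionID" header. The function name peaks_with_motif is hypothetical.

```shell
# Assumption: the first example above corresponds to something like
#   findMotifsGenome.pl ERalpha.peaks hg18 MotifOutputDirectory/ -find motif1.motif > outputfile.txt
# Check the command name against the HOMER documentation before relying on it.

# Hypothetical helper: print each peak ID that has at least one motif instance.
# Assumes a tab-delimited file whose first column is the peak ID and whose
# header line (if present) starts with "PositionID".
peaks_with_motif() {
  awk -F'\t' '$1 != "PositionID" && !seen[$1]++ { print $1 }' "$1"
}
```

Running `peaks_with_motif outputfile.txt > peaks_with_motif.txt` would then give one peak ID per line, in order of first appearance, which can be matched back to the original peak file.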

modified 5 months ago • written 5 months ago by JC20

Cool, did not know Homer has an option to return specific motifs. Thanks, learned something new :)

written 5 months ago by ATpoint31k
8 months ago by
ATpoint31k wrote:

I use Find Individual Motif Occurrences (FIMO) from the MEME suite for this kind of analysis. It accepts a fasta file with sequences, e.g. use bedtools getfasta to convert your peaks to fasta format, and a position frequency matrix for the TF of interest, e.g. download from JASPAR or HOCOMOCO in MEME format. It then scans the sequences for significant similarity with the provided motif and returns the regions that match it:

In this example, let's check a stretch of DNA around the first exon of the human BCL6 gene for motif occurrences against all motifs listed in the JASPAR vertebrate core collection. In your case, you should provide a fasta file with all the sequences you are interested in.

Coordinates of the query sequence (hg38) chr3:187744307-187746589

## Get JASPAR motifs (vertebrate non-redundant core collection) in meme format:

## Unzip:

## Install fimo (part of MEME):
conda install -c bioconda meme

## if fimo complains about libiconv libraries, also install that manually:
conda install -c conda-forge libiconv 

## run fimo, providing the .meme file matching your TF:
fimo --parse-genomic-coord input.fa

The input.fa here looks like:


When the genomic coordinates of the sequence are given in the fasta header in the form chr:start-end (1-based coordinates, as in the query coordinates above) and fimo is run with the --parse-genomic-coord option, the resulting GFF file shows the exact coordinates of the motif in the genome.
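One thing to watch if the fasta comes from bedtools getfasta on a BED file: as far as I know, its default headers reuse the BED start, which is 0-based, so the reported motif positions could end up shifted by one. Below is a small sketch that rewrites such headers to 1-based starts before running fimo. The function name to_one_based_headers is hypothetical, and the sketch assumes plain ">chr:start-end" headers; headers that do not split into exactly three parts are passed through unchanged.

```shell
# Hypothetical helper: read fasta on stdin, write fasta on stdout, turning
# ">chr:start-end" headers with a 0-based start into a 1-based start.
to_one_based_headers() {
  awk '
    /^>/ {
      # Split "chr3:187744306-187746589" into name, start, end.
      n = split(substr($0, 2), parts, /[-:]/)
      if (n == 3) {
        printf ">%s:%d-%s\n", parts[1], parts[2] + 1, parts[3]
        next
      }
    }
    { print }   # sequence lines and unrecognised headers pass through
  '
}
```

Usage would be along the lines of `to_one_based_headers < peaks.fa > input.fa`, where peaks.fa is a placeholder name for the bedtools output.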

Check output in gff format which contains significant matches:

head fimo_out/fimo.gff

##gff-version 3
chr3    fimo    nucleotide_motif    187745593   187745603   43.9    -   .   Name=MA0002.2_chr3-;Alias=RUNX1;ID=MA0002.2-RUNX1-1-chr3;pvalue=4.11e-05;qvalue= 0.177;sequence=TCTTGTGGCTT;
chr3    fimo    nucleotide_motif    187746233   187746243   40.4    +   .   Name=MA0002.2_chr3+;Alias=RUNX1;ID=MA0002.2-RUNX1-2-chr3;pvalue=9.11e-05;qvalue= 0.196;sequence=GTTTGTGGTGT;
chr3    fimo    nucleotide_motif    187744975   187744985   41.1    +   .   Name=MA0003.3_chr3+;Alias=TFAP2A;ID=MA0003.3-TFAP2A-1-chr3;pvalue=7.81e-05;qvalue= 0.323;sequence=CCCCCCAAGCA;
chr3    fimo    nucleotide_motif    187745763   187745774   41.9    +   .   Name=MA0018.3_chr3+;Alias=CREB1;ID=MA0018.3-CREB1-1-chr3;pvalue=6.41e-05;qvalue= 0.146;sequence=TGTGACGTCGGC;
chr3    fimo    nucleotide_motif    187745763   187745774   41.9    -   .   Name=MA0018.3_chr3-;Alias=CREB1;ID=MA0018.3-CREB1-2-chr3;pvalue=6.41e-05;qvalue= 0.146;sequence=GCCGACGTCACA;
chr3    fimo    nucleotide_motif    187746240   187746250   50.7    -   .   Name=MA0025.1_chr3-;Alias=NFIL3;ID=MA0025.1-NFIL3-1-chr3;pvalue=8.51e-06;qvalue= 0.0387;sequence=TTACGTAACAC;
chr3    fimo    nucleotide_motif    187746378   187746388   40.5    +   .   Name=MA0025.1_chr3+;Alias=NFIL3;ID=MA0025.1-NFIL3-2-chr3;pvalue=8.97e-05;qvalue= 0.204;sequence=ATATGTAACAA;
chr3    fimo    nucleotide_motif    187745661   187745670   40.4    -   .   Name=MA0028.2_chr3-;Alias=ELK1;ID=MA0028.2-ELK1-1-chr3;pvalue=9.09e-05;qvalue= 0.412;sequence=ACCGGAACCT;
chr3    fimo    nucleotide_motif    187745215   187745225   47.4    +   .   Name=MA0032.2_chr3+;Alias=FOXC1;ID=MA0032.2-FOXC1-1-chr3;pvalue=1.81e-05;qvalue= 0.0779;sequence=TAAATAAATAT;
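To get from this table back to "which peaks contain my TF's motif", the hits for one TF can be pulled out and converted to BED. This is a rough sketch, assuming the GFF is tab-delimited and carries an "Alias=<TF>;" attribute in column 9 as in the output above; the function name gff_hits_to_bed is hypothetical.

```shell
# Hypothetical helper: print FIMO GFF hits for one TF alias as BED lines.
# Usage: gff_hits_to_bed RUNX1 fimo_out/fimo.gff > RUNX1_hits.bed
gff_hits_to_bed() {
  tf="$1"
  gff="$2"
  awk -F'\t' -v tf="$tf" '
    /^#/ { next }                      # skip the "##gff-version" header
    index($9, "Alias=" tf ";") {
      # GFF is 1-based inclusive, BED is 0-based half-open: start - 1.
      printf "%s\t%d\t%s\t%s\t%s\t%s\n", $1, $4 - 1, $5, tf, $6, $7
    }
  ' "$gff"
}
```

The resulting BED file could then be overlapped with the original peaks, for example with `bedtools intersect -u -a peaks.bed -b RUNX1_hits.bed` (file names are placeholders), to list each peak that has at least one hit.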