Homer software -size parameter
yanweng, 4.2 years ago

I want to use the findMotifsGenome.pl program in HOMER to identify enriched motifs in the aggregated regions from single-cell ATAC-seq data.

What does the -size parameter mean? The documentation gives two explanations:
1. length of sequences used
2. -size <#> (fragment size to use for motif finding, default=200); -size <#,#> (i.e. -size -100,50 will get sequences from -100 to +50 relative from center); -size given (uses the exact regions you give it)

Does it mean (1) a sub-region within each given peak, or (2) the DNA fragment size in the prepared library? I lean toward (1). If so, with -size -100,50, will HOMER only search for motifs within that range in each peak of the BED file? Note that the peaks in my BED file average ~1500 bp, so I am not sure whether a 150 bp window is too small.
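
To make reading (1) concrete, here is a minimal Python sketch of the window arithmetic the documentation describes: a fixed -size gives a window of that width centered on the peak center, and -size lo,hi gives center+lo to center+hi. It assumes the center is the integer midpoint of the BED start and end; HOMER's exact rounding and coordinate convention are not modelled, and motif_window is a hypothetical helper, not part of HOMER.

```python
from typing import Tuple, Union

SizeSpec = Union[int, Tuple[int, int]]


def motif_window(start: int, end: int, size: SizeSpec) -> Tuple[int, int]:
    """Return (start, end) of the sub-region searched for motifs.

    Assumption: the peak center is the integer midpoint of start and end.
    """
    center = (start + end) // 2
    if isinstance(size, int):
        # "-size 200": a window of that width centered on the peak center
        # (for odd widths this simple version drops 1 bp)
        half = size // 2
        return center - half, center + half
    # "-size -100,50": offsets relative to the peak center
    lo, hi = size
    if lo >= hi:
        raise ValueError("first offset must be smaller than the second")
    return center + lo, center + hi


if __name__ == "__main__":
    peak = (10_000, 11_500)  # hypothetical 1500 bp peak
    for spec in (200, (-100, 50)):
        ws, we = motif_window(*peak, spec)
        frac = (we - ws) / (peak[1] - peak[0])
        print(spec, (ws, we), f"{frac:.0%} of the peak")
```

For a hypothetical 1500 bp peak, -size -100,50 covers only 10% of the peak, which is the trade-off the question is about.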

Also, does anyone know of a publication describing HOMER? If there is a paper I can read, could you post a link here?

Tags: ChIP-Seq, sequencing, gene, ATAC-seq
yanweng, 4.2 years ago

I just found this more detailed explanation:

Region Size ("-size <#>", "-size <#>,<#>", "-size given", default: 200) The size of the region used for motif finding is important. If analyzing ChIP-Seq peaks from a transcription factor, Chuck would recommend 50 bp for establishing the primary motif bound by a given transcription factor and 200 bp for finding both primary and "co-enriched" motifs for a transcription factor. When looking at histone marked regions, 500-1000 bp is probably a good idea (i.e. H3K4me or H3/H4 acetylated regions). In theory, HOMER can work with very large regions (i.e. 10kb), but with the larger the regions comes more sequence and longer execution time. These regions will be based off the center of the peaks. If you prefer an offset, you can specify "-size -300,100" to search a region of size 400 that is centered 100 bp upstream of the peak center (useful if doing motif finding on putative TSS regions). If you have variable length regions, use the option "-size given" and HOMER will use the exact regions that were used as input.
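
The offset arithmetic in that passage can be checked with a small Python sketch: -size -300,100 spans 400 bp whose midpoint sits 100 bp upstream of the peak center. window_from_offsets is a hypothetical helper; treating "upstream" as a negative offset assumes a + strand peak, and how HOMER handles strand is not modelled. The -100,100 line is presumably equivalent to the default -size 200, and -25,25 corresponds to the 50 bp window recommended for the primary motif.

```python
from typing import Tuple


def window_from_offsets(lo: int, hi: int) -> Tuple[int, float]:
    """Return (length, midpoint offset) for a '-size lo,hi' window.

    The midpoint offset is relative to the peak center; negative means
    upstream, assuming the peak is on the + strand.
    """
    if lo >= hi:
        raise ValueError("expected lo < hi, e.g. -300,100")
    return hi - lo, (lo + hi) / 2


if __name__ == "__main__":
    examples = {
        "-size -300,100 (TSS-style offset)": (-300, 100),
        "-size -100,100 (presumably the same as -size 200)": (-100, 100),
        "-size -25,25 (50 bp, primary motif)": (-25, 25),
    }
    for label, (lo, hi) in examples.items():
        length, mid = window_from_offsets(lo, hi)
        print(f"{label}: {length} bp, midpoint at {mid:+g} bp")
```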

Does this confirm that my understanding (1) is correct? If I want to identify enriched motifs in scATAC-seq data, should I just use the default -size 200, or -size given?
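
One practical input to that choice is how much sequence each option makes HOMER scan, since the documentation ties larger regions to longer execution time. The Python sketch below totals the bp searched for a fixed -size versus -size given. read_bed and total_bp are hypothetical helpers, the peaks are made up, and real runtime also depends on background sequences and other options not modelled here.

```python
from typing import Iterable, List, Tuple, Union

Interval = Tuple[str, int, int]


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse chrom, start, end from BED lines, skipping blank/header lines."""
    peaks: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks


def total_bp(peaks: List[Interval], size: Union[int, str]) -> int:
    """Total bp searched: `size` per peak for a fixed -size,
    or each peak's own length for '-size given'."""
    if size == "given":
        return sum(end - start for _, start, end in peaks)
    if not isinstance(size, int) or size <= 0:
        raise ValueError("size must be a positive int or 'given'")
    return size * len(peaks)


if __name__ == "__main__":
    # Hypothetical peaks averaging ~1500 bp, as in the question
    bed_text = "chr1\t1000\t1800\nchr1\t5000\t6500\nchr2\t300\t2500\n"
    peaks = read_bed(bed_text.splitlines())
    given = total_bp(peaks, "given")
    fixed = total_bp(peaks, 200)
    print(f"-size given: {given} bp, -size 200: {fixed} bp ({given / fixed:.1f}x)")
```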

Hello, I am trying to understand how HOMER chooses the peak center when I don't specify -size. Do you know whether '-size given' is the default? I want to get the positions of the motifs found in the peaks, but the output does not look as if the peak center is literally the midpoint of the peak.

My command is something like this:

annotatePeaks.pl con_brown_diffbind_close.bed Aque1.31 -m *.motif  -gff3 Aqu2.1_Genes.gff3  -mbed con_brown_diffbind_close.bed_motifs.bed > con_brown_diffbind_close.tsv

The output BED file (con_brown_diffbind_close.bed_motifs.bed) has absolute start and end positions, whereas con_brown_diffbind_close.tsv has positions relative to the peak center (e.g. +477, -50, ...). The positions in the two files only agree if I assume the peak center is not the midpoint of the peak.
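
One way to narrow this down is to test candidate conventions against your own output rather than guessing. The Python sketch below computes each motif hit's offset from the peak center under a few plausible conventions (0- vs 1-based BED starts, motif start vs motif middle) and counts which one reproduces the offsets in the .tsv. These conventions are assumptions, not HOMER's documented behaviour; candidate_offsets and tally_conventions are hypothetical helpers, strand is ignored, and pairing each -mbed row with its peak and .tsv offset is left out.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Tuple

Span = Tuple[int, int]  # (start, end) exactly as written in a BED file


def candidate_offsets(peak: Span, motif: Span) -> Dict[str, int]:
    """Offset of one motif hit from its peak center under several conventions.

    Which convention HOMER uses is the open question, so every combination
    of peak-center and motif-reference definition is returned.
    """
    p_start, p_end = peak
    m_start, m_end = motif
    centers = {
        "peak midpoint (0-based start)": (p_start + p_end) // 2,
        "peak midpoint (1-based start)": (p_start + 1 + p_end) // 2,
    }
    refs = {
        "motif start (0-based)": m_start,
        "motif start (1-based)": m_start + 1,
        "motif middle": (m_start + m_end) // 2,
    }
    return {
        f"{r_name} - {c_name}": r_pos - c_pos
        for (c_name, c_pos), (r_name, r_pos) in product(centers.items(), refs.items())
    }


def tally_conventions(hits: Iterable[Tuple[Span, Span, int]]) -> Counter:
    """Count, per convention, how many hits reproduce the offset in the .tsv.

    Each hit is (peak span, motif span from -mbed, offset reported in .tsv).
    """
    counts: Counter = Counter()
    for peak, motif, reported in hits:
        for name, offset in candidate_offsets(peak, motif).items():
            if offset == reported:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical hit: a 1001 bp peak and a 10 bp motif reported at +477
    hits = [((10_000, 11_001), (10_977, 10_987), 477)]
    for name, n in tally_conventions(hits).most_common():
        print(f"{n} hit(s) match: {name}")
```

A single hit is usually ambiguous; running many hits through tally_conventions and looking for a convention that matches nearly all of them is more informative. If none does, factors not modelled here, such as motif strand, are worth checking.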
