Question: Identifying the orientation of a CTCF motif?
24 months ago, Sinji (UT Southwestern Medical Center) wrote:

I'm analyzing some CTCF ChIP-seq data and I'm interested in recording the orientation of CTCF sites, as orientation has been shown to play an important role in the underlying biology. I can't seem to find any information on how to do this, despite it being a fairly popular analysis. Perhaps I'm just not using the right search terms. Any ideas?

Tags: ctcf, chip-seq
24 months ago, simon.vanheeringen wrote:

Use the CTCF motif to scan the peaks. The directionality of the motif match presumably tells you the CTCF orientation.

For instance, you can use this CTCF motif (save in a tab-separated text file):

>C2H2_ZF_Average_200
0.081779124449	0.816566257007	0.0503624700168	0.0512921485275
0.00454091560919	0.992667683465	0.000844143310774	0.00194725761473
0.729859139975	0.0190570790231	0.169685871266	0.0813979097358
0.03204130191	0.630845335167	0.323279272143	0.0138340907799
0.123093260475	0.494372096106	0.0637025611488	0.31883208227
0.901255804964	0.0197757515554	0.0514199924903	0.02754845099
0.0032323521056	0.00108383141845	0.992842879801	0.00284093667487
0.416975026239	0.006048776898	0.5707589234	0.00621727346346
0.0353963142494	0.0316137184991	0.579714716892	0.353275250359
0.00986220321585	0.00125463577494	0.985814739139	0.00306842187041
0.0950088577041	0.0355503655191	0.815560120364	0.053880656413
0.0980555406351	0.793094920278	0.0235874699945	0.0852620690928
0.362317695845	0.0268864366115	0.577387352297	0.0334085152465

For scanning, you can try GimmeMotifs. Its gimme scan command scans your peaks with this motif. Replace hg38 with your genome of interest.

$ gimme scan CTCF_peaks.bed -p CTCF.pwm -g hg38 -b > CTCF_motifs.bed

This will report at most one match per peak, with an estimated false positive rate (FPR) of 1% based on random genomic sequences. The strand column in the BED output will tell you the direction of the motif.
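As a quick check on the result, the strand column can be tallied with a few lines of Python. This is only a sketch: it assumes the output is tab-separated BED with the strand in the sixth column, count_strands is a hypothetical helper name, and the example rows are made up for illustration rather than real scan output.

```python
from collections import Counter


def count_strands(bed_lines):
    """Tally motif orientations from BED lines.

    Assumption: tab-separated BED with the strand ("+" or "-") in column 6.
    """
    counts = Counter()
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            # No strand column on this line, so there is nothing to count.
            continue
        counts[fields[5]] += 1
    return counts


# Made-up rows, for illustration only.
example = [
    "chr1\t16200\t16213\tC2H2_ZF_Average_200\t11.2\t+",
    "chr1\t104100\t104113\tC2H2_ZF_Average_200\t9.8\t-",
    "chr1\t139000\t139013\tC2H2_ZF_Average_200\t10.5\t+",
]
strand_counts = count_strands(example)
print(strand_counts["+"], strand_counts["-"])
```

For a real file, the same function could be called as count_strands(open("CTCF_motifs.bed")).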


This is excellent, thank you very much!

(reply written 24 months ago by Sinji)

Even though this is an old answer, I am using it for my own purposes. I want to produce a final BED file in which the CTCF sites are colour-coded according to the motif orientation on the genome, but I keep having problems with the code.

I am using this command: gimme scan MK_CTCF_From_Romina_hg38_c10.0_l245_g100_peaks -p CTCF.pwm -g hg38 -b > CTCF_motifs.bed

This is the structure of my bed file:

chr1:16100-16375
chr1:103922-104996
chr1:138811-139325
chr1:267382-268156
chr1:609167-610766
chr1:665825-666188
chr1:778686-779058
chr1:857890-858227
chr1:869737-870095
chr1:904592-904947

And this is the CTCF.pwm:

>CTCF_known1 CTCF_1 CTCF_jaspar_MA0139.1
Y 0.095290 0.318729 0.083242 0.502738
D 0.182913 0.158817 0.453450 0.204819
R 0.307777 0.053669 0.491785 0.146769
C 0.061336 0.876232 0.023001 0.039430
C 0.008762 0.989047 0.000000 0.002191
A 0.814896 0.014239 0.071194 0.099671
S 0.043812 0.578313 0.365827 0.012048
Y 0.117325 0.474781 0.052632 0.355263
A 0.933114 0.012061 0.035088 0.019737
G 0.005488 0.000000 0.991218 0.003293
R 0.365532 0.003293 0.621295 0.009879
K 0.059276 0.013172 0.553238 0.374314
G 0.013187 0.000000 0.978022 0.008791
G 0.061538 0.008791 0.851648 0.078022
C 0.114411 0.806381 0.005501 0.073707
R 0.409241 0.014301 0.557756 0.018702
S 0.090308 0.530837 0.338106 0.040749
Y 0.128855 0.354626 0.080396 0.436123
V 0.442731 0.199339 0.292952 0.064978

I get this error message:

Traceback (most recent call last):
  File "/Users/luca/anaconda3/envs/gimme/bin/gimme", line 513, in <module>
    args.func(args)
  File "/Users/luca/anaconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/commands/pwmscan.py", line 170, in pwmscan
    normalize=args.zscore,
  File "/Users/luca/anaconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/commands/pwmscan.py", line 113, in command_scan
    fa = as_fasta(inputfile, genome)
  File "/Users/luca/anaconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/utils.py", line 613, in as_fasta
    genome.track2fasta(seqs, tmpfa.name) 
  File "/Users/luca/anaconda3/envs/gimme/lib/python3.6/site-packages/genomepy/functions.py", line 466, in track2fasta
    track_type = get_track_type(track)
  File "/Users/luca/anaconda3/envs/gimme/lib/python3.6/site-packages/genomepy/functions.py", line 231, in get_track_type
    if isinstance(track, []):
TypeError: isinstance() arg 2 must be a type or tuple of types

Do you have any suggestions to sort this out?
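
For the colour-coded BED file described in this comment, one possible approach once the scan produces output is to extend each line to nine columns and fill the itemRgb column according to strand. This is a minimal sketch, assuming the scan output is tab-separated BED with the strand in column 6. The helper name colour_by_strand, the STRAND_RGB mapping, and the red/blue choice are hypothetical. The score is reset to 0 because the UCSC BED format expects an integer from 0 to 1000 in that column.

```python
# Hypothetical colour choice: red for "+" motifs, blue for "-" motifs.
STRAND_RGB = {"+": "255,0,0", "-": "0,0,255"}


def colour_by_strand(bed6_line):
    """Turn one BED6 line into a BED9 line whose itemRgb depends on strand.

    Assumption: tab-separated input with at least six columns
    (chrom, start, end, name, score, strand).
    """
    fields = bed6_line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-separated columns")
    chrom, start, end, name, _score, strand = fields[:6]
    # Unknown strand values fall back to black.
    rgb = STRAND_RGB.get(strand, "0,0,0")
    # thickStart and thickEnd are set to the motif start and end;
    # the score is replaced by 0 (see the note above the code).
    return "\t".join([chrom, start, end, name, "0", strand, start, end, rgb])


# Made-up input line, for illustration only.
print(colour_by_strand("chr1\t100\t119\tCTCF\t12.5\t-"))
```

When loading the result as a UCSC custom track, a track line containing itemRgb="On" is needed for the colours to be displayed.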

(reply written 12 days ago by ste.lu)

Hello lu, your bed file doesn't look like it is in standard BED format. You can check the standard BED format at https://genome.ucsc.edu/FAQ/FAQformat.html#format1, and you can see an example of how to use gimme scan at https://gimmemotifs.readthedocs.io/en/master/tutorials.html#scan-for-known-motifs
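
If the region strings need to become a three-column BED file, the conversion could look like the sketch below. region_to_bed is a hypothetical helper, and the coordinates are copied unchanged, which assumes they already follow the BED convention (0-based start, end not included). If they are 1-based instead, the start would need to be decreased by 1.

```python
import re

# Matches region strings of the form chrom:start-end, e.g. chr1:16100-16375.
REGION = re.compile(r"^(\S+):(\d+)-(\d+)$")


def region_to_bed(region):
    """Convert a 'chrom:start-end' string into a tab-separated BED3 line.

    Assumption: coordinates are already BED-style and are copied as-is.
    """
    match = REGION.match(region.strip())
    if match is None:
        raise ValueError("not a chrom:start-end region: %r" % region)
    chrom, start, end = match.groups()
    return "\t".join([chrom, start, end])


# The first lines of the file shown in the question above.
for region in ["chr1:16100-16375", "chr1:103922-104996", "chr1:138811-139325"]:
    print(region_to_bed(region))
```

Writing the converted lines to a file with a .bed extension gives an input in the layout described in the UCSC format FAQ.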

(reply written 5 days ago by yztxwd)

Hello yztxwd, thanks for your suggestions.

(reply written 21 hours ago by ste.lu)