Find the genomic position of CTCF binding site on DMPK gene
2.8 years ago
kspata ▴ 70

Hi All,

I am trying to find the genomic coordinates of the CTCF binding sites on the DMPK gene. I have obtained the entire human genomic sequence of the DMPK gene at the following link:

https://useast.ensembl.org/Homo_sapiens/Gene/Sequence?g=ENSG00000104936;r=19:45769717-45782552

From this paper in the link:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5339342/

The DM1 CTG repeat is located in a 3.5 kb CpG island, with two putative CCCTC-binding factor (CTCF) sites flanking the CTG repeat (Figure 1). Binding of CTCF to the CTCF binding sites, together with the DM1 CTG repeat, was suggested to establish an insulator element between the DMPK promoter and the six homeobox 5 (SIX5 [MIM: 600963]) enhancer.

How can I obtain the genomic positions of the two CTCF binding sites (CTCF 1 and CTCF 2) in the hg38 genome assembly?

Your help will be appreciated.

gene genome UCSC Genome Browser • 902 views
2.8 years ago

It seems like the initial report of those flanking CTCF sites is from this paper:

"Recent studies have identified binding sites for CTCF as essential components of vertebrate insulator elements [21,22,23,24]. CTCF-binding sites are about 50 bp and variable, possibly because CTCF can use different subsets of its zinc-fingers to recognize diverse DNA sequences [26,27]. We therefore used gel mobility-shift assays to identify CTCF-binding sites. We used 10 overlapping fragments spanning the 1.3-kb region, 150 bp upstream of the CTG repeat to the major transcription start site of SIX5 (Fig. 2a), in gel mobility-shift assays with the in vitro-translated DNA-binding domain of CTCF (11ZF, the complete 11 zinc-finger DNA binding domain of CTCF; Fig. 2b), which has the same sequence specificity as full-length CTCF [28]"

So I'm not sure you can pinpoint those sites in a reference genome. You would likely have to look at experimental data (ChIP-Seq) and run some motif-finding tools (see the MEME Suite, or HOMER).

In case this is useful to you in exploring this problem:

The CTG repeat is flanked by the two CTCF binding sites you're looking for. In your paper, that region is shown in Figure 1 (The DM1 Locus), where you can see the location of the flanking binding sites. More specifically:

The CTG repeat is located in the 3' UTR and SIX promoter, of which part of the DNA sequence is shown.

But they don't report coordinates, other than:

DNA methylation was independently determined for regions 300 bp upstream (hg19: 46,277,287–46,277,059) of the CTG repeat and 229 bp downstream (hg19: 46,276,890–46,276,767; Figure 1) by bisulfite conversion and sequencing (Figures S1D–S1F).

Curiously, those numbers don't add up. The first range spans 229 bp (not 300), while the second spans 124 bp (not 229), so I'm unclear whether this is a typo or whether I'm misinterpreting their ranges. Also, the start and stop coordinates are reversed (the larger coordinate should follow the smaller one). Still, those ranges on UCSC are (hg19):
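To make the arithmetic explicit, here is a minimal sketch that reorders each reported range so the smaller coordinate comes first and prints its inclusive length. The labels `upstream` and `downstream` are just names chosen here for illustration; the coordinates are the ones quoted from the paper.

```shell
# hg19 ranges exactly as printed in the paper (larger coordinate first).
# The labels in column 1 are ours, not the paper's.
ranges='upstream 46277287 46277059
downstream 46276890 46276767'

# Reorder each pair as (low, high) and compute the inclusive length.
lengths=$(printf '%s\n' "$ranges" | awk '{
    lo = ($2 < $3) ? $2 : $3
    hi = ($2 < $3) ? $3 : $2
    print $1, lo, hi, hi - lo + 1
}')
printf '%s\n' "$lengths"
```

The last column is the span in bp, counting both endpoints.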

I'm not sure how much I trust these coordinates, but if you fiddle with the genome browser, you can get a view that has both the 3' UTR and the SIX promoter:

And if you look below at the brown and blue tracks, you can see the raw signal from CTCF binding-site peaks from ChIP-Seq experiments. Those are probably not cases of CDM1/DM1 though.

You can convert whatever coordinates you need using liftOver: https://genome.ucsc.edu/cgi-bin/hgLiftOver
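If you would rather run the conversion locally, the sketch below writes the two hg19 ranges from the paper as a BED file (0-based, half-open) and calls the command-line liftOver only if it is installed. Assumptions: the ranges are on chr19 (inferred from the Ensembl link in the question, since the paper's text gives no chromosome), the file names `dm1_hg19.bed`, `dm1_hg38.bed` and `dm1_unmapped.bed` are made up for this example, and the hg19-to-hg38 chain file has already been downloaded from UCSC into the working directory.

```shell
# hg19 ranges from the paper, written low-to-high, 1-based inclusive.
# chr19 is an assumption based on the Ensembl link (r=19:...).
printf '%s\n' \
  'chr19 46277059 46277287 upstream' \
  'chr19 46276767 46276890 downstream' |
awk 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $3, $4 }' > dm1_hg19.bed  # BED starts are 0-based

cat dm1_hg19.bed

# Only attempt the conversion when both the tool and the chain file exist.
# Usage: liftOver oldFile map.chain newFile unMapped
if command -v liftOver >/dev/null 2>&1 && [ -f hg19ToHg38.over.chain.gz ]; then
    liftOver dm1_hg19.bed hg19ToHg38.over.chain.gz dm1_hg38.bed dm1_unmapped.bed
fi
```

Any interval that fails to map ends up in the unmapped file rather than the output, so it is worth checking both.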

Hope this is at least somewhat useful!

2.8 years ago
GenoMax 99k

2.8 years ago

Another option is to use the hg38 CTCF calls from the analysis done by Matt Maurano's lab for Maurano et al., Nat. Genetics 2015:

$ wget --user-agent=Safari http://www.mauranolab.org/CATO/fimo/hg38.CTCF_upstream.1e-4.starch

This contains >1e-4 FIMO hits for three CTCF models ("Core", "Upstream", and "Upstream_P1") in hg38 space. Refer to the linked paper for more discussion.

You can either extract the file to BED via BEDOPS unstarch:

$ unstarch hg38.CTCF_upstream.1e-4.starch > hg38.CTCF_upstream.1e-4.bed

And then do set operations on hg38.CTCF_upstream.1e-4.bed.

Or you can just work with the Starch file directly in set operations with bedops or bedmap, e.g.:

$ bedops -e 1 hg38.CTCF_upstream.1e-4.starch intergenicRegion.bed > answer.bed

Or:

$ bedmap --echo --echo-map intergenicRegion.bed hg38.CTCF_upstream.1e-4.starch > answer.bed

In the bedops example, the file intergenicRegion.bed is a BED-formatted region of interest, e.g., it could be the genomic region "between the DMPK promoter and the six homeobox 5 (SIX5 [MIM: 600963]) enhancer". In that case, answer.bed would contain CTCF FIMO hits that overlap this intergenic region.

In the bedmap example, the file answer.bed contains the intergenic region(s) and any CTCF hits that overlap them.
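For completeness, here is one way intergenicRegion.bed could be created. This is only a sketch: it uses the full DMPK gene span from the Ensembl link in the question (19:45769717-45782552, hg38) as a stand-in, not the actual promoter-to-enhancer interval, which you would need to substitute. It also assumes the Starch file names chromosomes in the `chr19` style and has already been downloaded.

```shell
# Stand-in region: DMPK gene span from the Ensembl link (hg38, 1-based inclusive).
# Replace with the real interval of interest; "chr19" naming is an assumption.
printf 'chr19\t%d\t%d\tDMPK\n' $((45769717 - 1)) 45782552 > intergenicRegion.bed
cat intergenicRegion.bed

# Run the overlap only when bedops and the downloaded Starch file are present.
if command -v bedops >/dev/null 2>&1 && [ -f hg38.CTCF_upstream.1e-4.starch ]; then
    bedops -e 1 hg38.CTCF_upstream.1e-4.starch intergenicRegion.bed > answer.bed
fi
```

A single-line BED file is trivially sorted; with several regions, run it through sort-bed first, since bedops expects sorted input.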

See bedops --help or bedmap --help for more information, or the online documentation.