
Identify genes near occurrences of a TF binding sequence

Asked by codybj, 7.9 years ago

Hello, I have a consensus binding sequence that has been defined for a certain transcription factor. I would like to search for matches to this consensus sequence in the human genome and then export a list of identifiers (gene symbols, GI numbers or similar) for the gene nearest to each putative TF binding hit. In this way I hope to create a preliminary list of genes that may be affected by this transcription factor.

I tried to use BLAST but I can't figure out how to get from the results (i.e. nucleotide coordinates on a chromosome assembly) to identifying the nearest named gene in an automated fashion.

I'd appreciate any guidance and I apologize if this is a foolish question!
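The matching step described in the question can be sketched in a few lines. This is a minimal illustration, assuming each chromosome is already loaded as a plain string (FASTA parsing is left out) and that the consensus is written in IUPAC codes; the consensus `TTGACY` used below is purely hypothetical. It reports exact matches on both strands in 0-based, half-open (BED-style) coordinates; it does not score partial matches the way a motif scanner such as FIMO does.

```python
import re

# IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of every IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    # A lookahead makes finditer report overlapping matches as well.
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def scan(chrom, seq, consensus):
    """Return sorted (chrom, start, end, strand) hits, 0-based half-open (BED-style)."""
    seq = seq.upper()  # also handles soft-masked (lowercase) genome sequence
    width = len(consensus)
    hits = []
    for m in consensus_to_regex(consensus).finditer(seq):
        hits.append((chrom, m.start(), m.start() + width, "+"))
    rc = reverse_complement(consensus)
    if rc != consensus.upper():  # a palindromic consensus would otherwise be counted twice
        for m in consensus_to_regex(rc).finditer(seq):
            hits.append((chrom, m.start(), m.start() + width, "-"))
    return sorted(hits)
```

For example, `scan("chr17", chr17_sequence, "TTGACY")` would list every hit on chromosome 17, where `chr17_sequence` is a hypothetical string holding that chromosome. The lookahead trick matters for short motifs, where hits can overlap.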

Tags: BLAST, genome, regulatory, promoter

Try BLAST/BLAT from Ensembl. On the results page, click the links under 'Genomic Location' to visualise the neighbouring genes, or check the 'Overlapping gene(s)' column. The 'Ensembl BLAST and BLAT tools' video explains in more detail how to run the online interface and explore the results.

Reply by Denise CS, 7.9 years ago

I have a similar problem. If you manage to solve it, I would appreciate hearing how.

Reply by bharata1803, 7.9 years ago

Answer 1 by Ar, 2016-06-12

Use FIMO (part of the MEME Suite). Upload your motif and choose the sequence database you want to scan. FIMO reports the position of each match; you then need to map those positions to the nearest gene.
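Converting FIMO's hits to BED intervals is the bridge to the nearest-gene step. A minimal sketch, assuming the tab-separated `fimo.tsv` written by recent MEME Suite versions, whose header names columns such as `motif_id`, `sequence_name`, `start`, `stop`, `strand`, `score` and `p-value` (older versions write a differently formatted `fimo.txt`, so check your own header first). FIMO coordinates are 1-based and inclusive, so only the start is shifted for BED. The 1e-4 p-value cutoff is an illustrative choice.

```python
import csv


def fimo_to_bed(lines, max_p=1e-4):
    """Convert FIMO tab-separated hits to BED tuples (0-based, half-open).

    Returns (chrom, start, end, motif_id, score, strand) for hits with p <= max_p.
    """
    # Drop blank lines and the '#' comment lines FIMO appends after the table.
    rows = (line for line in lines if line.strip() and not line.startswith("#"))
    bed = []
    for rec in csv.DictReader(rows, delimiter="\t"):
        if float(rec["p-value"]) > max_p:
            continue
        start = int(rec["start"]) - 1  # 1-based start -> 0-based BED start
        end = int(rec["stop"])         # 1-based inclusive stop == BED end
        bed.append((rec["sequence_name"], start, end,
                    rec["motif_id"], rec["score"], rec["strand"]))
    return bed
```

It accepts any iterable of lines, so `fimo_to_bed(open("fimo.tsv", newline=""))` would work on a file under the assumed layout.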

Ar, thanks for your help. It's actually the second part that I'm having trouble with. How can I programmatically map these positions to the nearest gene and produce a list of gene symbols or IDs?

Reply by codybj, 7.9 years ago

Answer 2 by Ar, 2016-06-13

The fastest way is to use GREAT; you need to provide the genomic locations in BED format. Another way is to:

Take the genomic locations of all the genes
Subtract each gene's position from the locus of each TF binding site
Pick the gene with the smallest absolute distance.

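However, be careful with the strand of each gene when deciding which end is the TSS and which is the transcription end site (TES): for a gene on the positive strand the TSS is its start coordinate, while for a gene on the negative strand it is its end coordinate.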
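The subtract-and-take-the-minimum approach, including the strand caveat, can be sketched as follows. This is a minimal illustration, not GREAT's method: it assumes genes arrive as hypothetical `(chrom, start, end, strand, name)` tuples in 0-based, half-open coordinates (for instance exported from an Ensembl or UCSC gene annotation), uses each gene's TSS as the reference point, and replaces the all-pairs subtraction with a binary search over sorted TSS positions, which gives the same nearest gene.

```python
import bisect
from collections import defaultdict


def build_index(genes):
    """Index TSS positions per chromosome.

    genes: iterable of (chrom, start, end, strand, name), 0-based half-open.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, strand, name in genes:
        # The TSS is the first transcribed base: 'start' on +, last base (end - 1) on -.
        tss = start if strand == "+" else end - 1
        by_chrom[chrom].append((tss, strand, name))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([e[0] for e in entries], entries)
    return index


def nearest_gene(index, chrom, pos):
    """Return (gene_name, offset) for the TSS closest to pos, or None if chrom is unknown.

    offset < 0: site is upstream of the TSS in the gene's orientation; > 0: downstream.
    """
    if chrom not in index:
        return None
    positions, entries = index[chrom]
    i = bisect.bisect_left(positions, pos)
    # The closest TSS is one of the two neighbours of the insertion point.
    candidates = [entries[j] for j in (i - 1, i) if 0 <= j < len(entries)]
    tss, strand, name = min(candidates, key=lambda e: abs(e[0] - pos))
    offset = pos - tss if strand == "+" else tss - pos
    return name, offset
```

The signed offset is computed in the gene's own orientation, so a site at a larger coordinate than a minus-strand gene's TSS counts as upstream (negative).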

GREAT looks great, and I've decided to give it a go using a region that Ensembl annotates as a CTCF binding site on GRCh37 (i.e. chr17 62225957 62226356). GREAT does seem to report that TEX2 falls in that region of human chromosome 17 (results). It took me a little while to find the results, though: the 'no terms' next to GO made me think the job had not returned anything. You need to click on 'Job description' to get the list of genes mapped to the input coordinates. I did not find a way to run the job on GRCh38, and the input must use the format chr17 62225957 62226356 (17 62225957 62226356 does not work).
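Since GREAT only accepted the chr-prefixed form, a small helper can normalise chromosome names before writing BED input. A sketch under two assumptions: the function names are hypothetical, and coordinates copied from Ensembl are 1-based and inclusive, so with `one_based=True` the start is shifted by one to BED's 0-based convention (the end stays the same).

```python
def to_ucsc_chrom(chrom):
    """Return a UCSC-style chromosome name, e.g. '17' -> 'chr17'."""
    chrom = str(chrom)
    if chrom.startswith("chr"):
        return chrom
    if chrom == "MT":
        return "chrM"  # Ensembl calls the mitochondrial genome MT; UCSC uses chrM
    return "chr" + chrom


def bed_line(chrom, start, end, one_based=False):
    """Format one tab-separated BED line (chrom, start, end)."""
    if one_based:
        start -= 1  # 1-based inclusive -> BED 0-based half-open; end stays the same
    return "\t".join([to_ucsc_chrom(chrom), str(start), str(end)])
```

For the CTCF region above, `bed_line(17, 62225957, 62226356, one_based=True)` gives the line `chr17`, `62225956`, `62226356` separated by tabs.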

Reply by Denise CS, 7.9 years ago