Identify genes near occurrences of a TF binding sequence
7.9 years ago
codybj • 0

Hello, I have a consensus binding sequence that has been defined for a certain transcription factor. I would like to search for matches to this consensus sequence in the human genome and then in some way export or make a list of gene symbols / GI numbers / some identifiers for the gene nearest to the putative TF binding hit. In this way I am hoping to create a preliminary list of genes that may be affected by this transcription factor.

I tried to use BLAST but I can't figure out how to get from the results (i.e. nucleotide coordinates on a chromosome assembly) to identifying the nearest named gene in an automated fashion.

I'd appreciate any guidance and I apologize if this is a foolish question!

BLAST genome regulatory promoter • 2.1k views

Try BLAST/BLAT from Ensembl. From the results page, click on the links under 'Genomic Location' to visualise the neighbouring genes, or check the column 'Overlapping gene(s)'. Check the 'Ensembl BLAST and BLAT tools' video for more details on how to run the online interface and explore the results.


I have a similar problem. If you manage to solve it, I would like to hear how you did it.

7.9 years ago
Ar ★ 1.1k

Use FIMO. Upload your motif and choose the sequence database you want to scan. It will report the position of each match, and you then need to map each position to the nearest gene.
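If all you have is a consensus string rather than a motif file, a plain pattern scan can serve as a rough stand-in for the search step. To be clear, this is not what FIMO does (FIMO scores candidate sites against a position weight matrix); the sketch below only expands IUPAC ambiguity codes into a regular expression and reports exact matches on both strands. The function names and the toy sequence are made up for illustration, and coordinates are reported BED-style (0-based start, exclusive end).

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(consensus):
    """Reverse-complement a consensus written with IUPAC codes."""
    return consensus.upper().translate(COMPLEMENT)[::-1]


def find_consensus_hits(chrom, sequence, consensus):
    """Return sorted (chrom, start, end, strand) tuples for every match.

    Hypothetical helper: exact matching only, no scoring or p-values.
    """
    sequence = sequence.upper()
    hits = []
    motifs = (("+", consensus.upper()), ("-", reverse_complement(consensus)))
    for strand, motif in motifs:
        pattern = "".join(IUPAC[base] for base in motif)
        # The lookahead lets overlapping matches be reported as well.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            start = match.start()
            hits.append((chrom, start, start + len(motif), strand))
    return sorted(hits)


if __name__ == "__main__":
    toy_sequence = "AAGGTCAAATGACCTT"  # made-up sequence, not real genome data
    for hit in find_consensus_hits("chr1", toy_sequence, "GGTCA"):
        print(*hit, sep="\t")
```

On a whole genome you would run this per chromosome over sequences read from a FASTA file, and the resulting hits can be written out as BED for the gene-mapping step.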


Ar, thanks for your help. It's that second part that I was actually having a problem with... How can I programmatically map these positions to the nearest gene and produce a list of gene symbols or IDs?

7.9 years ago
Ar ★ 1.1k

The fastest way to do it is to use GREAT. You need to provide the genomic locations in BED format. Another way is to do it yourself:

  1. Get the genomic coordinates of all the genes.
  2. Compute the distance between each gene and each TF binding site.
  3. For each binding site, keep the gene with the smallest distance.

However, be careful about the strand of each gene when deciding which coordinate is the TSS and which is the TSE: for a gene on the negative strand, the TSS is the end coordinate, not the start.
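The manual route could look something like the sketch below. It assumes the genes are already loaded as (chrom, start, end, strand, symbol) tuples, measures distance from the midpoint of each binding site to the strand-aware TSS, and uses a sorted list with bisect so that not every gene has to be compared against every site. The gene names and coordinates are invented for illustration, and the function names are my own, not part of any library.

```python
import bisect
from collections import defaultdict


def tss(gene):
    """TSS is the start coordinate on '+' and the end coordinate on '-'."""
    chrom, start, end, strand, symbol = gene
    return start if strand == "+" else end


def build_tss_index(genes):
    """Group (tss_position, symbol) pairs by chromosome, sorted by position."""
    index = defaultdict(list)
    for gene in genes:
        index[gene[0]].append((tss(gene), gene[4]))
    for chrom in index:
        index[chrom].sort()
    return index


def nearest_gene(index, chrom, start, end):
    """Return (symbol, distance) for the TSS closest to the site midpoint."""
    entries = index.get(chrom)
    if not entries:
        return None  # no annotated genes on this chromosome
    midpoint = (start + end) // 2
    # Only the TSS just before and just after the midpoint can be nearest.
    i = bisect.bisect_left(entries, (midpoint,))
    candidates = entries[max(i - 1, 0): i + 1]
    position, symbol = min(candidates, key=lambda e: abs(e[0] - midpoint))
    return symbol, abs(position - midpoint)


if __name__ == "__main__":
    # Hypothetical genes: (chrom, start, end, strand, symbol)
    genes = [
        ("chr17", 1000, 5000, "+", "GENE_A"),
        ("chr17", 6000, 9000, "-", "GENE_B"),
    ]
    index = build_tss_index(genes)
    print(nearest_gene(index, "chr17", 1200, 1220))
    print(nearest_gene(index, "chr17", 5500, 5600))
```

Note that this measures distance to the TSS only. If you would rather count a site inside a gene body as distance zero, the distance function needs to change; interval tools such as bedtools closest work on whole features instead.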


GREAT looks great, and I've decided to give it a go using a region that Ensembl annotates as a CTCF binding site on GRCh37 (i.e. chr17 62225957 62226356). GREAT does report that TEX2 falls in that region of human chromosome 17, but it took me a little while to find where the results were. The 'no terms' next to GO made me think the job had not returned any results. One needs to click on 'Job description' to get the list of genes mapped to the input coordinates. I did not find a way to run the job on GRCh38, though. Also, the chromosome name needs the 'chr' prefix: chr17 62225957 62226356 works, but 17 62225957 62226356 does not.
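If your coordinates come from Ensembl, where chromosomes are named 17 rather than chr17, a small conversion step avoids the naming problem. This is only a sketch: the helper name is made up, it writes tab-separated BED lines, and mapping MT to chrM follows the UCSC naming convention, which I am assuming is what GREAT expects.

```python
def to_bed_line(chrom, start, end):
    """Format one region as a tab-separated BED line with a 'chr' prefix."""
    chrom = str(chrom)
    if chrom == "MT":
        chrom = "chrM"  # assumption: UCSC-style mitochondrial name
    elif not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return f"{chrom}\t{start}\t{end}"


if __name__ == "__main__":
    # Region from the comment above, written Ensembl-style without 'chr'.
    print(to_bed_line("17", 62225957, 62226356))
```

Keep in mind that BED starts are 0-based while Ensembl coordinates are 1-based, so for exact boundaries you may want to subtract 1 from the start before writing the line.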
