Question: Find Transcription factor binding site for a gene list
14 months ago by
salvatore.raieli2 (60) wrote:

Hi everyone,

I have a list of genes that are differentially expressed between tumor and normal tissue. I have two drugs that block two different transcription factors (let's say TF1 and TF2). What I want to do is find which of these genes have a binding site in their promoter for TF1, for TF2, and for both.

For a similar task with a single gene, I did this:

library("TFBSTools")
library("JASPAR2018")
library("Biostrings")
# Retrieve the matrix for a specific TF from the JASPAR database
opts <- list()
opts[["species"]] <- 9606
opts[["name"]] <- "MYC"
PFMatrixList <- getMatrixSet(JASPAR2018, opts)
pwm <- toPWM(PFMatrixList, pseudocounts=0.8) # convert the MYC PFM(s) to PWM(s)
seq1 <- read.delim(...)  # the gene sequence with its promoter, downloaded from NCBI
subject <- DNAString(seq1) # convert to a DNAString (seq1 must be a single character string)
# Scan both strands of the sequence for motif matches scoring at least 60%
siteset <- searchSeq(pwm, subject, seqname="seq1", min.score="60%", strand="*")

I have around 100 genes, so I think this approach is impractical.

Thank you in advance

Salvo

promoter tfbs R • 1.5k views
modified 12 months ago by Praneet Chaturvedi (120) • written 14 months ago by salvatore.raieli2 (60)

First of all, what you aim to find is a transcription factor motif. Whether a motif is also a binding site needs to be determined experimentally, because a motif match does not automatically mean that a TF will bind there. For example, a motif in heterochromatin (and there are thousands of "unused" motifs for every factor across the genome) is unlikely to be bound by a TF. You should have a look at FIMO from the MEME suite. It takes as input a FASTA file with sequences, e.g. your promoter sequences, and a motif position frequency matrix, e.g. from JASPAR, and then scans the sequences for motif occurrences, outputting a GFF file.

modified 14 months ago • written 14 months ago by ATpoint (26k)

Thank you, I will take a look at FIMO. I am not sure it is what I need. If I understand correctly, I have to provide the FASTA for each gene I want to scan, right? That would mean manually downloading 100 genes and their promoters in FASTA format. Can I automate this process with FIMO?

written 14 months ago by salvatore.raieli2 (60)

You can pass a multifasta to FIMO, in the form:

>chr1:1-100
ATAGCTACG(...)
>chr2:1-1000
ATGGACTA(...)

Export the list of genomic coordinates to disk, and then use bedtools getfasta to make the multifasta. This can be fed into FIMO, which is fully automated from this point on.

written 14 months ago by ATpoint (26k)

If I understood correctly, I get the genomic coordinates for my list of genes, use bedtools getfasta to obtain the FASTA sequences, and combine them all into one multifasta that I feed to FIMO, right?

written 14 months ago by salvatore.raieli2 (60)

If you feed bedtools a BED file with all the coordinates you have (your promoter regions), it will produce a multifasta file, with each record corresponding to one line of coordinates:

$ cat test.bed 
chr1    100000  100010
chr2    200000  200020

$ bedtools getfasta -fi hg38_noALT_withDecoy.fa -bed test.bed 
>chr1:100000-100010
ACTAAGCACA
>chr2:200000-200020
GTCTTAATATATACATAGGT

You then feed this multifasta into FIMO.
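Once FIMO has run, the original question (which genes have a motif for TF1, for TF2, or for both) comes down to comparing two gene lists. The following is only a minimal sketch under several assumptions: the results table is tab-separated with the motif ID in column 1 and the sequence name in column 3, the FASTA records were named after genes rather than coordinates, and the motif IDs are literally `TF1` and `TF2`. The file `fimo_toy.tsv` and every row in it are invented purely for illustration.

```shell
#!/bin/sh
# Toy stand-in for a FIMO results table; all rows are made up.
# Assumed layout: column 1 = motif ID, column 3 = sequence (gene) name.
{
  printf 'motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n'
  printf 'TF1\t.\tGENE_A\t10\t20\t+\t12.1\t1e-5\t0.01\tACGTACGTACG\n'
  printf 'TF2\t.\tGENE_A\t50\t60\t-\t11.0\t2e-5\t0.02\tTTGACGTCAAT\n'
  printf 'TF1\t.\tGENE_B\t30\t40\t+\t13.4\t5e-6\t0.01\tACGTACGAACG\n'
  printf 'TF2\t.\tGENE_C\t70\t80\t+\t10.2\t4e-5\t0.03\tTTGACGTCATT\n'
  printf 'TF1\t.\tGENE_B\t90\t100\t-\t10.9\t3e-5\t0.02\tACGTTCGTACG\n'
} > fimo_toy.tsv

# One sorted, de-duplicated gene list per motif (header line skipped).
awk -F '\t' 'NR > 1 && $1 == "TF1" { print $3 }' fimo_toy.tsv | sort -u > tf1_genes.txt
awk -F '\t' 'NR > 1 && $1 == "TF2" { print $3 }' fimo_toy.tsv | sort -u > tf2_genes.txt

# comm needs sorted input; the flags suppress the columns we do not want.
comm -12 tf1_genes.txt tf2_genes.txt > both.txt      # genes with both motifs
comm -23 tf1_genes.txt tf2_genes.txt > tf1_only.txt  # genes with a TF1 motif only
comm -13 tf1_genes.txt tf2_genes.txt > tf2_only.txt  # genes with a TF2 motif only

echo "both: $(tr '\n' ' ' < both.txt)"
echo "TF1 only: $(tr '\n' ' ' < tf1_only.txt)"
echo "TF2 only: $(tr '\n' ' ' < tf2_only.txt)"
```

If the FASTA headers are coordinates (as in the getfasta output above) rather than gene names, the sequence names would first have to be mapped back to genes, for example by joining against the BED file.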

modified 14 months ago • written 14 months ago by ATpoint (26k)

How can I generate a BED file containing only the promoter regions of a list of genes?

written 14 months ago by salvatore.raieli2 (60)

By querying a database of promoter regions if available, and if not available, creating one.

written 14 months ago by RamRS (24k)

For simplicity, you could take a window upstream of your genes, say 250bp.

modified 14 months ago by WouterDeCoster (42k) • written 14 months ago by ATpoint (26k)

I was thinking of something similar if I cannot find a good database.

written 14 months ago by salvatore.raieli2 (60)

I think you do not need a database. A promoter is always upstream of the first exon. When you do ATAC-seq (a scan for open chromatin), a typical peak from a nucleosome-free region is hardly larger than 200bp. For motif scanning, you should limit the size of the regions you scan anyway, in order to avoid matches by chance. I would go for e.g. 50bp downstream and 200bp upstream of the start of the first exon, resulting in the 250bp, and then run the analysis. See if it works out, which I think it should.
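As a rough sketch of the window described above, the promoter BED file could be derived from a gene annotation with awk. This assumes a BED6 input (chrom, start, end, name, score, strand; 0-based, half-open) in which each line spans one gene, so that the start of the first exon is the gene start on the + strand and the gene end on the - strand. The file names and the toy coordinates are hypothetical.

```shell
#!/bin/sh
# Toy gene table in BED6 format (0-based, half-open); coordinates are made up.
printf 'chr1\t1000\t5000\tGENE_A\t0\t+\nchr2\t2000\t8000\tGENE_B\t0\t-\nchr3\t100\t900\tGENE_C\t0\t+\n' > genes.bed

UP=200   # bp upstream of the start of the first exon
DOWN=50  # bp downstream of the start of the first exon

awk -v up="$UP" -v down="$DOWN" 'BEGIN { FS = OFS = "\t" }
{
    if ($6 == "+") { s = $2 - up;   e = $2 + down }   # + strand: upstream means lower coordinates
    else           { s = $3 - down; e = $3 + up   }   # - strand: upstream means higher coordinates
    if (s < 0) s = 0                                  # clamp at the chromosome start
    print $1, s, e, $4, $5, $6
}' genes.bed > promoters.bed

cat promoters.bed
```

The resulting promoters.bed can then be passed to bedtools getfasta as shown earlier in the thread. The window is not clamped at the chromosome end here, which would need a table of chromosome sizes.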

written 14 months ago by ATpoint (26k)

12 months ago by
Cincinnati Children's Hospital and Medical Center
Praneet Chaturvedi (120) wrote:

You can use HOMER to find TF binding sites in the promoter regions of your genes of interest.

There are some databases as well:

  1. TF2DNA [Allows batch search]
  2. CISBP [requires sequences of promoters]

Cheers !!

modified 12 months ago by RamRS (24k) • written 12 months ago by Praneet Chaturvedi (120)