I have a list of genes that are differentially expressed between tumor and normal tissue. I have two drugs that block two different transcription factors (let's say TF1 and TF2). What I want to do is find which of these genes have, in their promoter, a binding site for TF1, for TF2, or for both.
For a similar task with a single gene, I did this:
    library("TFBSTools")
    library("JASPAR2018")
    library("Biostrings")

    # to retrieve the matrix for a specific TF in the JASPAR database
    opts <- list()
    opts[["species"]] <- 9606
    opts[["name"]] <- "MYC"
    PFMatrixList <- getMatrixSet(JASPAR2018, opts)
    pwm <- toPWM(PFMatrixList, pseudocounts=0.8)  # generation of PWM matrix for MYC

    seq1 <- read.delim(...)  # the sequence of a gene with the promoter I downloaded from NCBI
    subject <- DNAString(seq1)  # making it a DNA string

    # finding the sites in the sequence
    siteset <- searchSeq(pwm, subject, seqname="seq1", min.score="60%", strand="*")
I have around 100 genes, so I think that repeating this approach gene by gene is impracticable.
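To make the desired output concrete, here is a minimal base R sketch of the final classification step only. It assumes the per-gene scan has already been done somehow and has produced, for each TF, the names of the genes with at least one promoter hit. The gene names and the vectors `tf1_hits` and `tf2_hits` are made up for illustration; they are not real results.

```r
# Hypothetical input: all genes of interest, and for each TF the genes
# whose promoter had at least one hit (names are invented placeholders).
genes    <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
tf1_hits <- c("GENE_A", "GENE_B")
tf2_hits <- c("GENE_B", "GENE_C")

# Logical flags: does each gene appear in each hit list?
has_tf1 <- genes %in% tf1_hits
has_tf2 <- genes %in% tf2_hits

# Assign every gene to exactly one category.
category <- ifelse(has_tf1 & has_tf2, "both",
            ifelse(has_tf1, "TF1 only",
            ifelse(has_tf2, "TF2 only", "neither")))

result <- data.frame(gene = genes, category = category,
                     stringsAsFactors = FALSE)
print(result)
```

What I am missing is the step before this: getting the promoter sequences for all the genes and scanning them in one go instead of one file at a time.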
Thank you in advance