Hi community,
maybe someone can provide some guidance/insight and help me out.
Our group has identified a set of genes that are potential targets of a known transcription factor and we want to find a binding motif for this TF proximal to those genes. I have tried following this Bioconductor workflow:
Finding Candidate Binding Sites for Known Transcription Factors via Sequence Matching
However, this workflow uses S. cerevisiae data and our data is from human cells, so I ran into the following issue. The workflow contains this line of code:
orfs <- as.character(mget(genes, org.Sc.sgdCOMMON2ORF))
However, the org.Hs.eg.db data has no "COMMON2ORF" map.
I am now wondering what the workaround is to continue with the workflow with the human data.
I thought I could use org.Hs.egCHRLOC to get the start positions for each gene of interest. However, when I provide a genes vector like in the workflow, I always get this error: Error in .checkKeys(value, Lkeys(x), x@ifnotfound): value for "GENE" not found. I think this is due to the presence of multiple transcripts for each gene?
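One thing I have not ruled out is that the key type is the problem rather than multiple transcripts. As far as I understand, org.Hs.egCHRLOC is keyed by Entrez Gene IDs, so gene symbols would first have to be translated, for example via org.Hs.egSYMBOL2EG. Below is a minimal, untested sketch of that idea. The gene symbols are placeholders and not our actual targets, and taking only the first Entrez ID per symbol is a simplification I am assuming is acceptable:

```r
library(org.Hs.eg.db)

# Placeholder gene symbols for illustration only (not our real target list)
genes <- c("TP53", "MYC", "GAPDH")

# Translate symbols to Entrez Gene IDs; symbols without a match become NA
entrez_list <- mget(genes, org.Hs.egSYMBOL2EG, ifnotfound = NA)

# A symbol can map to more than one Entrez ID; keep the first one (simplification)
entrez_ids <- vapply(entrez_list, function(ids) as.character(ids[1]), character(1))
entrez_ids <- entrez_ids[!is.na(entrez_ids)]

# Look up start positions by Entrez ID. Each gene may return several values,
# and negative values indicate the antisense strand.
starts <- mget(unname(entrez_ids), org.Hs.egCHRLOC, ifnotfound = NA)
```

If that is the right direction, I would still need to decide which of the returned start positions to use when a gene has more than one.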
Has anyone seen this problem before and found a solution, or can someone point me toward one? It would be highly appreciated!