Question

find TF binding site based on sequence

2.2 years ago

c.heininger ▴ 10

Hi community,

maybe someone can provide some guidance/insight and help me out.

Our group has identified a set of genes that are potential targets of a known transcription factor and we want to find a binding motif for this TF proximal to those genes. I have tried following this Bioconductor workflow:

Finding Candidate Binding Sites for Known Transcription Factors via Sequence Matching

However, this workflow uses S. cerevisiae data and our data is from human cells, so I ran into the following issue. The workflow contains this line of code:

orfs <- as.character(mget(genes, org.Sc.sgdCOMMON2ORF))

However, org.Hs.eg.db has no "COMMON2ORF" mapping.
I am now wondering what the workaround is to continue the workflow with human data.

I thought I could use org.Hs.egCHRLOC to get the start positions for each gene of interest. However, when I provide a genes vector like the one in the workflow, I always get this error: Error in .checkKeys(value, Lkeys(x), x@ifnotfound) : value for "GENE" not found. Could this be due to the presence of multiple transcripts for each gene?
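For context, org.Hs.egCHRLOC is keyed by Entrez Gene IDs, so this error is what one would expect if the genes vector holds gene symbols rather than Entrez IDs. A minimal sketch of the lookup under that assumption follows; the gene symbols are placeholders (not the genes from our data), and translating them with mapIds() is one possible route, not something the workflow prescribes.

```r
# Sketch only: assumes the input vector holds HGNC gene symbols.
library(org.Hs.eg.db)  # also attaches AnnotationDbi

# Hypothetical example symbols, used purely for illustration.
genes <- c("TP53", "MYC", "GATA1")

# Translate symbols to Entrez Gene IDs, the key type org.Hs.egCHRLOC expects.
entrez <- mapIds(org.Hs.eg.db, keys = genes,
                 column = "ENTREZID", keytype = "SYMBOL",
                 multiVals = "first")

# Drop symbols that could not be mapped before the lookup.
entrez <- unname(entrez[!is.na(entrez)])

# One list element per gene; ifnotfound = NA avoids the hard error
# for genes without a recorded chromosomal location.
starts <- mget(entrez, org.Hs.egCHRLOC, ifnotfound = NA)
```

Each element of starts is a vector of start positions named by chromosome, with negative values for the antisense strand. A gene can have several entries, so multiple transcripts show up as multiple values rather than as an error.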

Has anyone seen this problem before and found a solution, or can someone point me toward one? It would be highly appreciated!

motif Bioconductor TF R • 385 views
