Question

Transcription factor binding site prediction from a FASTA sequence

0

20 months ago

newbio • 0

Hi all, I have a 2 kb sequence that is the promoter region of a gene. I would like to pinpoint the SMAD3 transcription factor binding sites within that sequence. How can I do that, and is it possible with JASPAR?

Thanks

transcription site jaspar • 849 views

updated 20 months ago by ATpoint 82k • written 20 months ago by newbio • 0

0

Hi, you can use Python to find all the SMAD3 binding sites in your query sequence. For example, if the SMAD3 binding site is 5'-GTCTAGAC-3', you can use the following quick-and-dirty Python code to list every exact match:

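#!/usr/bin/env python3
from re import finditer

# Replace with your 2 kb promoter sequence (uppercase, no FASTA header or line breaks)
query = "your 2 kb DNA sequence"
smad3_binding_site = "GTCTAGAC"  # palindromic, so one scan covers both strands

# Print the (start, end) span and the matched text of every exact occurrence
for match in finditer(smad3_binding_site, query):
    print(match.span(), match.group())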
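
The snippet above only reports exact, non-overlapping matches on the forward strand, which is enough for a palindromic site like GTCTAGAC. For a non-palindromic motif, a minimal sketch that also checks the reverse strand and keeps overlapping hits could look like this. The function names `reverse_complement` and `find_sites`, the toy sequence, and the short motif GTCT are illustrative assumptions, not part of the original code.

```python
import re

# Complement table for unambiguous DNA bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_sites(query, motif):
    """Return sorted (start, end, strand) tuples for exact motif hits.

    Coordinates are 0-based, end-exclusive, on the forward strand.
    """
    query, motif = query.upper(), motif.upper()
    patterns = {"+": motif}
    rc = reverse_complement(motif)
    if rc != motif:  # a palindromic motif would otherwise be reported twice
        patterns["-"] = rc
    hits = []
    for strand, pattern in patterns.items():
        # The lookahead matches zero characters, so overlapping hits are kept
        for m in re.finditer(f"(?={re.escape(pattern)})", query):
            hits.append((m.start(), m.start() + len(pattern), strand))
    return sorted(hits)

# Hypothetical toy sequence, not a real promoter
toy = "AAGTCTAGACTTCAGACAA"
print(find_sites(toy, "GTCTAGAC"))
print(find_sites(toy, "GTCT"))
```

A "-" hit means the motif reads 5' to 3' on the reverse strand at those forward-strand coordinates.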

Hope it helps.

20 months ago by Prosad ▴ 30

score 3 · Answer 1 · 2022-08-12

3

20 months ago

rpolicastro 13k

JASPAR allows exporting motifs in MEME format, which you can use with FIMO from the MEME Suite to look for occurrences of a specific motif in your sequence(s). Unlike exact string matching, FIMO takes into account the probability of each base occurring at each position, since chromatin-binding proteins can be promiscuous in the sites they bind.
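
To illustrate the idea of per-position base probabilities, here is a minimal sketch of scanning a sequence with a log-odds position weight matrix on both strands. It is a simplified stand-in, not FIMO's actual implementation (FIMO additionally converts scores to p-values). The matrix `toy_probs` is made up for the example and is not the JASPAR SMAD3 motif; the uniform background, the pseudocount, and the score threshold are likewise assumptions.

```python
import math

BASES = "ACGT"

def log_odds(prob_rows, background=0.25, pseudo=0.01):
    """Turn per-position base probabilities into log2-odds scores."""
    pwm = []
    for row in prob_rows:
        total = sum(row) + 4 * pseudo  # renormalise after adding pseudocounts
        pwm.append({b: math.log2(((p + pseudo) / total) / background)
                    for b, p in zip(BASES, row)})
    return pwm

def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan(seq, pwm, threshold):
    """Yield (start, strand, score) for every window scoring >= threshold."""
    seq = seq.upper()
    w = len(pwm)
    for start in range(len(seq) - w + 1):
        window = seq[start:start + w]
        if set(window) - set(BASES):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = sum(col[base] for col, base in zip(pwm, site))
            if score >= threshold:
                yield start, strand, round(score, 2)

# Made-up 4-position matrix (columns A, C, G, T); NOT the JASPAR SMAD3 motif
toy_probs = [
    [0.05, 0.05, 0.85, 0.05],  # mostly G
    [0.05, 0.05, 0.05, 0.85],  # mostly T
    [0.05, 0.85, 0.05, 0.05],  # mostly C
    [0.40, 0.05, 0.05, 0.50],  # A or T tolerated
]
pwm = log_odds(toy_probs)
for hit in scan("AAGTCTAGACTTGTCAAA", pwm, threshold=4.0):
    print(hit)
```

With this toy matrix, both GTCT and GTCA pass the threshold, with GTCA scoring slightly lower. Exact string matching cannot express that kind of graded tolerance.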


1

There is now also a wrapper package for the MEME Suite (including FIMO) in R/Bioconductor: https://bioconductor.org/packages/release/bioc/html/memes.html You could also use the HOCOMOCO motif collection, which offers motif downloads directly in MEME format.
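
If you want to inspect a downloaded MEME-format motif file yourself rather than pass it straight to FIMO, a rough Python sketch of reading the probability matrices could look like this. The function name `parse_meme` and the example motif are hypothetical, and the parser assumes each `letter-probability matrix:` line carries a `w=` field, which the format allows to be omitted.

```python
def parse_meme(text):
    """Return {motif_name: rows}, where each row is [pA, pC, pG, pT]."""
    motifs, name, rows_left = {}, None, 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MOTIF":
            name = fields[1]
            motifs[name] = []
        elif fields[0] == "letter-probability" and name is not None:
            # The header carries "w= <width>", i.e. how many rows follow
            rows_left = int(fields[fields.index("w=") + 1])
        elif rows_left > 0:
            motifs[name].append([float(x) for x in fields])
            rows_left -= 1
    return motifs

# Hypothetical two-position motif, written in minimal MEME format
example = """MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF toy_motif
letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0
 0.10 0.10 0.70 0.10
 0.05 0.05 0.05 0.85
"""
print(parse_meme(example))
```

Each row holds the probabilities of A, C, G, and T at one motif position, which is the information a probabilistic scanner such as FIMO works from.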

20 months ago by ATpoint 82k