Question

How to find TFBS motifs within 5'-UTR sequences

2.8 years ago

isha.lily20 ▴ 10

Hello researchers,

1) Can anyone tell me how to extract only the 5'-UTR sequences in FASTA format?

2) How can I find TFBS motifs within the 5'-UTRs?

Species: rice (transcriptome sequences)

I haven't tried anything yet, but I am thinking of using ChIPseeker to find TFBSs. I am confused because I only have transcriptome sequences.

Can I still get the sequences from Ensembl BioMart?

3) Is there a specific website for rice transcriptome sequences that includes 5'-UTR sequences?

Thank you.

TFBSmotifs 5primeUTR • 1.4k views

ADD COMMENT • link updated 2.8 years ago by Alex Reynolds 35k • written 2.8 years ago by isha.lily20 ▴ 10

score 1 · Answer 1 · 2021-07-01

2.8 years ago

JC 13k

Hello,

1) You don't mention the species. If your species is in Ensembl, you can use BioMart (https://www.ensembl.org/biomart/) to export the 5' UTRs.

2) Again, it depends on the species. In general, there are tools to predict TFBSs; check http://molbiol-tools.ca/Transcriptional_factors.htm
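If you would rather work from a downloaded annotation than from the BioMart web form, the same extraction can be sketched in a few lines of Python. This is only a rough illustration, not an Ensembl tool. It assumes a GFF3 annotation whose 5' UTR rows use the feature type `five_prime_UTR` and carry a `Parent=` attribute, and a genome already loaded into a dictionary of chromosome name to sequence. The function names `five_prime_utrs` and `to_fasta` are made up for this example.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def five_prime_utrs(gff3_lines, genome):
    """Return {parent_id: 5' UTR sequence}.

    gff3_lines: iterable of GFF3 text lines (assumed to contain
                'five_prime_UTR' features with a Parent attribute).
    genome:     dict mapping chromosome name to its sequence.
    """
    segments = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "five_prime_UTR":
            continue
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent", "unknown")
        segments[(parent, chrom, strand)].append((start, end))

    utrs = {}
    for (parent, chrom, strand), spans in segments.items():
        spans.sort()
        # GFF3 coordinates are 1-based and inclusive; a UTR split across
        # exons is joined in genomic order before strand correction.
        seq = "".join(genome[chrom][s - 1:e] for s, e in spans)
        utrs[parent] = reverse_complement(seq) if strand == "-" else seq
    return utrs


def to_fasta(utrs):
    """Format the UTR dictionary as FASTA text, one record per transcript."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sorted(utrs.items()))
```

Transcripts without an annotated 5' UTR simply do not appear in the output, so it is worth checking how many of your transcripts are covered before drawing conclusions.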


Hi JC, these are rice transcriptome sequences. Can I still export the 5'-UTRs from BioMart?

ADD REPLY • link 2.8 years ago by isha.lily20 ▴ 10

Yes, go to http://plants.ensembl.org/biomart

ADD REPLY • link 2.8 years ago by JC 13k

score 1 · Answer 2 · 2021-07-01

UCSC Goldenpath offers the sequences 1 kb, 2 kb, and 5 kb upstream of the transcription start of RefSeq genes with annotated 5' UTRs, for various assemblies.

For hg38, for example, via http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/:

upstream1000.fa.gz - Sequences 1000 bases upstream of annotated transcription starts of RefSeq genes with annotated 5' UTRs. This file is updated regularly. It might be slightly out of sync with the RefSeq data shown on the browser, as it is updated daily for most assemblies.

upstream2000.fa.gz - Same as upstream1000, but 2000 bases.

upstream5000.fa.gz - Same as upstream1000, but 5000 bases.

For example, to download and expand the upstream1000.fa.gz file for hg38:

$ wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/upstream1000.fa.gz" | gunzip -c > upstream1000.fa
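To sanity-check a FASTA file like this before scanning it, a small reader is enough. The following is a minimal Python sketch, not part of any UCSC tooling. The function name `read_fasta` is made up for this example, and it assumes an ordinary FASTA file, either plain or gzip-compressed.

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file, gzipped or plain."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

For example, `sum(1 for _ in read_fasta("upstream1000.fa"))` counts the records, which is a quick way to confirm the download completed.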

Once you have selected sequences for your assembly, and you have a transcription factor PWM database in hand (TRANSFAC, JASPAR, etc.), you could use FIMO to search for putative binding sites within the sequences.
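To make the idea of a PWM scan concrete, here is a toy Python sketch of the underlying principle: slide the matrix along the sequence and score each window on both strands. It is not FIMO's implementation. FIMO additionally converts scores to p-values and q-values, whereas this sketch uses a plain log-odds score threshold. The pseudocount, the uniform background of 0.25, and the function names are all assumptions chosen for illustration.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts: list of dicts, one per motif position, e.g. {"A": 8, "C": 2}.
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column.get(base, 0) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def score(site, matrix):
    """Sum the per-position scores for a site as long as the matrix."""
    return sum(column[base] for column, base in zip(matrix, site))


def scan(sequence, matrix, threshold):
    """Return (start, strand, score) for every window scoring >= threshold.

    start is 0-based; windows containing non-ACGT characters are skipped.
    """
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) - set("ACGT"):
            continue
        reverse = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", reverse)):
            value = score(site, matrix)
            if value >= threshold:
                hits.append((start, strand, round(value, 3)))
    return hits
```

In practice you would load real matrices from a database such as JASPAR and let FIMO handle the statistics. The sketch is only meant to show what "searching for putative binding sites" means mechanically.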

On the Bioinformatics SE, I posted a walkthrough of the commands for running FIMO with a JASPAR TF database and sequences of interest, at a typical sensitivity threshold:

https://bioinformatics.stackexchange.com/questions/2467/where-to-download-jaspar-tfbs-motif-bed-file/2491#2491

Another toolkit you might see mentioned is HOMER, but it is geared toward de novo motif discovery, i.e., looking for new or unpublished motifs.

One difference between HOMER and FIMO is that FIMO is used to find occurrences of published or known motifs, for which there are existing, experimentally validated PWM databases. HOMER's functionality is perhaps closer to that of MEME, which is part of the same larger toolkit (the MEME Suite) as FIMO. Like HOMER, MEME is used for finding novel motifs.