How to find TFBS motifs within 5'-UTR sequences
6 months ago
isha.lily20

Hello Researchers,

1) Can anyone tell me how to fetch only the 5'-UTR sequences in FASTA format?

2) And how do I find TFBS motifs within the 5'-UTRs?

Species: rice (transcriptome sequences)

I haven't tried anything yet, but I am thinking of using ChIPseeker to find TFBSs. Actually, I am totally confused, because what I have are transcriptome sequences.

Can I still take the sequences from Ensembl BioMart?

3) Is there any specific website for rice transcriptome sequences that includes 5'-UTR sequences?

Thank you.

TFBSmotifs 5primeUTR
6 months ago
JC

Hello,

1) You don't mention the species. If your species is in Ensembl, you can use BioMart (https://www.ensembl.org/biomart/) to export the 5' UTRs.

2) Again, it depends on the species. In general, there are tools to predict TFBSs; check http://molbiol-tools.ca/Transcriptional_factors.htm

Hi JC, it's rice transcriptome sequences. Can I still export the 5'-UTRs from BioMart?

6 months ago

UCSC Goldenpath offers 1kb, 2kb, and 5kb upstream sequences with annotated 5'UTR for various assemblies.

upstream1000.fa.gz - Sequences 1000 bases upstream of annotated transcription starts of RefSeq genes with annotated 5' UTRs. This file is updated regularly. It might be slightly out of sync with the RefSeq data shown on the browser, as it is updated daily for most assemblies.

upstream2000.fa.gz - Same as upstream1000, but 2000 bases.

upstream5000.fa.gz - Same as upstream1000, but 5000 bases.

For example, to download and expand the upstream1000.fa.gz file for hg38:

$ wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/upstream1000.fa.gz" | gunzip -c > upstream1000.fa


Once you have selected sequences for your assembly, and you have a transcription factor PWM database in hand (TRANSFAC, JASPAR, etc.), you could use FIMO to search for putative binding sites within the sequences.
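To make the idea of scanning sequences with a PWM more concrete, here is a minimal Python sketch of log-odds scoring. It is not FIMO and does not reproduce its statistics (no p-values, forward strand only); the count matrix, pseudocount, uniform background, and score cutoff are all made-up assumptions for illustration. A real matrix would come from a database such as JASPAR.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2 odds scores.

    Assumes a uniform background frequency; that is a simplification.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def scan(sequence, matrix, min_score):
    """Yield (start, window, score) for forward-strand windows scoring >= min_score."""
    sequence = sequence.upper()
    width = len(matrix)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= min_score:
            yield start, window, score

# Hypothetical counts for a 4-bp motif resembling "TGCA" (invented, not from any database).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_matrix = log_odds_matrix(toy_counts)
hits = list(scan("AATGCATT", toy_matrix, min_score=4.0))  # cutoff chosen arbitrarily
for start, window, score in hits:
    print(start, window, round(score, 2))
```

In practice the cutoff is better expressed as a p-value, which is what a dedicated tool computes for you; the sketch only shows where the scores come from.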

On Bioinformatics SE, I posted a walkthrough of the commands used to run FIMO with a JASPAR TF database and sequences of interest, at a typical sensitivity threshold.

Another toolkit you might see mentioned is HOMER, but this is for de novo motif discovery, i.e., you are looking for new or unpublished motifs.

One difference between HOMER and FIMO is that FIMO would be used for discovery of published or known motifs, for which there are existing, experimentally validated PWM databases. The functionality of HOMER would perhaps be closer to the MEME tool, which is part of the larger toolkit that FIMO is in. Like HOMER, MEME would be used for finding novel motifs.

Hi Alex Reynolds, do you know of any specific website for the rice transcriptome that offers what you mentioned, i.e. 1 kb upstream sequences with annotated 5'-UTRs?

Not UCSC, but MSU keeps per-chromosome sequence and annotation files here for japonica rice (O. sativa ssp. japonica):

http://rice.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/

The GFF3 files contain gene annotations, including regions defined as five_prime_UTR. I imagine those could be used with the sequence files to generate starting input for FIMO.
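As a rough sketch of that last step, the Python below pulls five_prime_UTR features out of GFF3 lines and cuts the matching sequence from a genome FASTA. It assumes each UTR feature carries a single Parent attribute naming its transcript and that the chromosome names in the GFF3 match the FASTA headers; I have not checked either against the MSU files. The toy genome, the transcript IDs tx1 and tx2, and the "example" source column are invented for illustration.

```python
from collections import defaultdict

def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    sequences, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def five_prime_utrs(gff_lines, genome):
    """Return {parent_id: UTR sequence}, joining multi-segment UTRs in transcript order."""
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "five_prime_UTR":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        attributes = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
        # Assumption: one Parent per feature; fall back to ID if Parent is absent.
        parent = attributes.get("Parent", attributes.get("ID", "unknown"))
        segments[(parent, chrom, strand)].append((start, end))
    utrs = {}
    for (parent, chrom, strand), spans in segments.items():
        # GFF3 coordinates are 1-based and inclusive, hence the start - 1.
        seq = "".join(genome[chrom][s - 1:e] for s, e in sorted(spans))
        utrs[parent] = reverse_complement(seq) if strand == "-" else seq
    return utrs

# Toy data, invented for illustration only.
toy_fasta = [">Chr1 toy chromosome", "AAACCCGG", "GTTTACGT"]
toy_gff = [
    "##gff-version 3",
    "Chr1\texample\tfive_prime_UTR\t4\t6\t.\t+\t.\tID=utr1;Parent=tx1",
    "Chr1\texample\tfive_prime_UTR\t8\t9\t.\t+\t.\tID=utr2;Parent=tx1",
    "Chr1\texample\tfive_prime_UTR\t10\t13\t.\t-\t.\tID=utr3;Parent=tx2",
    "Chr1\texample\tCDS\t1\t3\t.\t+\t0\tID=cds1;Parent=tx1",
]
genome = read_fasta(toy_fasta)
utrs = five_prime_utrs(toy_gff, genome)
for name, seq in sorted(utrs.items()):
    print(f">{name}\n{seq}")
```

With real data you would pass open file handles for the downloaded GFF3 and sequence files instead of the toy lists, and write the printed records to a FASTA file that FIMO can read.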