Question: regulatory regions discovery and alignment
5.8 years ago, elb200 wrote:

Hi guys!

I have the cDNA sequence of gene "x". I would like to find putative regulatory regions (e.g. the promoter) upstream of the transcribed sequence, but I do not have the genomic DNA sequence upstream of it. I have two questions. First, how can I obtain the DNA sequence that may contain the regulatory elements? Second, how can I computationally predict (for example, through an alignment) the presence of putative regulatory motifs or, more generally, regulatory sequences? I'm new to this field.


Thank you very much



Best regards.

5.8 years ago, Manvendra Singh (Berlin, Germany) wrote:

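If you have the coordinates of your genes, e.g. in BED format, you can get the coordinates of their upstream regions (here 5 kb) with a simple awk command. If you only have the cDNA, you first need to map it to the reference genome (e.g. with BLAT) to obtain these coordinates.

This assumes the 4th column of your file holds the strand; in a standard six-column BED file the strand is in column 6, so replace $4 with $6 in that case. The start is clamped at 0 so that genes near the beginning of a chromosome do not produce negative coordinates: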

awk '{ if ($4=="+") { s = $2-5000; if (s < 0) s = 0; print $1, s, $2 } else print $1, $3, $3+5000 }' OFS="\t" genes.bed > upstream.bed
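For anyone who prefers a script to a one-liner, the same logic can be sketched in Python. This is a minimal sketch assuming the same layout as the awk command (chrom, start, end, strand in columns 1 to 4, tab-separated, 0-based half-open BED coordinates) and the same 5000 bp window; the function name upstream_regions is hypothetical. Like the awk command, anything that is not "+" is treated as the minus strand.

```python
import csv
import io

WINDOW = 5000  # upstream window length in bp, as in the awk example


def upstream_regions(bed_lines, window=WINDOW):
    """Yield (chrom, start, end) for the region upstream of each gene.

    Assumes tab-separated columns chrom, start, end, strand (column 4),
    using 0-based, half-open BED coordinates.
    """
    for row in csv.reader(bed_lines, delimiter="\t"):
        if not row or row[0].startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end, strand = row[0], int(row[1]), int(row[2]), row[3]
        if strand == "+":
            # Plus strand: the TSS is the BED start, upstream lies before it.
            # Clamp at 0 so genes near the chromosome start stay valid.
            yield chrom, max(0, start - window), start
        else:
            # Minus strand: the TSS is the BED end, upstream lies after it.
            yield chrom, end, end + window


if __name__ == "__main__":
    example = io.StringIO("chr1\t1000\t3000\t+\nchr1\t20000\t25000\t-\n")
    for region in upstream_regions(example):
        print(*region, sep="\t")
```

Run on the two example genes, this prints chr1 0 1000 (clamped) and chr1 25000 30000. The end coordinate of minus-strand regions is not checked against the chromosome length, which is also true of the awk version.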

Now you can either look for ChIP-seq data from your cell line of interest and check whether any peaks fall within your upstream regions with a simple

intersectBed -a peaks.bed -b upstream.bed -f 0.5

Here -f 0.5 requires at least 50% of each peak to overlap an upstream region.
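To make the -f 0.5 criterion concrete, here is a small brute-force sketch of the overlap test: the overlap is measured as a fraction of the peak (the -a interval). Intervals are assumed to be (chrom, start, end) tuples in BED coordinates, and the function names are hypothetical. It returns each qualifying peak once, which corresponds to intersectBed with -u added; by default intersectBed instead reports the overlapping portion of each peak.

```python
def overlap_fraction_of_a(a, b):
    """Fraction of interval a that is covered by interval b.

    Intervals are (chrom, start, end) tuples, 0-based and half-open.
    """
    (a_chrom, a_start, a_end), (b_chrom, b_start, b_end) = a, b
    if a_chrom != b_chrom or a_end <= a_start:
        return 0.0  # different chromosome, or an empty interval
    overlap = min(a_end, b_end) - max(a_start, b_start)
    return max(0, overlap) / (a_end - a_start)


def peaks_in_upstream(peaks, upstream, min_frac=0.5):
    """Return peaks with at least min_frac of their length inside an upstream region.

    Brute force, O(len(peaks) * len(upstream)): fine for a few genes,
    whereas bedtools is the right tool for genome-wide data.
    """
    return [
        peak for peak in peaks
        if any(overlap_fraction_of_a(peak, region) >= min_frac for region in upstream)
    ]


if __name__ == "__main__":
    upstream = [("chr1", 0, 1000)]
    peaks = [("chr1", 500, 700), ("chr1", 900, 1300), ("chr2", 10, 20)]
    print(peaks_in_upstream(peaks, upstream))
```

In the example, the second peak overlaps the upstream region by only 100 of its 400 bp (25%), so only the first peak is reported.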


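If you want to make predictions, fetch the upstream sequences from the genome FASTA file.

fastaFromBed (bedtools getfasta) is a powerful tool for this, e.g. fastaFromBed -fi genome.fa -bed upstream.bed -fo upstream.fa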
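What fastaFromBed does for each interval can be sketched in a few lines of Python: read the genome FASTA, slice it with the BED coordinates (which behave like Python slices), and reverse-complement features on the minus strand. The reverse-complement step mirrors fastaFromBed's -s option, which requires the strand in column 6 (the 3-column upstream.bed above has none). The names read_fasta and fetch are hypothetical, and this loads the whole genome into memory, so it is only meant to show the idea.

```python
import io

# Maps each base to its complement; lowercase (soft-masked) bases are kept lowercase.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; name is the first word after '>'."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def fetch(genome, chrom, start, end, strand="+"):
    """Return genome[chrom][start:end], reverse-complemented for the minus strand."""
    seq = genome[chrom][start:end]  # BED is 0-based, half-open, like Python slices
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    genome = read_fasta(io.StringIO(">chr1 toy chromosome\nAACCGGTT\nACGT\n"))
    print(fetch(genome, "chr1", 0, 4))       # plus strand
    print(fetch(genome, "chr1", 0, 4, "-"))  # minus strand
```

For the toy chromosome, the plus-strand slice is AACC and its reverse complement is GGTT.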

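Then run MEME on these sequences to discover enriched motifs (MEME works on unaligned sequences, so a separate alignment step is not required). Once you have the motifs, compare their matrices with the known ones in JASPAR or TRANSFAC, e.g. with Tomtom from the MEME Suite.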
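The idea behind scanning a sequence with a known motif matrix (such as a JASPAR count matrix) can be sketched as a simple log-odds scan. This is a simplified illustration under stated assumptions: a uniform 0.25 background, a pseudocount of 0.25 per base, the forward strand only, and a made-up TATA-like count matrix (EXAMPLE_COUNTS is hypothetical, not a real JASPAR entry). It is not MEME's de novo discovery algorithm, nor the exact scoring of the MEME Suite scanner FIMO.

```python
import math

BASES = "ACGT"

# Hypothetical TATA-like motif: counts of each base at 4 positions (10 sites).
EXAMPLE_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]


def log_odds_matrix(counts, pseudocount=0.25, background=0.25):
    """Turn per-position base counts into log2(probability / background) scores."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for each window scoring >= threshold (forward strand only)."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


if __name__ == "__main__":
    pwm = log_odds_matrix(EXAMPLE_COUNTS)
    print(scan("GGTATAGG", pwm, threshold=5.0))
```

On GGTATAGG, only the TATA window at position 2 passes the threshold (score about 7.6). Real tools also scan the reverse strand and turn scores into p-values rather than using a fixed cutoff.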



Thank you very much for your help!

