Question: Identification Of Promoters From Gene Sequences And A Vertebrate Genome Assembly
I have a number of genes which I'd like to locate in a whole-genome assembly, in order to identify the genes' promoters. In the past I've done this manually using BLAST and BLAT, which has worked well, but the process is quite time-intensive (because I have to take into account introns and exons). Now I find myself in a situation where I need to find the promoters for 100s of genes.

Is there a pipeline or tool which will, given gene sequences and allowing for introns, automate the detection of gene promoters? I'd rather not reinvent the wheel.

Where did you get your gene sequences from in the first place? Were they derived from a gene list or some other experimental discovery process?

Hi David,

There are a couple of tools that might be useful to you.

  • The GENSCAN Web Server at MIT
  • Gene Finder: To predict putative internal protein coding exons in genomic DNA sequences
  • Glimmer: a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses.
  • Finally, a meta-list of gene prediction tools can be found on this page:
Gene Prediction
ORF Finder     Search for open reading frame, at NCBI
FGENEH         Splice sites, protein coding exons and gene model construction, promoter and poly-A search
Gene Finder    Predict protein coding exons in genomic DNA
GeneID         Gene identification and structure prediction
GeneMark       Inhomogeneous Markov model approach combined with training datasets to predict genes
GeneParser2    Identification of protein coding regions
Generation     Microbial gene prediction
Genie          Gene finder based on hidden Markov models
GenLang        Linguistics based method to find genes
GenScan        Identification of gene structures in genomic DNA
GenViewer      Prediction and analysis of protein-coding gene structures
Glimmer        A system for finding genes in microbial DNA
Grail          DNA sequence analysis tool
HMMGene        Prediction of vertebrate and C. elegans genes
NetGene2       Neural network predictions of splice sites
Procrustes     Gene recognition via spliced alignment
Wise2          Intelligent algorithm for DNA searches
WebGene        Several tools for prediction and analysis of protein coding gene structures
Xgrail         Find exons and other features
SpliceSite Prediction     Splice site prediction by neural network

I hope this helps.

If you can extract a number of nucleotides upstream of the coding region of each gene into a FASTA file, you can use MEME to find potential regulatory elements.
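
As a rough illustration of that extraction step, here is a minimal Python sketch. It assumes you already have each gene's position on the assembly (for example from your BLAT hits) as 0-based, half-open coordinates plus a strand, and that the chromosome or scaffold sequence is loaded as a string. The function names (`upstream_region`, `to_fasta`) and the default of 1000 bases are my own illustrative choices, not part of any existing tool.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (case preserved)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_region(chrom_seq, gene_start, gene_end, strand, length=1000):
    """Return up to `length` bases upstream of a gene.

    Assumption: gene_start/gene_end are 0-based, half-open coordinates
    on the forward strand of chrom_seq. The region is clipped at the
    ends of the sequence, so it may be shorter than `length`.
    """
    if strand == "+":
        # Upstream of a forward-strand gene lies before its start.
        begin = max(0, gene_start - length)
        return chrom_seq[begin:gene_start]
    if strand == "-":
        # Upstream of a reverse-strand gene lies after its end on the
        # forward strand; report it in the gene's own orientation.
        end = min(len(chrom_seq), gene_end + length)
        return reverse_complement(chrom_seq[gene_end:end])
    raise ValueError("strand must be '+' or '-'")


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequence and made-up gene coordinates, for demonstration only.
    chrom = "ACGTTGCAAGGCTTAA"
    genes = [("geneA", 6, 10, "+"), ("geneB", 6, 10, "-")]
    records = [
        (name + "_upstream", upstream_region(chrom, s, e, strand, length=4))
        for name, s, e, strand in genes
    ]
    print(to_fasta(records), end="")
```

The resulting FASTA text can be written to a file and given to MEME as input. How far upstream to go is a judgment call that depends on the organism and on how well the transcription start site is known; note that this sketch measures from the coordinates you pass in, so if those mark the coding region rather than the transcript start, the 5' UTR will be included in the extracted region.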


