To find what you need, phrase your question as a search query, like this:
In NCBI: ‘database containing essential genes of gut bacteria’
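The same kind of query can also be sent to NCBI programmatically. This is only a minimal sketch, assuming you want to search PubMed through the Entrez E-utilities `esearch` endpoint; the choice of database, the `retmax` value and the helper name `build_esearch_url` are illustrative assumptions, not something stated in this post. The function only builds the URL and does not make a network request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Build an ESearch URL for a free-text query (hypothetical helper)."""
    params = {
        "db": db,           # assumption: search PubMed; other Entrez databases also work
        "term": term,       # the free-text query
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response instead of XML
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


query = "database containing essential genes of gut bacteria"
url = build_esearch_url(query)
print(url)
```

Opening the printed URL in a browser, or fetching it with `urllib.request.urlopen`, should return the matching record IDs.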
With that query I got a lot of articles, for example:
A Comprehensive Overview of Online Resources to Identify and Predict Bacterial Essential Genes
Chong Peng, Yan Lin, Hao Luo and Feng Gao
They list databases you probably already know, but they also suggest some approaches for finding new ones:
Sequence Derived Features of Essential Genes
(1) GC content. DNA with high GC content is believed to be
more robust and stable (Seringhaus et al., 2006).
(2) Codon usage. The codon usage of essential genes suffers
from more evolutionary constraints than non-essential
genes (Jordan et al., 2002).
(3) Strand bias. Essential genes tend to be encoded on the
leading strand of the chromosome (Lin et al., 2010; Rocha
and Danchin, 2003).
(4) Protein length. Although protein length tends to become
longer through evolution, essential genes, compared to
non-essential genes, have a significantly higher proportion
of large and small proteins relative to medium-sized
proteins (Lipman et al., 2002; Gong et al., 2008).
(5) Z-curve parameter. The Z-curve theory is a bioinformatic
algorithm to display base composition distributions along
DNA sequences (Zhang and Zhang, 1994; Zhang, 1997;
Gao and Zhang, 2004).
All the information that a given
DNA sequence carries is included in the corresponding
Z-curve. So Z-curve features can be used as sequence
derived features for essential gene prediction (Song et al.,
2014; Lin et al., 2017). Based on the Z-curve theory,
Guo et al. (2017) created a λ-interval Z-curve, which
considered the interval range association. They then built
a support vector machine-based model to predict human
gene essentiality with the λ-interval Z-curve, and obtained
excellent performance (Guo et al., 2017).
(6) Hurst exponent. The Hurst exponent is a characteristic
parameter which describes the degree of self-similarity of
a data set. For genes of similar length, the average Hurst
exponent of essential genes is smaller than that of non-
essential genes (Zhou and Yu, 2014).
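Two of the sequence-derived features listed above, GC content and the Z-curve, are simple enough to compute yourself. The sketch below is only an illustration, not the implementation used in the cited papers. It assumes the usual Z-curve definition from Zhang and Zhang (1994), where the three components track purine vs. pyrimidine, amino vs. keto, and weak vs. strong hydrogen-bond bases along the sequence. The function names `gc_content` and `z_curve` are made up for this example.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def z_curve(seq: str) -> list[tuple[int, int, int]]:
    """Cumulative Z-curve coordinates (x_n, y_n, z_n) for n = 1..len(seq).

    Assumed definition, based on cumulative base counts A_n, C_n, G_n, T_n:
      x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
      y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
      z_n = (A_n + T_n) - (C_n + G_n)   weak vs. strong hydrogen bonds
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        # Ambiguous symbols such as N are not counted, so the curve
        # simply repeats the previous point at that position.
        if base in counts:
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append((a + g - c - t, a + c - g - t, a + t - c - g))
    return points


# Toy sequence, invented for illustration only.
toy = "ATGGCGAAATAA"
print(gc_content(toy))
print(z_curve(toy)[-1])
```

The final Z-curve point, or summary statistics over the whole curve, could then be fed into a classifier as features. How the cited studies actually derive their feature vectors is described in the papers themselves.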
Context-Dependent Features of Essential Proteins
(1) Domain properties. Protein essentiality is not likely to be
conserved through the conservation of overall proteins
but through the function of protein domains or domain
combinations (Deng et al., 2011).
(2) Protein-protein interaction (PPI) network. Genes or their
protein products are connected rather than isolated.
Compared with non-essential genes, essential genes tend to
be more highly connected in protein interaction networks.
Network topology features, such as degree centrality (DC),
betweenness centrality (BC), closeness centrality (CC),
eigenvector centrality (EC), subgraph centrality (SC) have
been used for detecting essential proteins (Estrada, 2006;
Acencio and Lemke, 2009; Hwang et al., 2009; Wang et al.,
2013; Xiao et al., 2015).
(3) Protein localization. Essential proteins exist in cytoplasm
with a higher proportion, while locate in cell envelope
such as cytoplasm membrane, periplasm, cell wall and
extracellular with a much lower proportion compared with
non-essential proteins (Seringhaus et al., 2006; Peng and
(4) Gene expression. Genes whose expression levels are higher
and stabler under given conditions are more likely to be
essential (Jansen et al., 2002).
(5) Gene Ontology. The Gene Ontology (GO) project provides
a set of hierarchical controlled vocabularies for describing
the biological process, molecular function, and cellular
component of gene products (Ashburner et al., 2000).
GO terms related to cellular localization and biological
process are shown to be reliable predictors of essential genes
(Acencio and Lemke, 2009
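To make the network topology features from item (2) more concrete, here is a minimal sketch that computes degree centrality and closeness centrality on a tiny, made-up interaction network. The protein labels `P1` to `P5` and the edge list are hypothetical, and the code uses only the Python standard library. In practice you would probably load a real PPI network and use a graph library instead.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by (n - 1), the maximum possible degree."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """Number of reachable nodes divided by the sum of shortest-path distances.

    Distances come from a breadth-first search. For a connected graph this
    equals (n - 1) / sum of distances; for a disconnected graph the value
    is computed within each node's own component.
    """
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical toy network: P1 is a hub, P5 sits on the periphery.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
ppi = build_graph(toy_edges)
dc = degree_centrality(ppi)
cc = closeness_centrality(ppi)
for protein in sorted(ppi, key=dc.get, reverse=True):
    print(protein, round(dc[protein], 2), round(cc[protein], 2))
```

Proteins ranked near the top by such scores would be the candidates to check against a database of known essential genes.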
See also this post:
Database Of Essential Genes
It contains a lot of useful information.