I've created a monster and it has come
back to eat my brain. I've made several blog posts about Configman,
my universal configuration manager that encapsulates command line,
configuration file and environment configuration systems. It is a
powerful system that gave Socorro a flexible dependency injection
framework. It has enabled us to swap out storage schemes and
processing algorithms using configuration.
In Socorro, we've chosen to use INI
files for configuration. Configman is able to create the canonical
INI file for any app that employs Configman. Applications are
composed of components that declare what external resources they
need. For example, a processor may need an HBase crash storage
source, an HBase crash storage destination and a RabbitMQ queue. The
code for each of these three components declares its needs
in a Configman-compatible manner. In turn, Configman will create an
INI file for the processor that has three sections: source,
destination and queue. Within each of these sections will be the
configuration requirements for the external resources:
    [source]
    storage_class=socorro.external.hb.crashstorage.HBaseCrashStorage
    host=localhost
    port=9090

    [destination]
    storage_class=socorro.external.hb.crashstorage.HBaseCrashStorage
    host=localhost
    port=9090

    [queue]
    queue_class=socorro.external.rabbitmq.new_crash_source
    host=rabbitmqHost
    user=rabbitmqUser
    password=rabbitmqPassword
Notice that the source and destination
sections both have the same requirements. It is inconvenient to have
to specify the HBase connection information twice. To solve that
problem, we've chosen to extend the INI file syntax with a +include
directive:
    [source]
    +include common_hbase.ini

    [destination]
    +include common_hbase.ini

    [queue]
    queue_class=socorro.external.rabbitmq.new_crash_source
    host=rabbitmqHost
    user=rabbitmqUser
    password=rabbitmqPassword
Then we create the file
common_hbase.ini with the HBase connection requirements and the
information only has to be specified once.
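To make the mechanics concrete, here is a minimal sketch of how a +include line could be expanded before the INI text is parsed. This is not Configman's actual implementation; the function name expand_includes is hypothetical, and the sketch assumes that an include path is resolved relative to the file that names it.

```python
import os


def expand_includes(path):
    """Return the lines of an INI file, with every '+include <file>' line
    replaced by the lines of the named file.

    Assumption: include paths are relative to the including file.
    """
    base_dir = os.path.dirname(os.path.abspath(path))
    expanded = []
    with open(path) as ini_file:
        for line in ini_file:
            stripped = line.strip()
            if stripped.startswith("+include "):
                include_name = stripped[len("+include "):].strip()
                include_path = os.path.join(base_dir, include_name)
                # Recurse so that an included file may itself use +include.
                expanded.extend(expand_includes(include_path))
            else:
                expanded.append(stripped)
    return expanded
```

With this approach, both the [source] and [destination] sections end up with the same host and port lines, even though they are written only once in common_hbase.ini.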
This works great until some other
component needs some of the information in the common_hbase.ini
file, but not all of it. We cannot use the +include in that
case because bringing extra symbols into a section is an error as
far as Configman is concerned. To get around
this problem, we relaxed the requirements to allow unknown symbols in
sections. Unfortunately, this immediately sacrifices important error
detection: misspell a symbol and Configman won't know if it is
misspelled or just unused. This is not ideal.
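The trade-off can be sketched in a few lines of plain Python. The helper names below (unknown_symbols, check_section) and the allow_unknown flag are hypothetical and exist only for illustration; they are not part of Configman's API.

```python
def unknown_symbols(declared, supplied):
    """Return the supplied option names that no component declared."""
    return sorted(set(supplied) - set(declared))


def check_section(declared, supplied, allow_unknown=False):
    """Raise on unknown symbols in strict mode; tolerate them otherwise."""
    extras = unknown_symbols(declared, supplied)
    if extras and not allow_unknown:
        raise ValueError("unknown symbols: " + ", ".join(extras))
    return extras


declared = {"host", "port"}
supplied = {"host": "localhost", "prot": "9090"}  # 'prot' is a typo for 'port'

# Strict mode rejects the section, which exposes the typo.
try:
    check_section(declared, supplied)
    strict_result = "accepted"
except ValueError as error:
    strict_result = str(error)

# Relaxed mode accepts the section; the typo looks like an unused symbol.
relaxed_result = check_section(declared, supplied, allow_unknown=True)
```

In relaxed mode the misspelled prot is indistinguishable from a legitimately unused symbol brought in by an include file, so port silently keeps its default.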
The system of +include also enables
multiple applications to share some configuration information. The
processor and the crashmover both need to talk to HBase, so we could
use one common_hbase.ini file for both applications. That works fine
until one application needs different values for one or more of
the parameters defined in the include file. This is the case in our
production environment, where some applications use different user
names to connect to the same resource. We could factor the
variable parameters back out of the +include file, or make nested
+include files. As we get into it, however, we end up adding a whole
new layer of complexity that is hard to manage.
Here is a proposal for getting around
the problem. I'm going to mandate that all INI files have a
[resources] section. Within that section, each external resource
will have its own subsection. Configman will create this resources
section automatically when it reads the resource requirements from
the loaded application components.
    [resources]

        [[hbase]]
        storage_class=socorro.external.hb.crashstorage.HBaseCrashStorage
        host=localhost
        port=9090

        [[rabbitmq]]
        queue_class=socorro.external.rabbitmq.new_crash_source
        host=rabbitmqHost
        user=rabbitmqUser
        password=rabbitmqPassword

    [source]
    # storage_class -> resources.hbase.storage_class
    # storage_class=
    # host -> resources.hbase.host
    # host=
    # port -> resources.hbase.port
    # port=

    [destination]
    # storage_class -> resources.hbase.storage_class
    # storage_class=
    # host -> resources.hbase.host
    # host=
    # port -> resources.hbase.port
    # port=

    [queue]
    # queue_class -> resources.rabbitmq.queue_class
    # queue_class=
    # host -> resources.rabbitmq.host
    # host=
    # user -> resources.rabbitmq.user
    # user=
    # password -> resources.rabbitmq.password
    # password=
For example, the application, when it
wants its configuration value for the source storage_class, will
reference the configuration object normally:
config.source.storage_class. Behind the scenes, Configman knows that
this configuration parameter is linked to the resource section.
Configman will return the value from the resource section to the
application.
In the case where a particular service
needs a different value than the one defined in the resource section,
it may be overridden in its original location by uncommenting it and
providing an alternative value:
    [resources]

        [[hbase]]
        storage_class=socorro.external.hb.crashstorage.HBaseCrashStorage
        host=localhost
        port=9090
    …

    [source]
    # host -> resources.hbase.host
    host=192.168.1.222
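The lookup rule described here (use the local override if there is one, otherwise follow the link to the resources section) can be sketched as below. This is only an illustration of the proposed behavior, not Configman code; the LinkedSection class and its arguments are hypothetical, and the values are taken from the example above.

```python
class LinkedSection:
    """A config section whose options fall back to a linked resource.

    Hypothetical illustration: 'resource' is the linked subsection of
    [resources], 'overrides' holds the values set locally in the section.
    """

    def __init__(self, resource, overrides=None):
        self._resource = resource
        self._overrides = overrides or {}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._overrides:
            return self._overrides[name]  # local value wins
        try:
            return self._resource[name]   # otherwise follow the link
        except KeyError:
            raise AttributeError(name)


resources = {
    "hbase": {
        "storage_class": "socorro.external.hb.crashstorage.HBaseCrashStorage",
        "host": "localhost",
        "port": "9090",
    },
}

# [source] overrides host; [destination] overrides nothing.
source = LinkedSection(resources["hbase"], overrides={"host": "192.168.1.222"})
destination = LinkedSection(resources["hbase"])
```

The application code does not change: it still reads source.host or destination.host, and only the section with a local value sees something other than the shared resource setting.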
This new resource system does not
preclude the use of +include files. If several applications were to
need HBase configuration, a +include common_hbase.ini could be
created and used inside the resource section:
    [resources]

        [[hbase]]
        +include common_hbase.ini
The values read in from the +include
file can be overridden in the original sections, just as in the
previous example. However, because Configman employs ConfigObj for
INI file processing, an override of a given value within the same
section that has the +include is not allowed. This is a restriction
imposed by ConfigObj.
How does this resolve the problem that
we're having at Mozilla?
It consolidates the resource configuration.
Configuration for an app's external resources is done in one place at
the top of the INI file for each app. We do not need to maintain the
common_*.ini include files. The configuration files for development,
staging, and production can be identical except for the resource
connection details.
But now we have to repeat the resource
connection information in the INI file for each app. Isn't that less
convenient?
We can choose to use +include files,
but I discourage it. While we may be calling them 'common' files, in
our production environment they aren't really common. The processors
use a different HBase host than the middleware; the middleware uses a
different user and host for Postgres than Crontabber; etc. Coding
for exceptions to the common files is a complication. It will be
easier to maintain configuration on an app-by-app basis. It
minimizes the number of configuration files and completely avoids
+includes and their inevitable exceptions.