Reaching config nirvana in Clojure with Integrant and Aero

Kasper Gałkowski

Published on 2022-04-28

1. What is configuration?
2. What's the point of it?
3. Our configuration considerations
- 3.1. Environment variables
- 3.2. EDN files and components
4. Reaching nirvana
5. Final words

Pixelated Noise is a software consultancy and we're always looking for interesting projects to help out with. If you have unmet software development needs, we would be happy to hear from you.

This post is a short exploration of the nature of configuration and an explanation of how (and why) we configured our back end server by combining the Integrant component library with the Aero configuration library.

1. What is configuration?

Most software has some kind of configuration. It comes in many forms: command line flags, files (INI, JSON, TOML, YAML, SQLite databases), environment variables, Windows registry entries. You could even be pulling stuff from ZooKeeper.

Configuration has the benefit of enabling users to conveniently alter program behavior without having to recompile it. For example, you could change the screen resolution of a video game, or change the proxy server that a web browser uses to connect to web sites.

(Of course, you could hack the binary, but that's usually neither convenient nor traceable.)

If a program has little or no configuration, it will be less flexible for its users, but very easy to grok as a single, independent unit. If it has a lot of configuration, it will be very adaptable but harder to understand and debug, and will probably need more programmers, documentation and/or automated tests to maintain.

You can see a kind of program flexibility spectrum here:

Coffee machine -> Apache httpd -> GCC

The coffee machine program is written and tested on a single chip, which is known in advance. It has one job: to control the coffee machine's hardware to make coffee. The hardware doesn't change, and even replacement parts are identical.

You might think of the program that controls an elevator in a similar way. It probably runs for 30 years and requires minimal configuration. It's not like a new floor gets added to the building every week.

Apache httpd is an HTTP server. The same program can be used to serve a small website with a dozen files, or serve a complex setup with load balancers, compression and reverse proxies. It has to be able to do both and adapt itself via configuration.

If you think about just the HTTP part, it's still a relatively simple job, because HTTP is not that large a standard. Writing such a server is within reach of a single programmer.

GCC is a compiler suite. It has frontends for a number of languages, with compiler flags common between them. The compilers have to run on, and generate code for, many processor architectures, so they must adapt to all kinds of environments and use cases, and keep working.

The GCC tools are an extreme case. You're unlikely to have to write programs that need so much flexibility: they require a team of experts and/or decades of work.

Another example of a highly flexible program would be GNU Emacs.

So, unless you're programming a coffee machine or a cross-platform compiler suite, your program is likely to fall somewhere in the middle of this spectrum.

2. What's the point of it?

You can see now that the point of configuration is to make a program adapt to different environments and usages.

An environment might be a chip (Intel, AMD, ARM), an OS (macOS, Windows, GNU/Linux, GNU Emacs), or a network machine (development, staging, production). It kind of depends on what the program does.

If you dig deeper, an environment might also mean a temporary situation that the program finds itself in. Maybe the database is down, or it is under heavy load and has started rejecting new connections. Or the Internet connection is down because the user walked into an elevator or is driving through some distant mountain tunnel.

A robust program would want to survive such situations.

Usage might be anything that the program can do. For example, Apache can be an HTTP server, load balancer and reverse proxy. A coffee machine can make either Espresso or Americano. GCC can compile and link programs written in different languages, with different optimization and/or debugging options applied.

If it sounds interesting, I recommend peeking into this book. The authors explain flexibility using the example of biological organisms. Don't worry, I won't spoil the rest of the fun.

3. Our configuration considerations

Our back end server is a REST API using HTTP, running on the JVM. It needs to have the following things configurable:

- Address and credentials of a remote PostgreSQL database.

  It's needed so that application data can be stored and queried using SQL.

- TCP port to listen for HTTP requests on.

  It's nice to be able to change it, for example when different processes conflict over the same port.

- Authentication method for secure REST API access.

  Because the server is running on the public Internet, it needs some form of authentication to protect its API.

  You could have one method for offline/disabled authentication for when you need to work on a plane (or just don't want the distraction of the Internet during development), and another one that uses something like Okta for production.

This last point leads us to a much more general requirement: we would like to be able to mix and match the different components that make up our application depending on the environment and deployment mode. So, for authentication, we would not configure a single auth component to either allow all requests (for development) or perform actual auth (for prod). Instead, we would like two separate auth components, one dummy and one real, and be able to swap one for the other.

3.1. Environment variables

Since we're on the JVM, we could use Java system properties for configuration via System.getProperty. But that won't do: deploying to Heroku (which is free and suitable for a small project) means we have to accept Unix environment variables, because that's how Heroku passes us things like the database server's IP, port and credentials. So we'd have to implement something using System.getenv.

One problem with environment variables is that they are always strings, so if a setting is best represented as a number, you have to parse it yourself. They are also flat by definition, so there is no way of grouping them other than naming conventions (DATABASE_IP, DATABASE_PORT etc.). The lack of structure can be quite restrictive; for example, passing a list of IP addresses would be challenging.
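A minimal sketch of the System.getenv route in plain Clojure makes both problems visible. The DATABASE_IP, DATABASE_PORT and DATABASE_REPLICAS names are hypothetical: every value arrives as a string that has to be parsed by hand, and a list can only be smuggled in through an ad hoc convention such as comma separation. The functions take the environment as a map, so they can be exercised without touching the real process environment.

```clojure
(ns example.env-config
  (:require [clojure.string :as str]))

(defn env-long
  "Parses the variable `var-name` from `env` as a long,
  or returns `default` when it is unset."
  [env var-name default]
  (if-let [s (get env var-name)]
    (Long/parseLong (str/trim s))
    default))

(defn env-list
  "Splits a comma-separated variable into a vector of strings.
  The comma convention is our own assumption; nothing enforces it."
  [env var-name]
  (if-let [s (get env var-name)]
    (mapv str/trim (str/split s #","))
    []))

(defn database-config
  "Groups flat, hypothetical DATABASE_* variables into one nested map."
  [env]
  {:host     (get env "DATABASE_IP" "localhost")
   :port     (env-long env "DATABASE_PORT" 5432)
   :replicas (env-list env "DATABASE_REPLICAS")})

(comment
  ;; System/getenv returns a java.util.Map, which clojure.core/get can read.
  (database-config (System/getenv)))
```

All the grouping and typing lives in code, which is exactly the structure an EDN file could provide declaratively.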

3.2. EDN files and components

An alternative is to store the settings in an EDN file, which can be modified externally and read by the program at startup. This solves the lack of types and structure described in the previous section. If an EDN file alone is enough for you, you can make do with slurp and clojure.edn/read-string. But a simple EDN file doesn't address consuming environment variables (which we are forced to interact with because of Heroku).
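For the EDN-only route, a minimal sketch could look like this. The load-config name and the sample file contents are illustrative. clojure.edn/read-string is used rather than clojure.core/read-string because it only reads data and never evaluates code.

```clojure
(ns example.edn-config
  (:require [clojure.edn :as edn]))

(defn load-config
  "Reads the EDN file at `path` into a Clojure map.
  Numbers stay numbers and vectors stay vectors, so no manual parsing is needed."
  [path]
  (edn/read-string (slurp path)))

(comment
  ;; Given an illustrative config.edn containing:
  ;;   {:http {:port 8080}
  ;;    :db   {:hosts ["10.0.0.1" "10.0.0.2"]}}
  (get-in (load-config "config.edn") [:http :port]))
```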

Before we address environment variables, let's have a look at components in Clojure. This particular flavour of dependency injection/component lifecycle/state management was first introduced to Clojure by the Component library, and has seen numerous implementations since. We decided to go with Integrant for this project. The nice thing about Integrant is that you can use an EDN file to describe both the configuration of each component and the dependencies between components in a declarative way:

{:http/service {:port 8080
                :db   #ig/ref :db/pg
                :auth #ig/ref :auth/dummy}
 :auth/dummy   {}
 :db/pg        {:jdbc-url          "jdbc:postgresql://192.168.1.1:5432/company"
                :database-name     "company"
                :username          "admin"
                :password          "secret"
                :minimum-idle      3
                :maximum-pool-size 15}}

As you can imagine, you can have different Integrant files for different deployments, so that the test environment config uses fewer resources, or even replaces a whole component (such as auth) with a mocked implementation.
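To turn such a config into a running system, each key needs an ig/init-key method, and ig/init then starts the components in dependency order, passing the initialized value wherever an #ig/ref appears. The sketch below is a hypothetical illustration: the component bodies are placeholders (no real connection pool or HTTP server is created), and ig/read-string is used to parse the #ig/ref tags from a string.

```clojure
(ns example.system
  (:require [integrant.core :as ig]))

;; A trimmed version of the config above, as a string so that
;; ig/read-string can parse the #ig/ref tags.
(def config-str
  "{:http/service {:port 8080
                   :db   #ig/ref :db/pg
                   :auth #ig/ref :auth/dummy}
    :auth/dummy   {}
    :db/pg        {:jdbc-url \"jdbc:postgresql://192.168.1.1:5432/company\"}}")

(defmethod ig/init-key :auth/dummy [_ _]
  ;; Dummy auth: accepts every request.
  (fn [_request] true))

(defmethod ig/init-key :db/pg [_ opts]
  ;; Placeholder: a real implementation would open a connection pool here.
  {:jdbc-url (:jdbc-url opts)})

(defmethod ig/init-key :http/service [_ {:keys [port db auth]}]
  ;; Placeholder: a real implementation would start an HTTP server here.
  ;; By now, db and auth hold the initialized components, not refs.
  {:port port :db db :auth auth})

(defmethod ig/halt-key! :http/service [_ _service]
  ;; A real server would be stopped here.
  nil)

(comment
  (def system (ig/init (ig/read-string config-str)))
  (ig/halt! system))
```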

But don't forget that we also need to deal with environment variables, ideally without losing the benefits of the EDN-based Integrant configuration. Aero is a config library that makes extensive use of tagged literals to provide functionality for parsing environment variables, using them conditionally, templating them in strings etc (the scope of Aero goes beyond environment variables). So a fragment of Aero config could look a bit like this:

{:url #join ["jdbc:postgresql://psq-prod/prod?user="
             #env PROD_USER
             "&password="
             #env PROD_PASSWD]}
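A small sketch of reading an Aero config, assuming a hypothetical file that uses a few other Aero tags: #or supplies a fallback when an environment variable is unset, #long coerces the resulting string into a number, and #profile picks a value based on the :profile option passed to aero.core/read-config.

```clojure
(ns example.aero-config
  (:require [aero.core :as aero]))

;; Hypothetical config.edn:
;;   {:port #long #or [#env PORT 8080]
;;    :auth #profile {:dev  :dummy
;;                    :prod :okta}}

(defn load-config
  "Reads the Aero config at `path` for the given profile (e.g. :dev or :prod)."
  [path profile]
  (aero/read-config path {:profile profile}))

(comment
  (load-config "config.edn" :dev))
```

The same file then serves every deployment; only the profile passed at startup changes.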

4. Reaching nirvana

You may have noticed that both libraries use tagged literals to extend the EDN format (it's the Extensible Data Notation, after all!). We'd like to use the tagged literals of both libraries in the same EDN file and still have it parse. Luckily, Aero allows us to extend its notation using a multimethod, so we can teach it about integrant.core/ref:

(require 'aero.core 'integrant.core)

;; Teach Aero to turn #ig/ref tags into Integrant references
(defmethod aero.core/reader 'ig/ref
  [_opts _tag value]
  (integrant.core/ref value))

Now we can mix Aero and Integrant stuff in the same file:

{:http/service {:port #long #env PORT
                :db   #ig/ref :db/pg
                :auth #ig/ref :auth/dummy}
 :auth/dummy   {}
 :db/pg        {:jdbc-url          #env DATABASE_URL
                :database-name     "company"
                :username          "admin"
                :password          "secret"
                :minimum-idle      3
                :maximum-pool-size 15}}

This file can now be loaded with aero.core/read-config.

This makes it possible to configure:

- PostgreSQL details, using the DATABASE_URL environment variable from Heroku
- HTTP port, using the PORT environment variable from Heroku
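Putting the pieces together, a minimal sketch of the startup path could look like this. The :demo/... keys, their component bodies and the GREETING_NAME variable are hypothetical; the aero.core/reader method is the same extension shown earlier, repeated so that the block stands alone. The #or fallback keeps the example working when the variable is unset.

```clojure
(ns example.main
  (:require [aero.core :as aero]
            [integrant.core :as ig]))

;; Let Aero produce Integrant refs when it meets #ig/ref.
(defmethod aero/reader 'ig/ref
  [_opts _tag value]
  (ig/ref value))

;; Hypothetical components, just enough to show a ref being resolved.
(defmethod ig/init-key :demo/greeting [_ {:keys [name]}]
  (str "Hello, " name "!"))

(defmethod ig/init-key :demo/app [_ {:keys [greeting]}]
  ;; greeting is the initialized :demo/greeting value, not the ref.
  {:message greeting})

(defn start-system
  "Reads an Aero config (env vars, profiles, #ig/ref) and starts it with Integrant."
  [path profile]
  (-> (aero/read-config path {:profile profile})
      ig/init))

(comment
  ;; Hypothetical config.edn:
  ;;   {:demo/greeting {:name #or [#env GREETING_NAME "world"]}
  ;;    :demo/app      {:greeting #ig/ref :demo/greeting}}
  (def system (start-system "config.edn" :dev))
  (ig/halt! system))
```

Aero resolves everything environment-specific first, so Integrant only ever sees plain data plus refs.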

But most importantly, this kind of setup allows us to swap components without changing our code:

{:http/service {:port #long #env PORT
                :db   #ig/ref :db/pg
                :auth #ig/ref :auth/okta}

 ;; Swapped out!
 :auth/okta    {:client-id #env OKTA_CLIENT_ID}
 :db/pg        {:jdbc-url          #env DATABASE_URL
                :database-name     "company"
                :username          "admin"
                :password          "secret"
                :minimum-idle      3
                :maximum-pool-size 15}}

The :auth/dummy and :auth/okta Integrant keys dispatch to different methods of Integrant's init-key multimethod, and those methods are implemented in different namespaces of our code.
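A sketch of what that split might look like, with hypothetical namespace names. The Okta component is a stub that does not talk to Okta; the point is only that both keys produce a value of the same shape (here, a function from request to boolean), so :http/service does not care which one it receives. Following Integrant's naming convention (auth.okta for :auth/okta) should also let ig/load-namespaces require the right namespace from the keys present in a config, since its docstring says it tries both the key's namespace and the namespace joined with the name.

```clojure
(ns auth.dummy
  (:require [integrant.core :as ig]))

(defmethod ig/init-key :auth/dummy [_ _]
  ;; Development: let every request through.
  (fn [_request] true))

(ns auth.okta
  (:require [integrant.core :as ig]))

(defmethod ig/init-key :auth/okta [_ _opts]
  ;; Stub only: a real implementation would verify tokens with Okta,
  ;; using (:client-id opts). Here it merely checks for an auth header.
  (fn [request]
    (boolean (get-in request [:headers "authorization"]))))

(ns example.swap
  (:require [integrant.core :as ig]))

(defmethod ig/init-key :http/service [_ {:keys [auth]}]
  ;; The service only needs "a function from request to boolean".
  {:authorized? auth})

(defn config-with-auth
  "Builds a config whose :http/service depends on the given auth key."
  [auth-key auth-opts]
  {:http/service {:auth (ig/ref auth-key)}
   auth-key      auth-opts})
```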

5. Final words

That's all on the topic of configuration.

By combining Integrant and Aero we were able to come up with a solution that results in a pretty readable config file, at the cost of depending on more libraries.