These routes don't look flat enough to me

Published on 2023-02-09

Table of Contents

Pixelated Noise is a software consultancy and we're always looking for interesting projects to help out with. If you have unmet software development needs, we would be happy to hear from you.

Recently a colleague was looking for a good way to flatten reitit routes in order to perform some security testing on all the possible URLs of a website using ZAP.

Essentially he needed to go from this kind of data structure:

[["/yo" [["/man" :man] ["/manman" :manman]]]
 ["/foo" [] [[]]
  ["/bar" :bar]
  ["/baz" ["/quux" :quux] [["/qux0" :qux0] [["/qux1" :qux1]]]]
  ["/ba" ["/zz" ["/bazz" :bazz]] ["/baq" :baq]]]]

To something that looked like this:

{"/yo/man"         :man
 "/yo/manman"      :manman
 "/foo/bar"        :bar
 "/foo/baz/quux"   :quux
 "/foo/baz/qux0"   :qux0
 "/foo/baz/qux1"   :qux1
 "/foo/ba/zz/bazz" :bazz
 "/foo/ba/baq"     :baq}

In this article we present a few different approaches that we came up with. We start off with a couple of "vanilla" Clojure solutions, a recursive one (that doesn't use recur so in theory it could blow the stack at great depths), and an iterative solution that takes an "instruction stream" approach and simulates its own stack. We then proceed to look into a more straightforward solution using zippers and finally we combine zippers with the iteration function that was introduced in Clojure 1.11.

All functions presented here produce the same output as above so it won't be listed every time. If you'd like to follow along in your editor, you can clone the repo.

1. Core Clojure

So one approach would be to use run-of-the-mill recursive Clojure to achieve this:

(defn flatten-routes-recursive [[path-part & [sec :as rst] :as all]]
  (cond (keyword? sec) ; form 1
        {path-part sec}

        (string? path-part) ; form 2
        (update-keys (reduce merge (map flatten-routes-recursive rst))
                     (partial str path-part))

        :else ; form 3
        (reduce merge (map flatten-routes-recursive all))))

It's not a super-easy function to read, so let's break it down a bit. By looking at the input, we observe a grammar describing a tree of nodes, and each node can be have one of 3 different forms:

  1. [<string> <keyword>] this is the base case where the recursion "bottoms out", you can't go any deeper. Example: ["/qux1" :qux1]
  2. [<string> <node-0> <node-1> ... <node-n>] where the string is a prefix to all the nested routes represented as nodes to the right of the prefix. Example: ["/zz" ["/bazz" :bazz]]
  3. [<node-0> <node-1> ... <node-n>] sibling routes. Example: [["/man" :man] ["/manman" :manman]]

So let's destructure everything we might need: [path-part & [sec :as rst] :as all]

  • We're destructuring a vector
  • Its first element is path-part
  • Its second element is sec
  • All elements apart from the first one are rst
  • The whole vector is all
  • Some of the above may have a nil value

The basic building block we are going to use is a single-entry map, which is the first clause in the cond and it's also where the recursion bottoms out. These will then be combined together with reduce merge in the other clauses. The first clause matches inputs such as ["/bar" :bar] by testing that sec is a keyword (form 1). Using single-entry maps is preferrable to the - possibly more idiomatic - approach of using 2-element vectors that are then combined with into {} because we prefer to have the same type being returned by all invocations of the recursive function as it makes things more uniform. This also ensures that the function will work correctly in the case of the routes being just ["/foo" :bar] (which is valid in reitit): (flatten-routes-recursive ["/foo" :bar]) will return {"/foo" :bar}.

Getting to the second clause, we've established sec is not a keyword, but path-part is a string, so we are in form 2 territory. The rst part of our desctructuring is made of nested routes which should be flattened and merged together ((reduce merge (map flatten-routes-recursive rst))) and then the keys of the resulting map (which are strings representing route prefixes) should be further prefixed using path-part ((update-keys ... (partial str path-part))).

In the third clause we're dealing with form 3 (sibling routes) we just flatten each element of all ((map flatten-routes-recursive all)) and we just merge everything together with reduce merge.

update-keys is a function that appeared in Clojure 1.11, so for previous versions the code instead becomes:

(defn flatten-routes-pre-1:11 [[path-part & [sec :as rst] :as all]]
  (cond (keyword? sec)      {path-part sec}
        (string? path-part) (->> (reduce merge (map flatten-routes-pre-1:11 rst))
                                 (reduce-kv (fn [m k v]
                                              (assoc m (str path-part k) v))
        :else               (reduce merge (map flatten-routes-pre-1:11 all))))

But, recursive functions build up call stack frames. This is not so terrible, as allocation is normally much faster on the stack than in the heap and in this case the stack grows by O(path-depth), which shouldn’t be large.

2. Make it iterative

Just for fun though, let’s pursue this a bit further. To turn a recursive process to an iterative one, we’ll have to introduce auxiliary variables to hold the state of the computation, effectively introducing our own "stack":

(defn flatten-routes-iter
  ([x] (flatten-routes-iter {} [] x))
  ([routes path [fst & rst :as returns]]
   (cond (empty? returns) routes                                               ;1
         (keyword? fst)   (recur (assoc routes (str/join path) fst) path rst)  ;2
         (string? fst)    (recur routes (conj path fst) rst)                   ;3
         (fn? fst)        (recur routes (fst path) rst)                        ;4
         :else            (let [pop-marker (when (string? (first fst)) [pop])] ;5
                            (recur routes path (concat fst pop-marker rst))))))

The traversal gradually transforms the tree into an "instruction stream", held in returns. We process each instruction (each first element of returns) as follows:

  • keyword: Add the route name (the keyword) to the routes map according to the current path (2)
  • string: Append it to the current path (3)
  • function: Apply it to the path. The only possible function is clojure.core/pop which is used to remove the last element of the path (which is a vector) (4)
  • compound node: concat the contents of the first element of returns in front of the rest of its elements. If the first element's first element is a string, we also place a pop instruction between the contents of the first element and the rest (5)
  • empty returns: we've run out of instructions, return routes (1)

The concat in (5) is the main operation that performs the flattening of the tree. What we're essentially doing is case (5) is that we take the contents of a vector (fst) and and break them free out of the vector and add them as the first elements of returns. When we encounter a string in the first position of fst, according to reitit convention, we have a new prefix and therefore we have gone down a level into the tree. So apart from adding fst's elements in front of returns, we also add the pop function as an element after the children of that particular branch so that we know where to pop the path later.

Let's walk through this with smaller input to make it a bit more manageable:

(def the-routes
   ["/hey" :hh]])

(flatten-routes-iter the-routes)

Let's print the values right after destructuring, along with the matching case in cond:

routes  {}
path    []
returns ["/yo" ["/hey" :hh]]
case 3: Initial call with empty auxiliary variables.
        We encounter a string, add it to path and recur.
routes  {}
path    ["/yo"]
returns (["/hey" :hh])
case 5: The first element is a vector with a string as
        its first element. We "flatten" the first element
        into returns and add a pop marker after the
        elements (rst is empty).
routes  {}
path    ["/yo"]
returns ("/hey" :hh #function[clojure.core/pop])
case 3: Add string to path and recur.
routes  {}
path    ["/yo" "/hey"]
returns (:hh #function[clojure.core/pop])
case 2: Found a route name. assoc path (as a joined string)
        to the routes, with the route name as the value.
routes  {"/yo/hey" :hh}
path    ["/yo" "/hey"]
returns (#function[clojure.core/pop])
case 4: Pop marker found. pop the path and recur.
routes  {"/yo/hey" :hh}
path    ["/yo"]
returns nil
case 1: Ran out of returns. Stop recursion and return routes.

One thing to note about returns is that in essence, it mostly functions as a pointer stack, therefore it's not as wasteful as it may seem. In our terms (Clojure doesn't really support pointers), it shouldn't be difficult to imagine an equivalent implementation where :returns contains get-in paths (like [0 2 1]) into tree nodes.

So, we have an explicit stack. We also have something like an environment, as the path determines the scope of path chunks, much like bindings do for symbols. Cue the meme:


3. Have your functions and eat them

You can insist on a functional approach, but sometimes sprinkling Clojure with a tiny bit of imperative thinking makes things much simpler. But with zippers you can still maintain functional purity while providing you with a paradigm that feels somehow imperative - so you can have your cake and eat it.

Let's have a look of how we would approach our problem with zippers:

(require '[ :as z]
         '[clojure.string :as str])

(defn flatten-routes-zipper [routes]
  (loop [res {}
         z   (z/vector-zip routes)]
    (cond (z/end? z)

          (keyword? (z/node z))
          (recur (assoc res
                        (->> z z/path (map first) (filter string?) str/join)
                        (z/node z))
                 (z/next z))

          (recur res (z/next z)))))

We define a standard vector zipper that will allow us to traverse the nested vector structure. The first clause just returns our accumulator res. The last clause advances the zipper to the next position using z/next and recurs.

The middle clause matches when the value pointed to by the zipper ((z/node z)) is a keyword and this is where all the work happens. In this case, we assoc the value pointed to by the zipper into res. In order to calculate the path of string prefixes down to the current position we do the following:

(->> z z/path (map first) (filter string?) str/join)

z/path returns all the nodes down to the current position, and then we take the first element of each. Because empty vectors and grouping of routes ([["/man" :man] ["/manman" :manman]]) are allowed in reitit, we then need to filter for strings in the path. Finally we concatenate the strings together with str/join.

3.1. Clojure is an oasis of composability in a world of broken promises

The loop/recur in the previous example is imperative and its use can sometimes be a code smell, especially among Clojure beginners who seek to reproduce their beloved for-loops in an unfamiliar functional landscape. But unfortunately, with zippers, loop/recur often becomes necessary. The main problem with loop/recur is the micromanagement of "state" that has to be threaded through each call.

Clojure 1.11 includes the iteration function that turns out to fit well with zippers. Borrowing some phrasing from the official JIRA ticket, the iteration function was designed to "represent iteration e.g. HTTP endpoints that return continuation tokens". So iteration helps you cosume multiple pages of data from an HTTP endpoint when each page is accessible via a continuation token returned as part of the previous call. But what is a zipper if not a token that contains information on how to get the next element?

So let's use iteration to declutter the "state" out of our previous loop/recur solution:

(defn flatten-routes-iteration [routes]
  (->> (iteration identity
                  :initk (z/vector-zip routes)
                  :kf    z/next
                  :somef (complement z/end?)
                  :vf    #(when (-> % z/node keyword?)
                            [(->> % z/path (map first) (filter string?) str/join)
                             (z/node %)]))
       (into {})))
  • The first argument (step) which defines how to produce the next value based on the continuation token has to be identity because we would like to keep a reference to the zipper during the whole iteration.
  • :initk initializes the zipper.
  • :kf uses z/next to move to the next location.
  • :somef decides whether there are more values to be consumed, so it's the opposite of z/end?.
  • :vf produces the actual value based on the zipper, very similarly to how it's done in flatten-routes-zipper. The main difference is that in this case we produce a 2-element vector (or nil) which is then collected into a map with into {}.

This version has the advantage of making the various parts of the iteration very explicit and you could easily imagine abstracting out of it a reusable utility function for zipper iteration:

(defn zipper-iteration [zipper vf]
  (iteration identity
             :initk zipper
             :kf    z/next
             :somef (complement z/end?)
             :vf    vf))

(defn flatten-routes-generic [routes]
  (into {}
         (z/vector-zip routes)
         #(when (-> % z/node keyword?)
            [(->> % z/path (map first) (filter string?) str/join) (z/node %)]))))

Clojure's evolution (both in the core and in community libraries) has provided us with many opportunities of delightful composability which is so rare to find in other ecosystems. The combinations of zippers with the iteration function is another example of that power that comes from composability that often goes beyond the original intentions of the authors of the libraries.

4. Conclusion

Once the essential grammar of the initial input has been worked out, the initial recursive solution is the most "economical", in the sense that it uses core Clojure functions to get to the essence of what we are trying to achieve. The iterative solution borrows a leaf out of Forth's book, essentially being a tiny virtual machine that walks the input tree (program) while maintaining one stack for the path components (environment) and one for return addresses (control flow). Finally zippers possibly produce the most comprehensible code, but that also depends on who's doing the reading!

The iterative one is quite an involved solution, but notice that it shares an interesting similarity with zippers: in the case of zippers, we also have a complex data structure that allows us to navigate the input, albeit in all directions, while our iterative solution allows us to move only forwards. The similarity extends to having to attach functions to the navigation data structure: we saw the use of the pop function as a marker in our iterative function, and zippers attach functions to the navigation structure via metadata.

5. Bonus material

Just a quick note on workflow: during the development of the code for the functions presented here, we kept all the functions in the same namespace, and at the bottom of the file we had this expression:

 (= (flatten-routes-recursive the-routes)
    (flatten-routes-pre-1:11 the-routes)
    (flatten-routes-iter the-routes)
    (flatten-routes-zipper the-routes)
    (flatten-routes-iteration the-routes)
    (flatten-routes-generic the-routes)))

In this way we were able to get quick feedback about the equivalence of all the implementations (at least for the specific input) every time we reloaded the namespace. Of course there are fancier ways to achieve the same with unit test runners and such, but for a small example project we thought it was a neat lightweight workflow trick.