Revisiting day 18 of Advent of Code 2019

Last year I decided to challenge myself: I wanted to complete Advent of Code. And as if that wasn’t enough, I wanted to finish it on Christmas Day, which meant solving each problem on the day it came out. I’m happy to say that for the most part I was successful, but there was a day, a single day, that made me doubt my ability to complete this challenge.

I’m talking about Day 18. I was far from the only one who had problems with this puzzle. If you look at AoC’s stats, it’s the second least solved puzzle, overtaken only by day 23.

In the end, thanks to help from AoC’s subreddit (which is full of great people who are happy to help anyone with a problem), I was able to finish both parts of day 18. I didn’t know it at the time, but it turns out this puzzle called for a technique known as ‘Dynamic Programming’.

My initial solution for this problem had the following running times:

part1 - 0.08 s
part2 - 18-20 hours

As you can see, part2 is less than ideal. So with AoC 2020 approaching, I decided to revisit this problem and figure out a way to make part2 faster.

The problem

First let me introduce you to the problem that Day 18 presents.

You are given a maze represented by ascii characters, like so:

#################
#i.G..c...e..H.p#
########.########
#j.A..b...f..D.o#
########@########
#k.E..a...g..B.n#
########.########
#l.F..d...h..C.m#
#################

The lower-case letters represent keys, the upper-case letters represent locked doors and hashes represent walls. Your starting position is represented by the ‘at’ sign. Your goal is to gather all the keys in the maze in as few steps as possible. You may only move up, down, left or right; no diagonal movement is allowed.

As you’ve probably guessed, you need to collect a key before you can go through the corresponding locked door; each lower-case key opens the door with the matching upper-case letter. So if you want to reach the key j in the example above, you first need to collect the a key, so the A door opens up and lets you access j.

BFS isn’t enough

At first I thought “Ah, a maze, that’s basically a graph. I’ll just use BFS and all’s well!”. Sadly that was not the case.

BFS is great for finding the shortest path to a single position, but when you need to find the shortest path that goes through multiple points (without a given order of the points), it’s not enough for the task. (Not on its own, at least.)

Using BFS I was able to find the shortest path to each key from the starting position as well as the shortest path from one key to another (while ignoring the doors). Now it was time to find the smallest sum of these paths while respecting the order in which keys need to be picked up, so doors would open and allow access to the remaining keys.

If you’re aware of the Dynamic Programming technique, you might already have a pretty good idea of how to approach this problem. However, I was not aware of it at the time, so I just wrote a recursive function that goes through all valid permutations of key pick-ups (ones where the required doors are already open) and keeps the one with the shortest distance traveled. Once I ran this algorithm, I was expecting results, but it kept going. And going. And going. After a while I fired up python and did some calculations. Turns out the algorithm wouldn’t finish running any time soon.

worst case number of permutations - 26! = 403291461126605635584000000
running time assuming 1 permutation is computed in 1 ns - 403291461126605635.584 seconds
                                                        = approximately 12.8 billion years

At this point I decided I didn’t have time to wait for this; I needed to figure out a better solution.

Dynamic Programming to the rescue!

If you’ve never heard about Dynamic Programming, I encourage you to read about it in more detail; it’s a really cool way to optimize certain problems. The basic idea is to store computed results that would otherwise be computed over and over again.

This is great for our problem because if we’re going to go through all permutations of possible orders of key pick-ups, we’d be computing a lot of things over and over again.

For a simple example, let’s take the first 6 letters of the alphabet as keys and say we already know the distances between them. There are 720 (= 6!) unique ways of ordering them.

We can go through all possible permutations and get the shortest one like this:

#include <stdbool.h>
#include <stdio.h>

struct KeyInfo {
    bool pickedUp; // whether the key is already picked up
    char key; // ascii representation of the key
    int distances[6]; // distances to other keys
};

void getKeyInfo(struct KeyInfo keyInfo[6]); // defined elsewhere

size_t findShortestPath(int startingIndex, struct KeyInfo keyInfo[6]) {
    keyInfo[startingIndex].pickedUp = true;
    size_t minPath = -1; // minimal path over the remaining keys
    for(int i = 0; i < 6; i++) {
        if(!keyInfo[i].pickedUp) {
            size_t path = keyInfo[startingIndex].distances[i] + findShortestPath(i, keyInfo);
            if(path < minPath)
                minPath = path;
        }
    }
    keyInfo[startingIndex].pickedUp = false;
    if(minPath != (size_t)-1)
        return minPath;
    return 0;
}

int main(void) {
    size_t minPath = -1;
    struct KeyInfo keyinfo[6];
    getKeyInfo(keyinfo); // let's not get into this now
    for(int i = 0; i < 6; i++) {
        size_t path = findShortestPath(i, keyinfo);
        if(path < minPath)
            minPath = path;
    }
    printf("Shortest path is: %zu\n", minPath);
}

Now we’re sure to get the shortest path, but we could arrive at the same result with fewer computations.

Imagine for example that you have the following sequence of letters:

a->b->c

Now you need to compute the shortest distance starting from c and going over d, e, f. Once you compute it, you can save the result, and when you later get to the following situation:

b->a->c

You can use your saved value, because you’re in exactly the same situation: starting from c, needing the shortest path over d, e, f. This way you save yourself 6 (= 3!) shortest-path computations. That might not seem like much now, since we only have 6 letters, but if you were in this situation with all 26 letters instead of 6, you would suddenly save yourself 23! = 25852016738884976640000 computations.

Using DP with BFS

Now that we know of Dynamic Programming, the question is how do we use it with BFS?

My idea was to create a big cache with an entry for each possible combination of collected keys. Since each key is either collected or not collected, I figured the easiest way to do this would be a cache of size \(2^{26}\) (because there are at most 26 letters), where each entry holds an array of 26 values representing the best path based on which key we’re standing on right now. I’m using 32 bit numbers for the best path (because I don’t expect it to be longer than 4 billion), so allocating this cache takes up roughly 6.5 GB of RAM. This isn’t the best way to utilize RAM, but I don’t know how to implement a better hash table in C and I have 16 GB of RAM in my PC, so it doesn’t really bother me that much. (It will bother me for part 2, just wait.)

Now all we need to do is go through all the possible paths and save intermediate results into cache. Before we begin, we need to calculate lengths of paths from starting point to each key and from each key to all other keys (ignoring locked doors) and while we’re doing that we also need to figure out what keys are needed to unlock a path for given key. This isn’t really that interesting, just input parsing, so I’ll skip over that.

While we’re processing the shortest path, we have the following states:

  • We’re standing on a position we’ve already stood on with the same set of keys => go to the cache, get the best path for the current state and add it to the path explored so far
  • We’re in a brand new state, therefore we need to recursively call the shortestPath function for the current state and save the result to the cache

If we write this algorithm correctly, we should have a blazing fast maze explorer. On my machine it took around 0.08 seconds to complete my input. Not bad compared to the 13 billion years it would take without Dynamic Programming.

If you’d like a specific number, with this approach we only need to do the following number of computations: \[ \sum_{i=0}^{25} \binom{26}{i} \cdot (26-i) = 26 \cdot 2^{25} = 872415232 \] which is about \(4.6 \cdot 10^{17}\) times smaller than the original 403291461126605635584000000.

Part 2, attempt 1, or - always check that you’re determining the state correctly

Part 2’s problem is very similar to part 1. You still need to collect all the keys in as few steps as possible, but instead of having 1 starting spot inside a maze, the maze is broken up into 4 quadrants divided by solid walls. Each quadrant has its own starting spot, and you move in all quadrants at the same time (as if moving 4 pieces). A piece in one quadrant can pick up a key that opens doors in another piece’s quadrant, so you can’t just solve 4 mazes independently; you need to solve them together.

For part 2 I just slightly modified part 1’s code. I would still cache based on the set of found keys and then save the best path based on the last visited key. However, this time I would also need to save the 3 other positions, so I could be sure I was in the same state as before.

I’ve made the following struct for this cache:

#include <stdint.h>

struct cache {
    uint16_t best_path[26]; // best path based on last visited key
    short other_positions[26][3]; // positions of other 'pieces'
};

Now all I need to do is always check that I’m in the same state, pretty simple, right?

Wrong. I’ve overlooked one crucial detail - the cache doesn’t allow for more than 1 configuration of standing pieces. I don’t know why I didn’t see it the first time I wrote this, but that happens.

As the cache is now, we will overwrite our previous values a lot. E.g. if we have keys a,b,c,d,e,f, our last position is d and the other positions are a,b,c, we would find the cache structure corresponding to our found keys and save this:

myCache[index].best_path[3] = path;
myCache[index].other_positions[3][0] = 0;
myCache[index].other_positions[3][1] = 1;
myCache[index].other_positions[3][2] = 2;

And in a different iteration, when we have keys a,b,c,d,e,f, our last position is d and the other positions are f,b,c, we would overwrite our previous path like so:

myCache[index].best_path[3] = path;
myCache[index].other_positions[3][0] = 5;
myCache[index].other_positions[3][1] = 1;
myCache[index].other_positions[3][2] = 2;

So you can see that any speed-up is purely coincidental. This version managed to finish my input in about 18-20 hours, but I wouldn’t be surprised if different inputs took weeks, months or even years.

Part 2, revisited

Now that we know where the problem was in the previous version, we can fix it. All we need to do is create a better cache structure, one that differentiates paths based on the positions of all pieces, not just the last one.

As I’ve said already, there are \(2^{26}\) possibilities for which keys are currently found. We have 30 places on which our pieces can stand (26 letters and 4 starting positions). We get the largest number of variations if we split the 30 positions between the pieces as evenly as possible. This gives us \[ 8 \cdot 8 \cdot 7 \cdot 7 = 3136 \] possible states of where each piece can stand (in the worst case).

At this point I decided to be lazy and just allocate all possible cache space at once and use it as I pleased. I ran the program and SEGFAULT. Hmm, that’s strange, let me do a quick bit of math. I have \(2^{26}\) cache rows, each holding a smaller cache with \(3136\) rows, and each of those a \(16\) bit value. Ah, 392 GB of space required. I don’t really feel like spending thousands of dollars on a new PC with enough RAM, so let’s optimize!

My first idea was to not allocate the whole cache at once: just allocate \(2^{26}\) pointers for the main cache (512 MB on a 64bit machine) and allocate each \(3136 \cdot 16\) bit second-level cache only when needed. This one didn’t segfault immediately; it took a fraction of a second before it did. We’re still too greedy with RAM :/

So I’ll have to allocate the second-level cache dynamically, which isn’t great. I don’t really know how to write a good hash function for a table that grows as needed, so I’m not going to do that; instead I’ll sacrifice a little bit of performance to save a lot of space and use a lookup table. I’ve created a struct that holds an array and the number of items in the array; the array always contains \(\{key, value\}\) pairs (but since this is C, we don’t actually have a pair type, a pair just occupies 2 indexes in the array). Here is my “cache” code:

#include <errno.h>
#include <error.h>
#include <stdint.h>
#include <stdlib.h>

struct cache {
    uint16_t *best; // flat array of {index, value} pairs
    uint16_t count; // number of pairs currently stored
};

void addToCache(struct cache *cache, int index, uint16_t value) {
    cache->count++;
    void *tmp = realloc(cache->best, cache->count * 2 * sizeof(uint16_t));
    if( tmp == NULL )
        error(1, errno, "realloc");
    cache->best = tmp;
    cache->best[cache->count*2 - 2] = index;
    cache->best[cache->count*2 - 1] = value;
}

uint16_t getFromCache(struct cache *cache, int index) {
    for(uint16_t i = 0; i < cache->count; i++) {
        if(cache->best[i*2] == index)
            return cache->best[i*2+1];
    }
    return 0; // not found - works because no real best path has length 0
}

void initCache(struct cache *cache) {
    cache->count = 0;
    cache->best = NULL;
}

With this caching done, we finally have a program that runs and manages to squeeze itself into my RAM (this version actually takes up only slightly above 600 MB, so that’s nice). But does it run faster than the first attempt?

You bet! Now that we don’t overwrite saved values, the program is blazingly fast. It only takes 0.15 seconds; compared to the previous version’s 20 hours, that’s a pretty significant improvement!

Final thoughts

I hope I’ve shown why Dynamic Programming is an amazing technique for speeding up certain recursive problems. However, as you can see, even DP won’t save you if you define the “same state” incorrectly.

Code

If you’re interested in my implementation of this algorithm, you can find it here: https://github.com/zv0n/advent_of_code_2019/blob/master/18/part2/main.c