There is a popular board game where sliding tiles, randomly placed on a square board, together form a maze. There are treasures you have to find, and you can transform the maze by pushing one extra tile in, shifting an entire row or column so that a new extra tile is pushed out on the opposite side. The challenge, of course, is that the other players (up to three) will frequently change the maze in ways that are at odds with your purpose. Quite mind-bending at times. The German title Das verrückte Labyrinth is a pun on the two meanings of verrückt: moved, shifted, displaced–and crazy. The English version is simply called Labyrinth.
For weeks I had been thinking about a project for our adaptive systems course that was neither insane nor trivial in scope, and preferably not so well-trodden that you practically couldn’t avoid copying a solution from the internet. The professor had recommended Connect 4 as an always popular choice, but that was in fact both trivial and evidently akin to beating a dead horse. After considering a lot of options and always finding that they had been done many times before, a couple of weeks ago I suddenly had the unlikely inspiration that I would do the verrückte Labyrinth. You see, everybody knows it; with just 33 moving tiles of just three different kinds–10 if you count rotations, since a straight tile has two distinct orientations while a corner and a T-junction have four each–it’s not completely insane in terms of state space (possible combinations), but neither is it trivial, what with the dynamic environment and the two-tiered player turn where you slide first, then move your piece; and apparently it’s never been done before.
And of course it was clear to me that the exciting thing would be watching the program learn to play a game without being told the rules, let alone having its moves judged or being trained on approved solutions–the classic scenario of learning without supervision, purely from rewards. I thought (and still think) that the Maze is just complex enough to make that worthwhile but simple enough for it to work within a reasonable timeframe. The fascination of unsupervised learning is that simply by playing and observing rewards a game engine will, in time, find out what a game is about–and then possibly come up with actions that a human player would not consider in a given situation. Of course in a game with a huge state space, i.e. a large number of possible situations, such as chess (10^50 states), or, god forbid, go (10^70 states), this can take basically forever, but with the Maze it might just work.
For several weeks I hungrily devoured everything I could find about unsupervised learning and games to get inspiration. It quickly became obvious that of the three subjects covered in the lecture I would pick neural networks as the most exciting technique and the one best suited to unsupervised learning.
A neural network, at its core, is several layers of interconnected nodes, each of which crudely models the working of a brain cell: it receives several input signals and produces an output signal if a certain threshold is reached. That way, a neural network transforms input (e.g. information about a game state) into output (e.g. a game move). The crucial point is that each connection has a “weight” that describes how important the signal passed along it is compared to other signals. These weights are initially random; the network then learns by receiving an error signal and backpropagating it through the layers, adjusting the weights as it goes. Do that often enough, and the network will have modified its weights enough to link each input combination (game state) with the most promising output combination (game move).
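To make that concrete, here is a toy sketch of the whole cycle–forward pass, error signal, backpropagation, weight update–in a few lines of Python with NumPy. This is not my project code; the tiny network, the XOR toy problem, and all parameter choices (layer sizes, learning rate) are just illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy problem: learn XOR. Think of each input row as a (very small)
# "game state" and the output as a "move".
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    # Smooth threshold function: output rises toward 1 as input grows.
    return 1.0 / (1.0 + np.exp(-z))

# Initially random weights for a 2-4-1 network (input -> hidden -> output).
W1 = rng.normal(0.0, 1.0, (2, 4))
W2 = rng.normal(0.0, 1.0, (4, 1))

losses = []
for _ in range(10000):
    # Forward pass: each layer sums its weighted inputs and applies
    # the threshold-like sigmoid.
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    losses.append(float(np.mean((out - y) ** 2)))
    # Error signal at the output layer...
    err_out = (out - y) * out * (1 - out)
    # ...backpropagated through the connection weights to the hidden layer.
    err_h = (err_out @ W2.T) * h * (1 - h)
    # Adjust the link weights a little against the error.
    W2 -= 0.5 * h.T @ err_out
    W1 -= 0.5 * X.T @ err_h

print(losses[0], losses[-1])  # the error shrinks as the weights adapt
```

Everything the network “knows” after training lives in W1 and W2–exactly the point about the weight matrices being the only stored information.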
It’s a fascinating technique because of its almost biological model. You know how it does it–in fact it’s a huge area of study, with textbooks several hundred pages long that do nothing but discuss the best architectures, algorithms, activation functions, and error (loss) functions–and you could in principle look at the weights to find out what it does, but in the end the huge number of connections and weights means it remains an incomprehensible black box, and it’s not even important to know–it just works. And you can store the most complex relationships between a vast number of states (input) and actions (output) in nothing but the weight matrices of the network. There is no other information stored anywhere!
Neural networks have been used in recent years–it’s a regular hype–for a huge number of machine-learning tasks, including scientific classification tasks, image and speech recognition, clustering of huge amounts of data, and so on. The classic basic example is always identifying handwritten digits, using the famous MNIST database, and it’s fascinating to watch how a neural network learns to do that with nothing but the raw input and an error signal if it guesses wrong–do it long enough, and the success rate is well over 99 per cent, which beats most humans. But I was sold on using a neural network when I saw a video where someone had trained one to play Mario. The network knew nothing about the game–all it had to go by was being able to press the keyboard keys and getting a reward for every moment it didn’t die. The first few hours nothing happened. Then it learned to move to the right and die on the first obstacle. Then it found out that hitting a particular key (the jump key) meant it would not die in the same situation. After 24 hours it played the level to perfection. Mind you, just this particular level–it still didn’t have any clue what the game was about!
But that’s the whole point of a neural network–it doesn’t need to be told what it actually does, it just finds out which actions are successful in a given situation. In a way it’s completely dumb, yet neural networks have been successfully used to play, famously, backgammon (TD-Gammon), video games, and recently even chess (Giraffe), using nothing but the raw board information as input, though apparently they still rather suck at go, the ultimate challenge for game-playing machines.
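Incidentally, “raw board information” for the Maze could be as simple as one-hot vectors. Here is a sketch of one plausible encoding; the tile names, the 7×7 board, and the treat-every-tile-as-having-four-rotations simplification are my own assumptions, not a finished design.

```python
# Hypothetical encoding of a Maze board as a flat network input vector.
# The three tile kinds and the 7x7 board follow the physical game; the
# one-hot-per-cell scheme is just one plausible choice. (A straight tile
# really has only two distinct orientations, which a cleverer encoding
# could exploit to shrink the input.)
KINDS = ("straight", "corner", "t_junction")
ROTATIONS = 4  # 90-degree steps

def encode_tile(kind, rotation):
    """One-hot vector of length 12: kind index times 4, plus rotation."""
    vec = [0.0] * (len(KINDS) * ROTATIONS)
    vec[KINDS.index(kind) * ROTATIONS + rotation % ROTATIONS] = 1.0
    return vec

def encode_board(board):
    """board: 7x7 grid of (kind, rotation) tuples -> flat list of 588 inputs."""
    return [x for row in board for tile in row for x in encode_tile(*tile)]

# Dummy board filled with corners in varying rotations, just to show the shape.
board = [[("corner", r % 4) for r in range(7)] for _ in range(7)]
inputs = encode_board(board)
print(len(inputs))  # 7 * 7 * 12 = 588
```

Player positions and the treasure currently being hunted would have to be appended in some similar numeric form; that is exactly the kind of input-design question still open below.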
And that’s the kind of information I have been studying for several weeks now, together with books and websites on machine learning in general, on reinforcement learning (learning by observing rewards), on neural networks, and on the programming language Python, the default choice for machine learning applications. There are numerous scientific, machine learning, and neural network libraries that do most of these things out of the box–though one can also do a surprising amount oneself: Make Your Own Neural Network, by Tariq Rashid (2016), does a great job of showing the reader how to build, with a few lines of Python code, a simple network that identifies the handwritten digits mentioned above with near-human accuracy.
So far I have a rather good idea of how to model the game itself, of the kind of input and output I am going to use, of the network architecture, and of the learning model: as I said, unsupervised learning with no error signal but the reward for reaching the goal, i.e. finding the treasure (in the Maze you draw cards to know which particular treasure you have to find next, so it comes down to reaching a certain–sliding–tile to collect the reward). I am still uncertain how to induce the network to learn from the combined result of a turn consisting of two parts: sliding, then moving; particularly since you can determine the possible legal moves only after sliding. I might need two networks that receive different input and produce different output, but train on the same reward. And I am still somewhat fuzzy on how the reward should be translated into an error signal, because I am still not quite clear about reinforcement learning, specifically temporal difference learning, as such. I am still reading the classic reference work, Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto (1998), and I kind of surrendered before all the math in the equally classic, but infinitely less readable and weightier Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016). But in the end it will undoubtedly come down to a lot of trial and error anyway. As the classic book Neural Networks and Deep Learning by Michael Nielsen emphasizes, neural networks have so many moving pieces that there is no rule of thumb beyond making your best guess and then fine-tuning with infinite patience.
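For what it’s worth, the reward-to-error translation I am wrestling with is, at its core, the temporal-difference update from Sutton and Barto. A tabular toy version makes it tangible: a five-cell corridor with a reward at the far end, nothing to do with the real Maze, and all parameters (learning rate, discount, exploration rate) picked arbitrarily for illustration.

```python
import random

random.seed(0)

# Toy environment: five cells in a row, start at cell 0, reward only at cell 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left / step right

# Q-table: estimated value of taking action a in state s.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.4

for _ in range(300):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly take the best-known action, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Temporal-difference error: reward plus discounted best future value,
        # minus the current estimate. THIS is the "error signal" a network
        # could be trained on instead of a table.
        td_error = r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)]
        Q[(s, a)] += alpha * td_error
        s = s2

# After training, the greedy policy should walk right, toward the reward.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(policy)
```

Replace the Q-table with a neural network that maps state to action values, and the TD error becomes exactly the backpropagated error signal–at least that is my current understanding of how the pieces will fit together.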
A good thing, then, that our professor has made it quite clear that he does not expect us to complete our project before the end of the term. We are just to work on something interesting and instructive, and then present our approach, our work in progress, and our findings in the oral presentation in lieu of an exam. Particularly with respect to my Maze project he said this was rather the size of a bachelor thesis, but never mind–I could still turn it into one, and/or continue working on it within the context of his fifth-term project course on adaptive agents. So I suppose one of these days I should probably emerge from my own maze of books, websites, and ideas and start programming something–even just the game itself, which after all can be done independently of the machine learning algorithm and neural network. I am kind of looking forward to that, yet at the same time I fear once I start working on it I will stop thinking and lose sight of the forest for all the trees. As Joshua Bloch, the Java guru, famously said, “a week of coding will often save an hour of thought”, and like most of my peers I am particularly prone to falling into that trap–starting to program as a substitute for thinking something through. I want to avoid that if at all possible, this time. Yet on the other hand I need something to show just five weeks from now. So I suppose whether I want to or not, it’s time to get started!