The concept of a jumping puzzle has been around for decades. Think of a regular maze, where you have a
start point and must reach the end point through a series of up, down, left, right actions. Now add holes
in this maze along with a jumping action to the action space and you have a jumping puzzle.
The ability for a character to jump is a key mechanic in many successful games. Jumping puzzles take
this mechanic to a whole new level, where the player’s knowledge of where and when to jump can decide
whether they pass the level or not. Games like Guild Wars and Assassin’s Creed implement this feature
to some extent, while games like Super Mario are giant jumping puzzles in themselves. A jumping-puzzle
AI would allow game developers to test whether a jumping puzzle can be completed and whether it is too
easy or too hard. The field of robotics is fairly young, but that doesn’t stop researchers from envisioning
robots that save people from catastrophes or fight in wars in place of humans. Like Minecraft, the real
world is not just flat land. If a robot is going to perform those functions, it will need to not only walk,
but also learn to climb and maybe even jump!
In our project, we give our agent, JPS, progressively intricate jumping puzzles to solve. What separates a
jumping puzzle from a regular maze is that it is only solvable by jumping. That is, JPS may choose to walk
through parts of the maze (if that leads to the optimal reward), but it can never reach the destination block
if it does not utilize jumping.
Sounds simple enough, right? Wrong. A jumping puzzle has many complex attributes. Jumping across a
single-block hole is easy, but what happens when the hole is two blocks wide? Increasing the size of the maze
is trivial (we only deal with x and z coordinates), but how does the complexity scale when we add multiple
layers to the maze (now we’re dealing with x, z, AND y)? How would the agent even know to jump to the floor
above it? Stay tuned, and JPS will help unearth the answers to some of these questions!
Once we got our project working, we tested it on a simple maze of dimension 5x4x1, i.e., a 5-by-4 grid of
height one. This map, and our agent’s performance on it, determines our baseline. The next step was to
increase the complexity by creating more complex maps. From here, we generated maps of dimension 5x4x2 with
the goal of the maze on the level above the start. This requires our agent to jump vertically, in addition
to jumping horizontally.
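To make this concrete, here is a rough sketch of how such a layered maze can be written down. The layout
below is purely illustrative (it is not one of our actual maps): 'S' marks the start, 'G' the goal, '#' a
solid block, and '.' a hole or empty air.

# Illustrative 5x4x2 layout (not our actual map). The agent starts on
# level 0 and must jump up to reach the goal block on level 1.
LEVEL_0 = ["S####",
           "##..#",
           "#..##",
           "####."]
LEVEL_1 = [".....",
           ".....",
           "....G",
           "....."]
MAZE = [LEVEL_0, LEVEL_1]  # indexed as MAZE[y][z][x]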
To do this, we needed to introduce a Y-axis to our agent. We modified the agent’s action space so that the
agent can also jump along the vertical Y-axis. The tabular Q agent now accepts a vertical component in its
world state and can progress vertically through a puzzle. To implement this, inside discreteMovementCommands()
we first read our current position from the latest observation:
obs = json.loads(world_state.observations[-1].text)
We then look at our current state, choose an action according to our policy, and accumulate the reward:
total_reward += self.act(world_state, agent_host, current_r)
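For reference, here is a minimal sketch of what happens inside act() once the vertical component is included.
The attribute names (q_table, actions, epsilon, alpha, gamma, prev_s, prev_a) and the simplified signature
are assumptions for illustration, not necessarily our exact code.

import random

def choose_action(self, obs, current_r):
    # Simplified sketch of a method on the agent class. The Q-table key now
    # carries the vertical component (YPos) alongside XPos and ZPos.
    current_s = "%d:%d:%d" % (int(obs[u'XPos']), int(obs[u'YPos']), int(obs[u'ZPos']))
    if current_s not in self.q_table:
        self.q_table[current_s] = [0.0] * len(self.actions)

    # One-step tabular Q-learning update on the previous transition.
    if self.prev_s is not None:
        old_q = self.q_table[self.prev_s][self.prev_a]
        self.q_table[self.prev_s][self.prev_a] = old_q + self.alpha * (
            current_r + self.gamma * max(self.q_table[current_s]) - old_q)

    # Epsilon-greedy choice over the enlarged action space, which now
    # includes the vertical jump action.
    if random.random() < self.epsilon:
        a = random.randint(0, len(self.actions) - 1)
    else:
        a = self.q_table[current_s].index(max(self.q_table[current_s]))

    self.prev_s, self.prev_a = current_s, a
    return a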
The new agent and the new maps with more than one level define our proposed approach. Compared with
the proposed approach, our baseline is very simple: the map is effectively two-dimensional and only
requires m-by-n unique Q-table states. Adding more than one level grows the agent’s state space by a
factor of the number of levels, from O(mn) to O(mnt), where m, n, and t are the dimensions
of the puzzle. The advantage of our approach is that comparing the convergence times of maps with
different numbers of levels against the size of each map’s state space lets us evaluate how our agent
performs on larger maps and with more levels.
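As a quick back-of-the-envelope check (assuming one Q-table entry per (x, z, y) cell and ignoring
unreachable cells), this is how the state count grows across the maps used in this project, including
the larger maps we test later:

# Rough state counts: one Q-table key per cell of an m x n x t grid.
maps = {"baseline 5x4x1": (5, 4, 1),
        "two-level 5x4x2": (5, 4, 2),
        "two-level 6x10x2": (6, 10, 2),
        "three-level 8x8x3": (8, 8, 3)}
for name, (m, n, t) in maps.items():
    print("%-18s -> %3d states" % (name, m * n * t))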
Once our agent is able to traverse puzzles of height 2, height becomes trivial and in theory, the agent should be able
to solve puzzles of any height. From here, it is a matter of creating more mazes for the agent to run through and seeing
how it performs.
Previously, our agent proved that it could solve a jumping puzzle with a single level. But we wanted
our agent to do better than that, so we made a few improvements, specifically changing the Q-table
behavior and the agent’s action space. We also changed the reward and penalty to +1000 and -1000,
respectively, to account for the bigger map sizes. We then tested the agent on two more complex maps
to see if it could still reach its destination: a 2-layer map (6x10x2) and a 3-layer map (8x8x3). For both
maps, we keep track of the number of steps the agent takes and the reward for each episode.
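A minimal sketch of this per-episode bookkeeping is shown here; run_episode is a placeholder standing
in for one full pass through the maze, not a function from our actual code.

GOAL_REWARD = 1000    # the +1000 reward for reaching the destination block
FAIL_PENALTY = -1000  # the -1000 penalty for the failure case

def track_performance(num_episodes, run_episode):
    # run_episode() is assumed to return (steps_taken, total_reward)
    # for a single episode.
    episode_steps, episode_rewards = [], []
    for _ in range(num_episodes):
        steps, total_reward = run_episode()
        episode_steps.append(steps)
        episode_rewards.append(total_reward)
    return episode_steps, episode_rewards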
For the 2-layer map, as you can see above, convergence occurs after about 70 iterations. The reward
approaches ~990 as the agent continues to improve and takes fewer steps to reach the goal.
The step curve flattens out at 9 steps.
For the 3-layer map, we can expect the agent to take longer to reach its destination, since it is
bigger and more complex than the other two maps. We notice that the agent takes about twice as
long (roughly 140 iterations) to start converging. The agent also takes a lot longer to reach its
optimal point: the number of steps taken flattens out after approximately 160 iterations.
Despite that, we are happy that the agent was able to perform and reach its destination regardless
of the map’s complexity.