SmartEngine 1.6.0
Example Project: Kitty Keep Away

The AI in this example project was trained to keep a food bowl away from cats who are chasing it. Each cat can either walk towards the bowl, jump towards the bowl, or sit and clean itself for a few seconds.

Network Strategy

Input

The network has two sets of raycast inputs: one for the distance to the bounds of the scene, and one for the distance to the nearest cat. For each set, 32 rays are cast in all directions around the food bowl, making for a total of 64 inputs into the network.
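
As a rough sketch of how such an input vector could be gathered in Unity (the layer names, ray length, and normalization below are assumptions for illustration, not the example project's actual code):

    using UnityEngine;

    // Illustrative sketch only: casts 32 rays evenly around the bowl against the
    // scene bounds and again against the cats, producing 64 normalized distances.
    public class RaycastInputSketch : MonoBehaviour
    {
        const int RayCount = 32;
        const float MaxDistance = 20f;                 // assumed maximum ray length

        public float[] GatherInputs()
        {
            var inputs = new float[RayCount * 2];      // 64 values fed to the network
            int wallMask = LayerMask.GetMask("Walls"); // assumed layer names
            int catMask  = LayerMask.GetMask("Cats");

            for (int i = 0; i < RayCount; i++)
            {
                // Spread the rays evenly through 360 degrees around the bowl.
                float angle = i * Mathf.PI * 2f / RayCount;
                var dir = new Vector3(Mathf.Cos(angle), 0f, Mathf.Sin(angle));

                inputs[i]            = CastNormalized(dir, wallMask);
                inputs[i + RayCount] = CastNormalized(dir, catMask);
            }
            return inputs;
        }

        float CastNormalized(Vector3 dir, int mask)
        {
            // Distance to the first hit, scaled to [0, 1]; 1 means nothing was hit.
            return Physics.Raycast(transform.position, dir, out RaycastHit hit, MaxDistance, mask)
                ? hit.distance / MaxDistance
                : 1f;
        }
    }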

Structure

See SmartEngine/Examples/KittyKeepAway/Resources/FoodGraphDefinition.json

The network consists of an LSTM layer followed by a series of linear neuron layers. Having an LSTM as the first layer lets the network track its input over time, implicitly modeling the relative velocities of the cats and the walls.

Training Methodology

This example uses the D4PG reinforcement learning (RL) trainer to train the AI. RL trainers use agents to record the input / output of the network, along with an associated reward. The trainer then tries to maximize rewards over time. Unlike genetic training, RL works best when the same scenario is replayed over and over again. Each time through, the network gets better at reaching the end goal and picks up more rewards along the way.

D4PG was chosen because the output of the network is continuous (a full range of values) rather than discrete (a choice among X values). The agent is given a small reward for each frame that it successfully evades being caught by a cat. If the bowl is caught, the episode ends and the game is reset (this behavior can be toggled with a boolean on the trainer game object).
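
The reward scheme can be sketched as follows; the method and field names here are hypothetical stand-ins, not the actual AITrainer or RL Training Harness API:

    using UnityEngine;

    // Illustrative sketch of the per-frame reward; names are assumptions.
    public class RewardSketch : MonoBehaviour
    {
        public bool resetGameWhenCaught = true;  // mirrors the toggle on the trainer game object
        const float SurvivalReward = 0.01f;      // assumed size of the per-frame reward

        float episodeReward;

        void FixedUpdate()
        {
            if (BowlIsCaught())
            {
                // No reward this frame; end the episode and optionally reset the game.
                if (resetGameWhenCaught)
                    EndEpisodeAndReset();
                return;
            }

            // Small positive reward for every frame the bowl has evaded the cats.
            episodeReward += SurvivalReward;
        }

        bool BowlIsCaught() { return false; }            // hypothetical: true when a cat reaches the bowl
        void EndEpisodeAndReset() { episodeReward = 0f; } // hypothetical: report reward, reset the scene
    }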

Code Structure

The graph that drives the food bowl is managed by SmartEngine.Examples.KittyKeepAway.AINetwork. This class uses the network system to automatically produce results. SmartEngine.Examples.KittyKeepAway.AIController is the corresponding controller. See the Balance Ball Example Docs for a brief description of the network cycle.
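
The split between the two classes roughly follows the pattern below; the interface and method names are illustrative assumptions, not the actual SmartEngine types:

    using UnityEngine;

    // Illustrative sketch of the network / controller split; names are assumptions.
    public class FoodBowlControllerSketch : MonoBehaviour
    {
        // Hypothetical stand-in for the network wrapper: owns the graph and
        // produces outputs whenever it is handed a fresh set of inputs.
        public interface IBowlNetwork
        {
            void SetInputs(float[] raycastDistances);
            float[] GetOutputs();
        }

        public IBowlNetwork network;   // assigned elsewhere in this sketch
        public float moveSpeed = 3f;   // assumed movement speed

        void Update()
        {
            // Controller side of the cycle: gather observations, hand them to the
            // network, then turn the continuous outputs into movement.
            network.SetInputs(GatherRaycastInputs());
            float[] outputs = network.GetOutputs();       // e.g. desired X/Z direction
            var move = new Vector3(outputs[0], 0f, outputs[1]);
            transform.position += move * moveSpeed * Time.deltaTime;
        }

        float[] GatherRaycastInputs() { return new float[64]; } // see the raycast sketch above
    }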

SmartEngine.Examples.KittyKeepAway.AITrainer trains the food network. It leverages the RL Training Harness helper class to do the heavy lifting of setting up a networked reinforcement learning trainer. Multiple clients can connect to the trainer, though by default the game only evaluates 3 games at a time. To connect additional clients, start training on the server, open one or more other Unity instances, and click Connect in each; they will automatically start training. Internally, the trainer builds a list of tasks that are handed out to connected clients one by one.
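
Conceptually, the task hand-out behaves like a simple queue; the sketch below illustrates the pattern under assumed type names, not the harness's actual implementation:

    using System.Collections.Generic;

    // Illustrative sketch of handing training tasks to connected clients one by
    // one; TrainingTask and Client are assumed names, not SmartEngine types.
    public class TaskDispatchSketch
    {
        public class TrainingTask { public int ScenarioId; }
        public class Client { public void Run(TrainingTask task) { /* evaluate the scenario */ } }

        readonly Queue<TrainingTask> pending = new Queue<TrainingTask>();

        public void QueueEpisodes(int count)
        {
            // The trainer builds a list of tasks up front...
            for (int i = 0; i < count; i++)
                pending.Enqueue(new TrainingTask { ScenarioId = i });
        }

        public void OnClientReady(Client client)
        {
            // ...and hands one to whichever connected client is free next.
            if (pending.Count > 0)
                client.Run(pending.Dequeue());
        }
    }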