In this example project, an AI is trained to balance a ball on a platform. It controls the torque applied to the platform and must bring the ball to the center of the platform as quickly as possible.

Network Strategy

Input

The network has an input dimension of 12: the ball position (3), the ball velocity (3), the 'up' direction of the platform (3), and the angular velocity of the platform (3).
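The input layout can be sketched in Python as four 3-vectors flattened into one 12-element list. This is an illustration only: the `Observation` type and `build_input` helper are hypothetical, and the ordering simply follows the list above. The real ordering and coordinate frame are set by the project's controller and may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Observation:
    """Hypothetical container for the four 3-vectors the network reads."""
    ball_position: Vec3
    ball_velocity: Vec3
    platform_up: Vec3
    platform_angular_velocity: Vec3


def build_input(obs: Observation) -> List[float]:
    # Flatten in the order listed in the text (an assumption): 4 x 3 = 12 values.
    features = [
        *obs.ball_position,
        *obs.ball_velocity,
        *obs.platform_up,
        *obs.platform_angular_velocity,
    ]
    assert len(features) == 12
    return features


obs = Observation(
    ball_position=(0.5, 1.0, -0.2),
    ball_velocity=(0.0, -0.1, 0.3),
    platform_up=(0.0, 1.0, 0.0),  # a level platform
    platform_angular_velocity=(0.05, 0.0, -0.02),
)
x = build_input(obs)
```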

Structure

See SmartEngine/Examples/BalanceBall/Resources/PlatformGraphDefinition.json

The network consists of two hidden linear layers of 32 and 24 neurons, each with a SELU activation, followed by a linear output layer of 2 neurons with a tanh activation. The 2 output neurons control the X and Z torque applied to the platform.
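A minimal pure-Python sketch of the same 12 → 32 → 24 → 2 shape may help make the structure concrete. It is not the SmartEngine graph format or runtime: the weights are randomly initialized here (LeCun normal, the usual pairing with SELU), whereas in this project they are produced by the genetic trainer.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(v: float) -> float:
    return SELU_SCALE * (v if v > 0 else SELU_ALPHA * (math.exp(v) - 1.0))


def linear(x: List[float], layer: Layer) -> List[float]:
    weights, bias = layer
    # One weight row per output neuron.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    std = 1.0 / math.sqrt(n_in)  # LeCun normal initialization
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(rng: random.Random) -> List[Layer]:
    sizes = [12, 32, 24, 2]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(layers: List[Layer], x: List[float]) -> List[float]:
    h = x
    for i, layer in enumerate(layers):
        z = linear(h, layer)
        is_output = i == len(layers) - 1
        # SELU on the hidden layers, tanh on the 2-neuron output layer.
        h = [math.tanh(v) for v in z] if is_output else [selu(v) for v in z]
    return h


rng = random.Random(0)
net = build_network(rng)
torque_xz = forward(net, [0.1] * 12)  # [X torque, Z torque], each in (-1, 1)
```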

SELU is a good general-purpose activation for hidden layers, and it is especially effective with gradient descent training when the inputs follow a unit normal distribution: given suitably initialized weights, its self-normalizing property keeps each layer's activations close to zero mean and unit variance. Tanh suits the output layer because the torque must take both positive and negative values, and tanh is symmetric around zero. Since tanh is bounded to the range [-1, 1], a multiplier is applied after network evaluation so that the final torque can exceed a magnitude of 1.
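Both points can be checked numerically with a short sketch. Feeding standard normal samples through SELU gives outputs whose mean and variance stay near 0 and 1, and scaling the tanh outputs by a constant stretches the torque beyond 1. `MAX_TORQUE` is a hypothetical value; the multiplier the project actually uses is not stated here.

```python
import math
import random
from typing import List, Tuple

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805
MAX_TORQUE = 5.0  # hypothetical multiplier applied after network evaluation


def selu(v: float) -> float:
    return SELU_SCALE * (v if v > 0 else SELU_ALPHA * (math.exp(v) - 1.0))


def sample_stats(values: List[float]) -> Tuple[float, float]:
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def to_torque(tanh_outputs: List[float]) -> List[float]:
    # tanh outputs lie in [-1, 1]; the multiplier widens that to [-MAX_TORQUE, MAX_TORQUE].
    return [MAX_TORQUE * o for o in tanh_outputs]


rng = random.Random(42)
pre_activations = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
mean, var = sample_stats([selu(z) for z in pre_activations])
print(f"SELU output mean={mean:.3f} var={var:.3f}")  # expected near 0 and 1
torque = to_torque([0.5, -1.0])
```

The SELU constants are the standard published values; they were chosen precisely so that a unit normal input maps to an output with zero mean and unit variance.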

Training Methodology

When considering how to train this network, it is not immediately clear what the output of the network should be. After all, if we knew the algorithm for the output, we wouldn't need the network at all. Without target outputs, supervised training is not an option, so a Genetic Trainer is used instead: it only needs a loss for each candidate network, not the correct answer. We simulate dropping the ball at different positions and record the average loss across these trials. Each trial runs until the ball falls off the platform or a fixed time limit is reached.
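The general idea can be sketched as a small generic genetic algorithm. Each chromosome is scored by its average loss over all drop positions, the lowest-loss chromosomes survive, and the rest of the population is refilled by crossover and mutation. This is not SmartEngine's Genetic Trainer. The selection scheme, the `evolve` parameters, and the `toy_trial` function, which stands in for the ball-and-platform physics, are all hypothetical.

```python
import random
from typing import Callable, List, Sequence

Chromosome = List[float]
TrialFn = Callable[[Chromosome, float], float]


def average_loss(chrom: Chromosome, drop_positions: Sequence[float], run_trial: TrialFn) -> float:
    # Fitness of one chromosome: mean trial loss over all ball drop positions.
    return sum(run_trial(chrom, p) for p in drop_positions) / len(drop_positions)


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    # Uniform crossover: each gene comes from either parent with equal probability.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(chrom: Chromosome, rng: random.Random, sigma: float) -> Chromosome:
    # Gaussian mutation applied to every gene.
    return [g + rng.gauss(0.0, sigma) for g in chrom]


def evolve(population: List[Chromosome], drop_positions: Sequence[float], run_trial: TrialFn,
           generations: int, rng: random.Random, elite: int = 4, sigma: float = 0.1) -> Chromosome:
    for _ in range(generations):
        # Lower loss is better, so rank ascending and keep the best as parents.
        ranked = sorted(population, key=lambda c: average_loss(c, drop_positions, run_trial))
        parents = ranked[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng, sigma)
            for _ in range(len(population) - elite)
        ]
        population = parents + children  # elitism: the best loss never gets worse
    return min(population, key=lambda c: average_loss(c, drop_positions, run_trial))


# Hypothetical stand-in for a simulated trial; in the project a chromosome
# encodes network weights and each trial runs the physics scene.
TARGET = [1.0, -2.0, 0.5]


def toy_trial(chrom: Chromosome, drop_position: float) -> float:
    return sum((g - t) ** 2 for g, t in zip(chrom, TARGET)) + 0.1 * abs(drop_position)


rng = random.Random(0)
drops = [-1.0, 0.0, 1.0]
population = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
initial_best = min(average_loss(c, drops, toy_trial) for c in population)
best = evolve(population, drops, toy_trial, generations=50, rng=rng)
final_best = average_loss(best, drops, toy_trial)
```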

The loss is set manually to the sum of the square roots of the ball's distances from the center of the platform: each frame of training, the square root of the ball's distance from the center is recorded and added to the total. The square root is used instead of the plain distance because its graph curves sharply toward zero as the distance decreases, so the loss keeps dropping noticeably as the ball closes in on the exact center. In essence, we strongly prefer the ball to be very close to the center. At the start of training, the platform swings wildly and the ball does not stay on the platform. A time component is introduced so that the loss increases dramatically if the ball falls off the platform early. This is done simply by dividing the ball-position loss by the simulation time of the trial; without it, a trial that ends early would accumulate fewer frames, and therefore a smaller sum, than one where the ball stays on.
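A minimal sketch of this per-trial loss is shown below. It assumes a fixed timestep `dt` and one distance sample per frame, so the elapsed simulation time is the frame count times `dt`. These details, and the `trial_loss` name, are assumptions for illustration. The usage compares a ball held near the center for 100 frames with a ball that falls off after 10 frames.

```python
import math
from typing import Sequence


def trial_loss(distances: Sequence[float], dt: float) -> float:
    """Sum sqrt(distance from center) over frames, then divide by elapsed time.

    distances: ball distance from the platform center, one sample per frame.
    dt: fixed simulation timestep in seconds (assumed constant).
    """
    if not distances:
        raise ValueError("a trial must contain at least one frame")
    position_loss = sum(math.sqrt(d) for d in distances)
    elapsed = len(distances) * dt
    return position_loss / elapsed


DT = 0.02  # assumed timestep
balanced = [0.04] * 100  # ball held 0.04 from center for 100 frames (2 s)
dropped = [1.0] * 10     # ball far from center, falls off after 10 frames (0.2 s)

raw_balanced = sum(math.sqrt(d) for d in balanced)  # about 20
raw_dropped = sum(math.sqrt(d) for d in dropped)    # 10: the raw sum would favor dropping
loss_balanced = trial_loss(balanced, DT)             # about 20 / 2.0 = 10
loss_dropped = trial_loss(dropped, DT)               # 10 / 0.2 = 50: early falls now cost more
```

The square root also shapes the gradient of preference: moving the ball from 0.04 to 0.01 lowers the per-frame term by 0.1, while moving it from 1.0 to 0.97 lowers it by only about 0.015.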

Code Structure

The graph that controls the platform is managed by SmartEngine.Examples.BalanceBall.PlatformNetwork. This class relies on the network system to produce results automatically. The corresponding controller is SmartEngine.Examples.BalanceBall.PlatformMotor. When the Network Manager on SmartEngine.Examples.BalanceBall.Scene is stepped, it asks the controller for data. The manager hands this data to the network (along with data from any other controllers mapped to the network), tells the network to evaluate, and retrieves the results. The results are then fed back into the controller, which applies a torque to the platform.
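The gather, evaluate, and distribute cycle described above can be mocked in a few lines of Python. This is only a model of the data flow: `PlatformMotorSketch` and `NetworkManagerSketch` are hypothetical classes, not the SmartEngine API, and the network is replaced by a stub that returns fixed outputs.

```python
from typing import Callable, List, Sequence

Batch = List[List[float]]


class PlatformMotorSketch:
    """Hypothetical controller: supplies inputs and applies the resulting torque."""

    def __init__(self, max_torque: float):
        self.max_torque = max_torque
        self.observation = [0.0] * 12  # would be refreshed from the scene each step
        self.applied_torque = (0.0, 0.0)

    def gather_inputs(self) -> List[float]:
        return list(self.observation)

    def apply_outputs(self, outputs: Sequence[float]) -> None:
        # Scale the two tanh outputs (X and Z) into a torque.
        tx, tz = outputs
        self.applied_torque = (tx * self.max_torque, tz * self.max_torque)


class NetworkManagerSketch:
    """Hypothetical manager: gathers inputs, evaluates once, hands results back."""

    def __init__(self, network: Callable[[Batch], Batch]):
        self.network = network
        self.controllers: List[PlatformMotorSketch] = []

    def register(self, controller: PlatformMotorSketch) -> None:
        self.controllers.append(controller)

    def step(self) -> None:
        batch = [c.gather_inputs() for c in self.controllers]
        results = self.network(batch)  # one evaluation covers every mapped controller
        for controller, outputs in zip(self.controllers, results):
            controller.apply_outputs(outputs)


# Stub network that returns the same two outputs for every input row.
manager = NetworkManagerSketch(lambda batch: [[0.5, -0.25] for _ in batch])
motors = [PlatformMotorSketch(max_torque=5.0), PlatformMotorSketch(max_torque=2.0)]
for m in motors:
    manager.register(m)
manager.step()
```

Batching all controllers into one evaluation mirrors the description of the manager passing along data from every controller mapped to the network.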

SmartEngine.Examples.BalanceBall.PlatformGeneticTrainer trains the platform network. It uses the Genetic Training Harness helper class to do the heavy lifting of setting up a networked genetic trainer. Multiple clients can connect to the trainer: start training on the server, open one or more additional Unity instances, and click connect in each. Those clients then start training automatically. When training starts on the server, a client on that instance is also connected automatically, so training can run on a single machine.

Internally, the trainer builds a list of tasks and hands them out to connected clients one at a time. In this example, each task is one round of simulation for one ball position and one chromosome, so there are N × M tasks for N ball positions and M chromosomes.
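The task list can be modeled as the cross product of chromosomes and ball positions, drained one task at a time. The sketch below runs in a single process and assigns tasks round-robin, whereas a networked trainer would give the next task to whichever client becomes free. `make_tasks`, `run_tasks`, and the client names are hypothetical, and `fake_run_task` returns a placeholder loss instead of running a simulation.

```python
import itertools
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Task = Tuple[int, int]  # (chromosome index, ball position index)


def make_tasks(num_chromosomes: int, num_positions: int) -> Deque[Task]:
    # One task per (chromosome, ball position) pair: M * N tasks in total.
    return deque(itertools.product(range(num_chromosomes), range(num_positions)))


def run_tasks(tasks: Deque[Task], clients: List[str],
              run_task: Callable[[str, int, int], float]) -> Dict[int, float]:
    """Hand tasks to clients one at a time and average the loss per chromosome."""
    if not clients:
        raise ValueError("at least one client must be connected")
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for client in itertools.cycle(clients):
        if not tasks:
            break
        chrom, pos = tasks.popleft()
        totals[chrom] += run_task(client, chrom, pos)
        counts[chrom] += 1
    return {c: totals[c] / counts[c] for c in totals}


work_log: List[str] = []


def fake_run_task(client: str, chrom: int, pos: int) -> float:
    work_log.append(client)
    return float(chrom + pos)  # placeholder loss instead of a real simulation


tasks = make_tasks(num_chromosomes=3, num_positions=4)
num_tasks = len(tasks)
averages = run_tasks(tasks, ["server-local", "client-1"], fake_run_task)
```

The per-chromosome averages correspond to the average trial loss the genetic trainer uses to rank chromosomes.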