SmartEngine  1.6.0
Example Project: Shooter

This example game has you face off with the AI in a simple shooter. Ammo and health periodically spawn on the map.

Network Strategy

There are three main aspects to the AI: movement, aiming, and actions. We need a way to navigate the map to pick up items and engage the player. We must also aim at the player, ideally leading the target so more of our shots land. And finally, we must choose whether to shoot, reload, or do nothing; it's important to conserve ammo and only reload at opportune times.

All three aspects can be merged into a single graph with a combined output, share common input that feeds separate sub-graphs, or be split into completely isolated networks. The last approach has the benefit that each part can be trained independently of the others.

Let's focus on movement first. There are many approaches, but two were investigated during the development of this example. The first is to use raycasts to learn about the space around us and navigate through it. Each raycast can hit the player, an item, or a wall. A fixed number of rays (80) are cast at evenly spaced angles around the AI. Since the player has a bird's-eye view of the playfield, we want our AI to have that same view to keep the game fair, so the item and player raycasts pass through walls. This introduces a priority problem, however. The AI might want to grab an item that is directly north, but a wall blocks the way, so it must actually go east first. Training the AI to overcome this can be done with a sufficiently deep network, at the cost of either more training data or more genetic training time. There is, however, a much simpler approach.
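
As a rough sketch of this first approach (not the code actually used in the example), the evenly spaced ray directions and per-ray classification could look something like the following, with scene.raycast standing in for whatever raycast call the engine provides:

    import math

    NUM_RAYS = 80  # fixed ray count used by the first approach

    def sense(ai_position, scene):
        """Cast NUM_RAYS rays at evenly spaced angles around the AI and record
        what each one can see. scene.raycast is a hypothetical helper standing
        in for the engine's actual raycast call."""
        rows = []
        for i in range(NUM_RAYS):
            angle = 2.0 * math.pi * i / NUM_RAYS
            direction = (math.cos(angle), math.sin(angle))
            # Wall rays stop at walls; item and player rays pass through walls
            # so the AI gets the same bird's-eye information the human player has.
            wall = scene.raycast(ai_position, direction, mask="wall")
            item = scene.raycast(ai_position, direction, mask="item", through_walls=True)
            player = scene.raycast(ai_position, direction, mask="player", through_walls=True)
            rows.append((wall, item, player))
        return rows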

The second approach is the same one we used in the Gold Rush example. Instead of feeding the network one row containing the entire game state, we feed it multiple rows, each representing a single node we can walk to. Each row contains all the information needed to make a decision, and we choose the row with the maximum output. Unity's navigation system is then used to actually move the AI to that location. This approach was ultimately used, but it has a few downsides. The first is that gradient descent training is harder, because determining which node the recorded player intended to go to isn't as clear as it was in the Gold Rush example. To get around this, genetic training is used, so the AI forms its own concept of what is best. The second is that the movement feels less organic and more "scripted", because the actual movement is composed of straight lines. For this example, that trade-off was deemed acceptable.
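
A sketch of the per-node idea follows. Here node_inputs, movement_graph, and nav_agent are stand-ins for the example's real row construction, graph evaluation, and Unity navigation calls:

    import numpy as np

    def choose_destination(nodes, node_inputs, movement_graph, nav_agent):
        """Score every walkable node with the movement graph and navigate to the
        best one. Each row carries everything needed to judge that node; the row
        with the maximum output wins."""
        rows = np.stack([node_inputs(node) for node in nodes])
        scores = movement_graph(rows)                    # one output value per row
        best = int(np.argmax(scores))                    # pick the highest-scoring node
        nav_agent.set_destination(nodes[best].position)  # let navigation do the pathing
        return nodes[best]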

Determining the action is straightforward. We simply need to choose one of three actions - fire, reload, or do nothing. Training could be done with gradient descent, since it's easy to tell what action the recorded player took, but in this example genetic training was used because that's what we use for movement and the two are related. The action nodes were put in the same graph as the movement nodes for ease of training. We take the max value of the three outputs because it leads to consistent results when training genetically.
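
The action choice reduces to an argmax over the three action outputs; a minimal sketch:

    import numpy as np

    ACTIONS = ("fire", "reload", "do_nothing")

    def choose_action(action_outputs):
        """Pick whichever of the three action outputs is largest. Taking the max
        gives consistent behavior when the graph is trained genetically."""
        return ACTIONS[int(np.argmax(action_outputs))]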

Lastly is aiming, which is also straightforward. Bullets don't change position based on the character's movement, so we only care about the opponent's relative location and velocity. An LSTM is useful here because there's an implicit time component involved: our aim direction changes based on how the player has been moving over time. Therefore, a single LSTM layer is used at the entry point of the graph. This graph is kept separate from the movement & action graph so that it can be trained separately without side effects.

Input

The aim graph takes two inputs: the player's relative location and the player's relative velocity.
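
A minimal sketch of how those two inputs could be assembled each frame. Exactly how the relative velocity is measured is an assumption here (this sketch subtracts the AI's own velocity), not something taken from the example:

    def aim_inputs(ai_pos, player_pos, ai_vel, player_vel):
        """Player position and velocity expressed relative to the AI. Whether the
        AI's own velocity is subtracted is an implementation choice; this sketch
        assumes it is."""
        rel_pos = (player_pos[0] - ai_pos[0], player_pos[1] - ai_pos[1])
        rel_vel = (player_vel[0] - ai_vel[0], player_vel[1] - ai_vel[1])
        return rel_pos + rel_vel  # four values fed to the LSTM each frame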

The movement sub-graph takes into account whether the node holds an ammo pickup, a health pickup, or nothing; the node's distance to both the AI and the player; the AI's distance to the player; an indication of whether the player can be seen from that node; and the AI's health. Additionally, the player's health could be useful for deciding whether to dive on the player when they are at low health - feel free to experiment on your own.

The interesting detail is the actual value of the ammo and health inputs. We don't want the AI to camp on a health pickup when its health is already full, and similarly we don't want it to fixate on ammo, especially when it already has full ammo. Using a 1/0 flag for the node plus separate inputs for our own ammo and health leads to behavior we don't want, because nothing tells the network what "full" means. Another input could be added, but that only complicates things. A much better solution is to build that information into the item input itself. We use a value of (1 - AI health) for health items: if we are at full health (health == 1), the node effectively becomes invisible, and it grows in importance as we lose health. A similar approach works for ammo too. This is a great solution because it means fewer inputs and plays nicely with a neural network's ability to work with fuzzy data.
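
A sketch of that encoding, assuming health and ammo are normalized to the 0..1 range (the ammo normalization is an assumption of this sketch):

    def item_input(node_item, ai_health, ai_ammo_fraction):
        """Encode a node's pickup so its value already reflects how much the AI
        needs it: 0 when the AI is full, approaching 1 as the need grows."""
        if node_item == "health":
            return 1.0 - ai_health          # invisible at full health
        if node_item == "ammo":
            return 1.0 - ai_ammo_fraction   # same idea for ammo
        return 0.0                          # empty node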

The action sub-graph simply takes in the amount of ammo left in the gun and whether or not the AI can see the player.

Structure

See SmartEngine/Examples/Shooter/Resources/ShooterMovementActionGraphDefinition.json and SmartEngine/Examples/Shooter/Resources/ShooterAimGraphDefinition.json

The movement and action graph is composed entirely of linear layers and has a similar structure to the previous examples.
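
In rough PyTorch terms, the shape of the graph is a small stack of fully connected ("linear") layers per sub-graph. The layer counts, sizes, and activations below are illustrative assumptions; the real definition lives in ShooterMovementActionGraphDefinition.json:

    import torch.nn as nn

    # Illustrative only: actual layer sizes come from the JSON definition, and
    # the ReLU activations here are an assumption of this sketch.
    movement_subgraph = nn.Sequential(
        nn.Linear(6, 32),   # one node row: item value, three distances, visibility, AI health
        nn.ReLU(),
        nn.Linear(32, 32),
        nn.ReLU(),
        nn.Linear(32, 1),   # score for this node
    )
    action_subgraph = nn.Sequential(
        nn.Linear(2, 16),   # ammo in the gun, can-see-player flag
        nn.ReLU(),
        nn.Linear(16, 3),   # fire, reload, do nothing
    )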

As mentioned above, the aim graph uses an LSTM layer because of the implicit time aspect of the input. The LSTM layer does not need a separate activation function, as one is built into the layer itself (tanh is used).
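
In the same rough PyTorch terms, the aim graph's entry point looks something like the following; sizes are again illustrative rather than the values from ShooterAimGraphDefinition.json:

    import torch.nn as nn

    class AimModel(nn.Module):
        """LSTM entry point followed by a linear head. The LSTM applies tanh
        internally, so no separate activation is attached to it."""
        def __init__(self, hidden=16):
            super().__init__()
            self.lstm = nn.LSTM(input_size=4, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 2)   # aim direction (x, y)

        def forward(self, sequence):
            out, _ = self.lstm(sequence)       # sequence: (batch, time, 4)
            return self.head(out[:, -1])       # aim from the latest time step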

Training Methodology

Because we have two separate graphs, we can train them independently. The aim graph is first trained with gradient descent on recorded data of the user firing at a moving dummy. This gives us a good baseline, but we're only as good as our training data, and the user doesn't have perfect aim. Therefore, a genetic trainer is used afterwards to refine the graph and produce better results.
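
A minimal sketch of that first pass, assuming the recording has already been turned into tensors of per-frame aim-graph inputs and the aim the user actually used (the hyper-parameters are placeholders, not values from the example):

    import torch
    import torch.nn as nn

    def train_aim_from_recording(model, recorded_inputs, recorded_aims,
                                 epochs=200, lr=1e-3):
        """Plain gradient descent on (recorded input, recorded aim) pairs. This
        only gets the network as good as the recorded user, hence the genetic
        refinement pass afterwards."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            optimizer.zero_grad()
            predicted = model(recorded_inputs)         # network's aim for each sample
            loss = loss_fn(predicted, recorded_aims)   # error vs. the user's aim
            loss.backward()
            optimizer.step()
        return model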

As mentioned above, the movement / action graph is limited to genetic training due to the difficulty of extracting meaningful movement data from recorded user games. Instead of training competitively, we train against a dummy bot that produces the same set of movements and actions for all chromosomes. This means far fewer games to play, because the number of games now scales with the number of chromosomes rather than with its square. The scoring formula is (kills^2 / deaths). This weighting says we care much more about racking up kills than about avoiding deaths, leading to a more aggressive AI. It's very important to note how simple this scoring function is. Notice that we don't tell it anything about when to collect ammo / health or when to shoot / reload. The AI figures out on its own that it must collect ammo and shoot only when opportune, because if it doesn't, it won't be able to get as many kills. The same goes for collecting health. In essence, we have created a smart AI without ever having to tell it how to be smart!
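
The scoring function in code form; guarding against a zero-death game is an assumption of this sketch, not something specified above:

    def score_chromosome(kills, deaths):
        """kills^2 / deaths: squaring kills rewards aggression far more than
        avoiding deaths. The max() guard only protects against dividing by zero
        when the AI never died during the game."""
        return (kills ** 2) / max(deaths, 1)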

Code Structure

See the Balance Ball example for a more detailed description of the network and genetic trainer helper classes.

SmartEngine.Examples.Shooter.AINetwork contains the networks, and SmartEngine.Examples.Shooter.AIController contains the controller that the AI uses, leveraging the network helper infrastructure. SmartEngine.Examples.Shooter.DummyNetwork contains the networks that the dummy bot uses during training. It can be initialized in different modes depending on what kind of training we are doing; for instance, it can be set up to walk around with infinite health and not shoot when training the aim network, or to follow the AI closely during movement / action training. The network classes contain two helper network interfaces: one for the aim graph and one for the movement / action graph. The controller class also implements two controller helper interfaces. The aim part of the controller can be mapped separately from the movement / action part, which allows aim to come from graph evaluation while movement comes from the stand-still dummy network during aim training.

SmartEngine.Examples.Shooter.AimRecordingTrainer trains from a recording using gradient descent. The actual aim recording is done by the SmartEngine.Examples.Shooter.GameRecorder class.

SmartEngine.Examples.Shooter.AimGeneticTrainer further refines the aim network using genetic training. It is expected that training from the recording was done first, as this trainer does not initialize with random networks. The training AI controller uses a movement / action dummy network to stand still and fire every frame, but uses normal graph evaluation for aim.

SmartEngine.Examples.Shooter.MovementActionGeneticTrainer is the genetic trainer for movement / action training. It leverages the networked genetic trainer helper class. Depending on the game seed being trained on, a different version of the dummy network is used; this forces the AI to be good across a variety of scenarios instead of just one. It is expected that the network has fully completed aim training before starting movement / action training, because this trainer uses the aim graph but does not modify it. Like the Gold Rush example, this example is set up to allow the game to be stepped multiple times in a single Unity frame, allowing for faster than real-time training.
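
A rough sketch of both ideas. The dummy mode names and the game.step call are hypothetical stand-ins, not the example's actual API:

    DUMMY_MODES = ("stand_still", "wander", "chase_ai")   # hypothetical mode names

    def dummy_mode_for_seed(seed):
        """Vary the dummy bot's behavior with the game seed so a chromosome is
        scored across several scenarios instead of just one."""
        return DUMMY_MODES[seed % len(DUMMY_MODES)]

    def train_update(game, steps_per_frame, dt):
        """Step the simulation several times inside one engine frame so genetic
        evaluation runs faster than real time. game.step is a stand-in for the
        example's own stepping call."""
        for _ in range(steps_per_frame):
            game.step(dt)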