In part 1 of this blog series, we provided a general overview of our Dr Arm’s Boss Battle Demo. Part 2 takes a more in-depth look at how game AI agents are designed and what generated Neural Network (NN) models look like.
Once the strategy for the boss battle has been decided, the next step is to design the agent. Designing an agent mainly requires clarifying four items. More information on agent design can be found here.
We first must think about what information the agent needs for the target task. In our demo, the input includes the statistics, the action events, and the positions of the target and of the agent itself. The statistics are Health, Mana, and Stamina. The action events are Attack, Roll, and Fire. We collect this information in two ways. One way is to give the information to the agent from the agent's C# code. The stats and the action events are mainly passed this way, as follows:
// Customized class to manage a character's state
public PlayerManager _manager;
public PlayerManager _enemyManager;

public override void CollectObservations(VectorSensor sensor)
{
    // Collect my state and add it to the observations.
    // Normalize a value to [0, 1] by dividing by its max value.
    sensor.AddObservation(_manager.Stats.CurrentHealth / _manager.Stats.MaxHealth);
    sensor.AddObservation(_manager.Stats.CurrentStamina / _manager.Stats.MaxStamina);
    sensor.AddObservation(_manager.Stats.CurrentMana / _manager.Stats.MaxMana);
    sensor.AddObservation(_manager.posFire); // Vector3 type

    // Collect the enemy's state and add it to the observations
    sensor.AddObservation(_enemyManager.Stats.CurrentHealth / _enemyManager.Stats.MaxHealth);
    sensor.AddObservation(_enemyManager.Stats.CurrentStamina / _enemyManager.Stats.MaxStamina);
    sensor.AddObservation(_enemyManager.Stats.CurrentMana / _enemyManager.Stats.MaxMana);
    sensor.AddObservation(_enemyManager.posFire); // Vector3 type

    // 1 if the enemy is facing the agent, 0 otherwise
    int isEnemyFacingMe = Vector3.Dot(
        _manager.transform.localPosition - _enemyManager.transform.localPosition,
        _enemyManager.transform.forward) > 0 ? 1 : 0;
    sensor.AddObservation(isEnemyFacingMe);
}
Health, Mana, and Stamina of the target and of the agent itself are vital information. Health is a direct indicator of winning a battle, so it must be provided. Choosing actions based on Mana and Stamina is also likely to be important for success. Action events allow the agent to know if the target is about to attack. We also provide whether the target is looking at the agent, based on their orientations. This information allows the agent to get behind the target for a more effective attack.
The other way to give information to the agent is to use raycasts. You can think of them as lasers that detect line of sight to an object. They are used to detect walls and the position of the target, and are enabled by adding a Ray Perception Sensor 3D component to the agent.
Figure 1. Raycasts in action (Left: top-down view, right: oblique view)
The number of input values fed into the NN model is the sum of the inputs defined by the two methods above. The number of inputs from the C# code is 57. This can be calculated using the formula: (Space Size) * (Stacked Vectors). Here, Space Size is the number of observed values collected by the AddObservation method, and Stacked Vectors is the number of frames of input fed to the NN model at once. Stacked Vectors can be set in Unity's UI, as shown in figure 2. You must match these parameters with the observations you have defined in the code. The number of inputs from the raycasts is 492, calculated using the formula: (Stacked Raycasts) * (1 + 2 * Rays Per Direction) * (Num of Detectable Tags + 2). These can also be set in Unity's UI. Of course, the fewer the rays and tags, the less data is used and the lighter the computational load.
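As a sanity check, the two formulas can be evaluated in a few lines of Python. Only the totals (57 and 492) and the formulas themselves come from the demo; the individual parameter values below (19 observed values per frame, 3 stacked vectors, 2 stacked raycasts, 20 rays per direction, 4 detectable tags) are illustrative assumptions that happen to reproduce those totals.

```python
# Sanity check of the two input-size formulas.
# Parameter values are illustrative assumptions; only the
# totals (57 and 492) come from the demo.

def vector_obs_count(space_size, stacked_vectors):
    """(Space Size) * (Stacked Vectors)"""
    return space_size * stacked_vectors

def raycast_obs_count(stacked_raycasts, rays_per_direction, num_detectable_tags):
    """(Stacked Raycasts) * (1 + 2 * Rays Per Direction) * (Num of Detectable Tags + 2)"""
    return stacked_raycasts * (1 + 2 * rays_per_direction) * (num_detectable_tags + 2)

# 19 observed values per frame, stacked over 3 frames
print(vector_obs_count(19, 3))      # 57
# e.g. 2 stacked raycasts, 20 rays per direction, 4 detectable tags
print(raycast_obs_count(2, 20, 4))  # 492
```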
Figure 2. Agents Behavior Parameters (left) and Ray Perception Sensor 3D Component (right)
Next is to define the possible outputs from the agent. We map the output from the agent one-to-one with the unique actions of the character. In this game demo, the actions that Dr Arm and Knight can take are identical.
Figure 3. Character actions (Left: Dr Arm, right: Knight, bottom: possible actions)
The characters can move along two axes, horizontal and vertical, each taking a continuous value from -1 to 1. The agent also outputs one of four mutually exclusive discrete values as an action. Each value is assigned to an action: ATTACK, which swings a sword; FIRE, which throws a fireball; ROLL, which dodges an attack; and NO ACTION. These can be implemented as shown in the following sample code:
// Called every time the agent receives an action to take from Agent.OnActionReceived()
public void ActAgent(ActionBuffers actionBuffers)
{
    // Joystick movement
    var actionZ = Mathf.Clamp(actionBuffers.ContinuousActions[0], -1f, 1f);
    var actionX = Mathf.Clamp(actionBuffers.ContinuousActions[1], -1f, 1f);
    Vector2 moveVector = new Vector2(actionZ, actionX);

    // Discrete actions (action implementations elided)
    if (actionBuffers.DiscreteActions[0] == 1)
    {
        // ATTACK
    }
    else if (actionBuffers.DiscreteActions[0] == 2)
    {
        // FIRE
    }
    else if (actionBuffers.DiscreteActions[0] == 3)
    {
        // ROLL
    }
}

// Heuristic converts the controller inputs into actions.
// If the agent has a Model file, it will use the NN model to take actions instead.
public override void Heuristic(in ActionBuffers actionsOut)
{
    var continuousActionsOut = actionsOut.ContinuousActions;
    var discreteActionsOut = actionsOut.DiscreteActions;
    continuousActionsOut[0] = Input.GetAxis("Horizontal");
    continuousActionsOut[1] = Input.GetAxis("Vertical");
    if (Input.GetKey(KeyCode.Joystick1Button0)) // first button mapping assumed
    {
        discreteActionsOut[0] = 1;
    }
    else if (Input.GetKey(KeyCode.Joystick1Button2))
    {
        discreteActionsOut[0] = 2;
    }
    else if (Input.GetKey(KeyCode.Joystick1Button1))
    {
        discreteActionsOut[0] = 3;
    }
    else
    {
        // do nothing
        discreteActionsOut[0] = 0;
    }
}
As described in figure 3, there are 2 continuous actions (horizontal and vertical movement) and 1 discrete action branch with 4 possible values. The sum of these values equals the number of nodes in the output layer of the NN model. Again, you must match the parameters in Unity's UI shown below with the number of actions you have defined in the code.
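The output-layer size can be counted the same way: one node per continuous action plus one node per discrete choice. The small function below illustrates that counting under the demo's action layout; it is a sketch, not the exporter's actual model definition.

```python
# Count output-layer nodes: one per continuous action,
# plus one node per choice in each discrete branch.
# Branch layout mirrors the demo: 2 continuous actions,
# one discrete branch with 4 choices.
def output_nodes(num_continuous, discrete_branches):
    return num_continuous + sum(discrete_branches)

print(output_nodes(2, [4]))  # 6
```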
Figure 4. Behavior parameters should match with the output configuration
Next, consider the brain of the agent for decision making. What NN model structure should it use? Is historical information necessary? Is camera input required?
By default, ML-Agents uses a Multi-Layer Perceptron (MLP) structure. The MLP is the most basic neural network structure, with every neuron in one layer connected to every neuron in the next, as shown in figure 5. The input and output layers of the network are determined by the inputs and outputs defined in Design sections 1 and 2 above. In addition, ML-Agents provides several parameters to change the number and size of the intermediate layers, and more.
Figure 5. MLP NN model structure
There are three parameters that game developers can change:
• Stacked Vectors: the number of frames of input data fed to the NN model at once
• Number of Layers: the number of intermediate layers in the NN model
• Hidden Units: the number of neurons per layer
Game developers must set the appropriate parameter values depending on the complexity of the task. In our demo, the NN model has 3 Stacked Vectors, 2 intermediate layers and 128 neurons per layer. The size of the NN model is not that large, but this is a somewhat common size in Reinforcement Learning. As mentioned in the Design section 1 above, the Stacked Vectors can be set in Unity’s UI. The other two parameters should be specified in a YAML script, which is passed to the training command. More information on network settings can be found here.
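As a sketch, the two YAML parameters might look like the following in an ML-Agents trainer configuration file. The behavior name (Knight) and the surrounding trainer settings are placeholders for illustration; only num_layers: 2 and hidden_units: 128 come from the demo.

```yaml
behaviors:
  Knight:               # behavior name is an assumed placeholder
    trainer_type: ppo
    network_settings:
      num_layers: 2     # intermediate layers
      hidden_units: 128 # neurons per layer
```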
ML-Agents generates NN models in ONNX format. Below is the structure of the generated NN model. As you can see, the inputs and outputs we defined are reflected in the model. In the upper right, an input named action_masks has been created. It can be used to disable specific actions at a given point in time. For example, you could explicitly mask the FIRE action when there is not enough Mana left, but we did not use this feature in our demo.
Figure 6. Generated NN model
Finally, consider what rewards should be given. Rewards play a key role in setting the agent's goals. In this demo, three main rewards are given to the agent, depending on the state.
Figure 7. Reward function
In our case, the first is a large positive reward, given when the agent achieves its aim and defeats the target: a +1 reward. Conversely, a large negative reward is given when the agent is defeated by the target. Being defeated is an event that must be avoided, so a -1 reward is given in this case. Finally, a small negative reward is given continuously at every step. This incentivizes the agent to defeat the target as quickly as possible. During training, a timeout is also defined after a certain period, 2500 steps in our case, and the per-step penalty is sized so that the accumulated amount reaches -1 when the timeout occurs. This means that a time-out draw receives the same negative reward as being defeated by the target.
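To make the arithmetic concrete, here is a small Python sketch of the time penalty. The per-step value of -1/2500 is inferred from the statement that the accumulated penalty reaches -1 at the 2500-step timeout; the demo's actual implementation may differ.

```python
# Time-penalty schedule: a small constant penalty each step
# that accumulates to -1 at the 2500-step timeout.
# The per-step value (-1/2500) is an inference from the text.

MAX_STEPS = 2500
STEP_PENALTY = -1.0 / MAX_STEPS

total = 0.0
for _ in range(MAX_STEPS):
    total += STEP_PENALTY

print(round(total, 6))  # -1.0 at timeout: same as being defeated
```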
In part 3, I will explore the training strategy for the game AI agents.