
Setting up the Agent

Agents represent the actors that we train to perform a task, or a set of task-based commands, in return for some reward. We will cover actors, actions, state, and rewards in more detail when we discuss reinforcement learning in Chapter 2, The Bandit and Reinforcement Learning. For now, all we need to do is set the Brain the agent will be using. Open up the editor and follow these steps:

  1. Locate the Agent object in the Hierarchy window and select it.
  2. Click the Target icon beside the Brain property on the Simple Agent component and select the Brain object in the scene, as shown in the following screenshot:
Setting the Agent Brain
  3. Click the Gear icon on the Simple Agent component and select Edit Script from the context menu. The agent script is what we use to observe the environment and collect observations. In our current example, we always assume that there is no previous observation.
  4. Enter the highlighted code in the CollectObservations method as follows:
      public override void CollectObservations()
      {
          // No meaningful state yet: add a constant 0 as the only observation.
          AddVectorObs(0);
      }
     CollectObservations is the method that sets what the Agent observes about the environment. It is called on every agent step or action. We use AddVectorObs to add a single float value of 0 to the agent's observation collection. At this point, we are not using any observations and assume that our bandit provides no visual cues as to which arm to pull.
     The agent also needs to evaluate rewards and know when they are collected. We will add four slots to our agent, one for each arm, to represent the reward received when that arm is pulled.
  5. Enter the following code in the SimpleAgent class:
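      // Reference to the Bandit this agent acts on; assigned in the Inspector.
      public Bandit bandit;

      public override void AgentAction(float[] vectorAction, string textAction)
      {
          // The first action element selects which arm to pull.
          var action = (int)vectorAction[0];
          // The bandit's payout for that arm becomes the agent's reward.
          AddReward(bandit.PullArm(action));
          // Each episode is a single pull, so end it immediately.
          Done();
      }

      public override void AgentReset()
      {
          // Restore the bandit to its starting state for the next episode.
          bandit.Reset();
      }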
     The code in our AgentAction method takes the current action and applies it to the Bandit with the PullArm method, passing in the arm to pull. The reward returned from the bandit is added using AddReward. After that, we implement the AgentReset method, which resets the Bandit back to its starting state. AgentReset is called when the agent is done or runs out of steps. Notice how we call Done after each step; this is because each episode of our bandit problem consists of a single state and a single action.
  6. Add the following code just below the last section:
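      public Academy academy;
      // Minimum time, in seconds, between decisions when running in inference mode.
      public float timeBetweenDecisionsAtInference;
      private float timeSinceDecision;

      public void FixedUpdate()
      {
          WaitTimeInference();
      }

      private void WaitTimeInference()
      {
          if (!academy.GetIsInference())
          {
              // Training: request a decision on every physics step.
              RequestDecision();
          }
          else if (timeSinceDecision >= timeBetweenDecisionsAtInference)
          {
              // Inference: enough time has passed, so reset the timer and decide.
              timeSinceDecision = 0f;
              RequestDecision();
          }
          else
          {
              // Inference: keep accumulating time until the delay has elapsed.
              timeSinceDecision += Time.fixedDeltaTime;
          }
      }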
     The preceding code makes the agent wait between decision requests when running in inference mode, which gives the Brain enough time to accept Player decisions. The first example we build uses player input, and this code exists only to support that. Don't worry too much about the details; when we develop our own Agent Brains, we won't need the delay.
  7. Save the script when you are done editing.
  8. Return to the editor and set the properties on the Simple Agent, as shown in the following screenshot:
Setting the Simple Agent properties
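
The decision-throttling logic in WaitTimeInference can be tried outside Unity. The following is a minimal sketch, assuming a hypothetical DecisionTimer class that is not part of ML-Agents; it takes the inference flag and the time step as plain arguments in place of academy.GetIsInference() and Time.fixedDeltaTime, and returns true where the agent script would call RequestDecision:

```csharp
// Hypothetical stand-alone model of WaitTimeInference; not part of ML-Agents.
public class DecisionTimer
{
    private readonly float timeBetweenDecisions;
    private float timeSinceDecision;

    public DecisionTimer(float timeBetweenDecisions)
    {
        this.timeBetweenDecisions = timeBetweenDecisions;
    }

    // Returns true when a decision would be requested on this tick.
    public bool Tick(bool isInference, float deltaTime)
    {
        if (!isInference)
        {
            // Training: decide on every tick.
            return true;
        }
        if (timeSinceDecision >= timeBetweenDecisions)
        {
            // Inference: the delay has elapsed, so reset the timer and decide.
            timeSinceDecision = 0f;
            return true;
        }
        // Inference: keep waiting and accumulate the elapsed time.
        timeSinceDecision += deltaTime;
        return false;
    }
}
```

With a delay of 0.5 and a time step of 0.25, the first two inference ticks only accumulate time and the third requests a decision. A delay of 0 requests a decision on every tick, which matches the remark that no delay is needed once player input is out of the picture.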

We are almost done. The agent is now able to interpret our actions and execute them on the Bandit. Actions are sent to the agent from the Brain. The Brain is responsible for making decisions and we will cover its setup in the next section.
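
To make the flow from action to reward concrete, here is a rough sketch in plain C# that runs without Unity. SketchBandit and EpisodeSketch are hypothetical stand-ins, not the Bandit component used in the scene; the four reward values and the zero payout for an invalid arm are placeholder assumptions chosen only for illustration:

```csharp
// Hypothetical stand-in for the Bandit component; the reward values are placeholders.
public class SketchBandit
{
    // One reward slot per arm (four arms, as in the text).
    private readonly float[] armRewards = { 0.1f, 0.5f, 1.0f, 0.2f };

    // Index of the last arm pulled, or -1 in the starting state.
    public int LastArm { get; private set; } = -1;

    public float PullArm(int arm)
    {
        // Assumption: an out-of-range arm simply pays nothing.
        if (arm < 0 || arm >= armRewards.Length)
        {
            return 0f;
        }
        LastArm = arm;
        return armRewards[arm];
    }

    public void Reset()
    {
        LastArm = -1;
    }
}

public static class EpisodeSketch
{
    // Mirrors the shape of AgentAction: read the arm from the first action
    // element, pull it, and return the reward for this one-step episode.
    public static float RunStep(SketchBandit bandit, float[] vectorAction)
    {
        int action = (int)vectorAction[0];
        float reward = bandit.PullArm(action);
        // Stands in for Done() followed by AgentReset().
        bandit.Reset();
        return reward;
    }
}
```

Calling EpisodeSketch.RunStep(new SketchBandit(), new float[] { 2f }) pulls the third arm and returns its placeholder reward. In the real project, the Brain supplies vectorAction and the Agent base class accumulates the reward through AddReward.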
