“Unable to evaluate the loss function. Check the loss function and ensure it runs successfully”

Technical Source
4 min read · May 25, 2021

--

“Unable to evaluate the loss function. Check the loss function and ensure it runs successfully”: `gradient` can’t access the custom loss function

I am trying to build a custom reinforcement-learning environment with multiple agents, each having its own policy network, for a project, and I am stuck at the training part (I am trying to follow an approach similar to this example).

My policy network accepts an array of size 21 as input and outputs a single element from [-1, 0, 1].
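For reference, environment specifications matching that description would presumably look like the following (a sketch only; in the actual code below the specs come from getObservationInfo and getActionInfo on the environment):

obsInfo = rlNumericSpec([21 1]); % 21-element observation vector
actInfo = rlFiniteSetSpec([-1 0 1]); % single action drawn from {-1, 0, 1}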

I have the following code (code from multiple files condensed into a single file; sorry for the mess):

clear
close all

%% Model parameters
T_init = 0;
T_final = 100;
dt = 1;

rng("shuffle")

baseEnv = baseEnvironment();
p1_pos = randi(baseEnv.L,1);
p2_pos = randi(baseEnv.L,1);
while p1_pos == p2_pos
p2_pos = randi(baseEnv.L,1);
end

rng("shuffle")

baseEnv = baseEnvironment();
% validateEnvironment(baseEnv)
p1_pos = randi(baseEnv.L,1);
p2_pos = randi(baseEnv.L,1);
while p1_pos == p2_pos
p2_pos = randi(baseEnv.L,1);
end

agent1 = IMAgent(baseEnv, p1_pos, 1, 'o');
agent2 = IMAgent(baseEnv, p2_pos, 2, 'x');
listOfAgents = [agent1; agent2];
multiAgentEnv = multiAgentEnvironment(listOfAgents);

%% Get the action and observation specifications
actInfo = getActionInfo(baseEnv);
obsInfo = getObservationInfo(baseEnv);

%% Build agent 1
actorNetwork = [imageInputLayer([obsInfo.Dimension(1) 1 1],'Normalization','none','Name','state')
fullyConnectedLayer(24,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(24,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(numel(actInfo.Elements),'Name','output')
softmaxLayer('Name','actionProb')];
actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,...
obsInfo,actInfo,'Observation','state',actorOpts);
actor = setLoss(actor, @actorLossFunction);
%obj.brain = rlPGAgent(actor,baseline,agentOpts);
agentOpts = rlPGAgentOptions('UseBaseline',false, 'DiscountFactor', 0.99);
agent1.brain = rlPGAgent(actor,agentOpts);
%% Build agent 2
actorNetwork = [imageInputLayer([obsInfo.Dimension(1) 1 1],'Normalization','none','Name','state')
fullyConnectedLayer(24,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(24,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(numel(actInfo.Elements),'Name','output')
softmaxLayer('Name','actionProb')];
actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,...
obsInfo,actInfo,'Observation','state',actorOpts);
actor = setLoss(actor, @actorLossFunction);
%obj.brain = rlPGAgent(actor,baseline,agentOpts);
agentOpts = rlPGAgentOptions('UseBaseline',false, 'DiscountFactor', 0.99);
agent2.brain = rlPGAgent(actor,agentOpts);
%% Custom training loop

averageGrad = [];
averageSqGrad = [];
learnRate = 0.05;
gradDecay = 0.75;
sqGradDecay = 0.95;
numOfEpochs = 1;

numEpisodes = 5000;
maxStepsPerEpisode = 250;
discountFactor = 0.995;
aveWindowSize = 100;
trainingTerminationValue = 220;



loss_history = [];
for i = 1:numOfEpochs
action_hist = [];
reward_hist = [];
observation_hist = [multiAgentEnv.baseEnv.state];
for t = T_init:1:T_final
actionList = multiAgentEnv.act();
[observation, reward, multiAgentEnv.isDone, ~] = multiAgentEnv.step(actionList);

if t == T_final
multiAgentEnv.isDone = true;
end

action_hist = cat(3, action_hist, actionList);
reward_hist = cat(3, reward_hist, reward);
if multiAgentEnv.isDone == true
break
else
observation_hist = cat(3, observation_hist, observation);
end
end
if size(observation_hist,3) ~= size(action_hist,3)
print("gi")
end
clear observation reward
actor = getActor(agent1.brain);
batchSize = min(t,maxStepsPerEpisode);

observations = observation_hist;
actions = action_hist(1,:,:);
rewards = reward_hist(1,:,:);

observationBatch = permute(observations(:,:,1:batchSize), [2,1,3]);
actionBatch = actions(:,:,1:batchSize);
rewardBatch = rewards(:,1:batchSize);


discountedReturn = zeros(1,int32(batchSize));
for t = 1:batchSize
G = 0;
for k = t:batchSize
G = G + discountFactor ^ (k-t) * rewardBatch(k);
end
discountedReturn(t) = G;
end
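% Note: the nested loop above is O(batchSize^2). A vectorized one-pass
% alternative (a sketch computing the same reward-to-go) would be:
%   w = discountFactor.^(0:batchSize-1);
%   discountedReturn = fliplr(cumsum(fliplr(rewardBatch(:)'.*w)))./w;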

lossData.batchSize = batchSize;
lossData.actInfo = actInfo;
lossData.actionBatch = actionBatch;
lossData.discountedReturn = discountedReturn;

% 6. Compute the gradient of the loss with respect to the policy
% parameters.
actorGradient = gradient(actor,'loss-parameters', {observationBatch},lossData);


p1_pos = randi(baseEnv.L,1);
p2_pos = randi(baseEnv.L,1);
while p1_pos == p2_pos
p2_pos = randi(baseEnv.L,1);
end
multiAgentEnv.reset([p1_pos; p2_pos]);
end


function loss = actorLossFunction(policy, lossData)

% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Elements',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;

% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));

% Round any policy values less than eps to eps.
policy(policy < eps) = eps;

% Compute the loss.
loss = -sum(G .* log(policy),'all');
end
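% For reference: the loss above is the REINFORCE objective,
%   loss = -sum_t G_t * log(pi(a_t | s_t)),
% where the action indication matrix selects pi(a_t|s_t) from the policy output.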

When I run the code, I am getting the following error:

Error using rl.representation.rlAbstractRepresentation/gradient (line 181)
Unable to compute gradient from representation.
Error in main1 (line 154)
actorGradient = gradient(actor,'loss-parameters', {observationBatch},lossData);
Caused by:
Unable to evaluate the loss function. Check the loss function and ensure it runs successfully.
Reference to non-existent field 'Advantage'.

I also tried running the example from the link; it works, but my code does not. I put a breakpoint in the loss function, and it is never called during the gradient calculation. From the error message I suspect this is the problem, but the same pattern works when I run the example code from the MathWorks website.

ANSWER


In the training loop, you retrieve the actor from agent1.brain, which is an rlPGAgent. That actor therefore uses the loss function defined inside rlPGAgent, not your custom actorLossFunction; presumably the agent's built-in policy-gradient loss looks for an 'Advantage' field in the loss data, which is why the error reports "Reference to non-existent field 'Advantage'" and why your breakpoint is never hit. You can bypass rlPGAgent creation entirely and use the actor representation throughout your custom training loop.

To be precise, wrapping your actor in rlPGAgent overrides your loss function with the agent's own.
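A minimal sketch of that idea, reusing the hyperparameters already defined in the question (averageGrad, averageSqGrad, learnRate, gradDecay, sqGradDecay) and following the update pattern from the linked MathWorks custom-training-loop example; the rollout details are assumed to match the code above:

actorParams = getLearnableParameters(actor);
for episodeCt = 1:numEpisodes
% ... roll out one episode with getAction(actor,{observation}) and
% build observationBatch, actionBatch, and discountedReturn as above ...
lossData.batchSize = batchSize;
lossData.actInfo = actInfo;
lossData.actionBatch = actionBatch;
lossData.discountedReturn = discountedReturn;
% Because no rlPGAgent is created, this call reaches actorLossFunction.
actorGradient = gradient(actor,'loss-parameters',{observationBatch},lossData);
% ADAM update of the policy parameters (adamupdate is from Deep Learning Toolbox).
[actorParams,averageGrad,averageSqGrad] = adamupdate(actorParams,actorGradient, ...
averageGrad,averageSqGrad,episodeCt,learnRate,gradDecay,sqGradDecay);
actor = setLearnableParameters(actor,actorParams);
end

Since nothing replaces the loss set by setLoss, the breakpoint in actorLossFunction should now be hit during the gradient call.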

