Importing pre-trained recurrent network to reinforcement learning agent
Are pre-trained recurrent networks re-initialized when used in agents for reinforcement learning? If so, how can it be avoided?
I am importing an LSTM network, trained with supervised learning, as the actor for a PPO agent. When I simulate without training, the reward is fine; however, if the agent is trained, the reward falls as if no pre-trained network had been used. I would expect the reward to be similar or higher after training, so presumably the network is being re-initialized. Is there a way around this?
% Load actor
actorNetwork = net.Layers;
actorOpts = rlRepresentationOptions('LearnRate',learnRate);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'input'},actorOpts);
% Create critic
criticNetwork = [sequenceInputLayer(numObs,"Name","input")
    % ... remaining critic layers were omitted in the original post
    ];
criticOpts = rlRepresentationOptions('LearnRate',learnRate);
critic = rlValueRepresentation(criticNetwork,obsInfo,'Observation',{'input'},criticOpts);
% Create agent
agentOpts = rlPPOAgentOptions('ExperienceHorizon',expHorizon, 'MiniBatchSize',miniBatchSz, 'NumEpoch',nEpoch, 'ClipFactor', 0.1);
agent = rlPPOAgent(actor,critic,agentOpts);
% Train agent
trainOpts = rlTrainingOptions('MaxEpisodes',episodes, 'MaxStepsPerEpisode',episodeSteps, ...
    'Verbose',false, 'Plots','training-progress', ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',10);
% Run training
trainingStats = train(agent,env,trainOpts);
% Simulate
simOptions = rlSimulationOptions('MaxSteps',2000);
experience = sim(env,agent,simOptions);
ANSWER
Transfer learning does not work the same way in RL as it does in DL. In DL, there are no environment physics that need to be understood. Recall that neural networks are really just non-linear curve-fitting tools. In DL, transfer learning starts from a pre-trained feature-extraction network, which has already learned which shapes are useful (lines, circles, and so on). You then add some of your own images to the mix and retrain to obtain a curve fit for your own data.
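As a rough illustration of the DL workflow described above, the following MATLAB sketch freezes the feature-extraction part of a network and swaps in a new classification head. The small layer array is only a hypothetical stand-in for a real pre-trained network, and the layer names, input size, and numNewClasses are assumptions made for this example. It requires Deep Learning Toolbox.

```matlab
% Hypothetical stand-in for a pre-trained image classifier.
% In practice these layers would come from a network that was already trained.
layers = [
    imageInputLayer([28 28 1],'Name','input')
    convolution2dLayer(3,8,'Padding','same','Name','conv1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(10,'Name','fc_old')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','output')];

% Freeze the feature extractor: a learn-rate factor of 0 means
% training leaves these weights and biases unchanged.
layers(2).WeightLearnRateFactor = 0;
layers(2).BiasLearnRateFactor = 0;

% Replace the task-specific head so it matches the new data set.
numNewClasses = 3;  % assumed number of classes in your own images
layers(4) = fullyConnectedLayer(numNewClasses,'Name','fc_new');
layers(6) = classificationLayer('Name','output_new');
```

The modified layers would then be passed to trainNetwork together with your own labeled images, so that only the new head is fitted while the reused features stay as they were.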
In MATLAB’s current RL framework, we are not extracting information from images using a CNN, we are supplying …