SmartEngine 1.6.0
The D4PGTrainer is a reinforcement learning trainer composed of two parts: an actor sub-graph and a critic sub-graph. Unlike A2C and PPO, the critic graph is created and managed internally by SmartEngine. You only need to supply the actor graph.
#include <D4PGTrainer.h>
Public Member Functions

virtual float GetActorLoss() = 0
    Returns the loss in the actor graph.

virtual float GetCriticLoss() = 0
    Returns the loss in the critic graph.
Public Member Functions inherited from SmartEngine::IRLTrainer

virtual int GetGenerationCount() const = 0
    Returns how many generations we have trained.

virtual float GetLoss() = 0
    This value will mean different things to different trainers. See each trainer's description for the value returned.

virtual void Reset() = 0
    Resets the trainer to a fresh state, initializing any internal weights to random values.

virtual void Step() = 0
    Steps training. May not actually result in any training if there is not enough data available yet.
Public Member Functions inherited from SmartEngine::IObject

virtual ObjectId GetId() const = 0
    Returns the ID of this object.

virtual void AddRef() const = 0
    Increments the internal reference count on this object. It is not common to use this method directly.

virtual void Release() const = 0
    Decrements the internal reference count on this object. It is not common to use this method directly.

virtual int GetRefCount() const = 0
    Returns the number of references to this object.

virtual void* QueryInterface(ObjectClassId id) = 0
    Queries the object for an interface and returns a pointer to that interface if found.

void operator=(IObject const &x) = delete
Public Member Functions inherited from SmartEngine::IAgentFactory

virtual ObjectPtr<IAgent> CreateAgent() = 0
    Creates an agent for a particular trainer.
Public Member Functions inherited from SmartEngine::IResource

virtual const char* GetResourceName() const = 0
    Returns the name of this resource passed to the constructor.

virtual SerializationResult GetLastLoadResult() const = 0
    Returns the result of the last call to Load(). Useful for checking loaded data state after creation.

virtual SerializationResult Load(const char* appendName = nullptr) = 0
    Load this object from disk.

virtual SerializationResult Save(const char* appendName = nullptr) = 0
    Save this object to disk.
Public Member Functions inherited from SmartEngine::ISerializable

virtual SerializationResult Serialize(IMemoryBuffer *buffer) = 0
    Write the contents of this object to a buffer.

virtual SerializationResult Deserialize(IMemoryBuffer *buffer) = 0
    Fill this object with contents from a buffer.
The main difference between D4PG and A2C / PPO is that the critic does not output a single expected-reward value, but rather a probability distribution over rewards. D4PG is also off-policy: the data you collect is saved and can be trained on at any time, instead of always relying on the most recent data.
D4PG works only with continuous actions, not discrete ones.
It is recommended that you override the network output when you first start training and use / record random values instead. The D4PG actor does not output a probability distribution, so manually adding a bit of randomness keeps entropy high in the beginning and thus produces a good set of training data (which in turn speeds up training). The amount of randomness should be reduced to 0 as training progresses. A good place to start is to taper it off linearly over the course of 80,000 - 100,000 generations, as sketched below.
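A minimal sketch of such a linear taper follows. None of these helpers are part of the SmartEngine API; how you override and record the network output depends on your integration, and ExplorationScale / AddExplorationNoise are hypothetical names used purely for illustration.

    #include <algorithm>
    #include <random>

    // Hypothetical helper: scale of the exploration noise for a given
    // generation, tapering linearly from 1.0 to 0.0 over the first
    // 100,000 generations (the upper end of the range suggested above).
    float ExplorationScale(int generation)
    {
        const float taperGenerations = 100000.0f;
        return std::max(0.0f, 1.0f - generation / taperGenerations);
    }

    // Hypothetical override: perturb the actor's continuous actions with
    // Gaussian noise before recording them as training data.
    void AddExplorationNoise(float* actions, int actionCount,
                             int generation, std::mt19937& rng)
    {
        std::normal_distribution<float> noise(0.0f, 1.0f);
        const float scale = ExplorationScale(generation);
        for (int i = 0; i < actionCount; ++i)
        {
            actions[i] += scale * noise(rng);
        }
    }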
GetLoss() returns the actor loss.
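As a rough illustration, the loop below exercises only the members documented on this page (CreateAgent(), GetGenerationCount(), Step(), GetActorLoss(), GetCriticLoss()). How the trainer instance itself is constructed, and how experience is recorded into it, are not covered here and are assumed to happen elsewhere.

    #include <D4PGTrainer.h>

    // Sketch of a training loop against the interface on this page.
    // "trainer" is assumed to be a valid SmartEngine::D4PGTrainer
    // obtained elsewhere.
    void TrainLoop(SmartEngine::D4PGTrainer* trainer, int targetGenerations)
    {
        // Agents created by the trainer gather experience for it.
        SmartEngine::ObjectPtr<SmartEngine::IAgent> agent = trainer->CreateAgent();

        while (trainer->GetGenerationCount() < targetGenerations)
        {
            // ... run the agent in the environment and record experience ...

            // Step() may not result in any training until enough data
            // has been collected (D4PG is off-policy, so saved data can
            // be trained on at any time).
            trainer->Step();

            // For D4PG, GetLoss() is the actor loss; the critic loss
            // is reported separately.
            float actorLoss = trainer->GetActorLoss();
            float criticLoss = trainer->GetCriticLoss();
            (void)actorLoss;
            (void)criticLoss;
        }
    }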
Member Function Documentation

GetActorLoss()

virtual float SmartEngine::D4PGTrainer::GetActorLoss() = 0  [pure virtual]

    Returns the loss in the actor graph.
GetCriticLoss()

virtual float SmartEngine::D4PGTrainer::GetCriticLoss() = 0  [pure virtual]

    Returns the loss in the critic graph.
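Because D4PGTrainer inherits from SmartEngine::IResource, a trained state can also be written to and restored from disk. The sketch below uses only Save(), Load(), and GetLastLoadResult() from the listing above; the "checkpoint" append name is an arbitrary example value.

    #include <D4PGTrainer.h>

    // Sketch of checkpointing through the inherited IResource interface.
    void Checkpoint(SmartEngine::D4PGTrainer* trainer)
    {
        // Save a named snapshot of the trainer's state to disk.
        trainer->Save("checkpoint");
    }

    void Restore(SmartEngine::D4PGTrainer* trainer)
    {
        // Load the snapshot back; the result reports the outcome.
        SmartEngine::SerializationResult result = trainer->Load("checkpoint");

        // GetLastLoadResult() returns the same outcome after the fact.
        result = trainer->GetLastLoadResult();
        (void)result;
    }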