What is S-RL
State representation learning (SRL, Lesort et al., 2018) aims at learning compact representations from raw observations (e.g., learn a position (x, y) directly from raw pixels) without explicit supervision.
Most of these algorithms are designed to learn abstract features that characterize data. The goal is to use that representation to solve a task with RL.
The idea is that a low-dimensional representation should keep only the useful information and reduce the search space, thus helping to address two main challenges of RL:
- Sample inefficiency
- Instability
Moreover, a state representation learned for a particular task may be transferred to related tasks and therefore speed up learning in multiple task settings.
Using RL notation, SRL corresponds to learning a transformation (in practice, a neural network) $\phi$ from the observation space $O$ to the state space $S$. Then, a policy $\pi$, which takes a state $s_t \in S$ as input and outputs an action $a_t$, is learned to solve the task:
$$ o_t \xrightarrow[SRL]{\phi} s_t \xrightarrow[RL]{\pi} a_t $$
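As a minimal illustration of this pipeline, the sketch below pairs a small convolutional encoder (standing in for $\phi$) with a small MLP policy head (standing in for $\pi$). The architectures are illustrative only, not the toolbox's actual networks.

```python
# Minimal sketch of the o_t -> s_t -> a_t pipeline above (illustrative
# architectures, not the toolbox's actual networks).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """phi: raw observation (pixels) -> low-dimensional state."""
    def __init__(self, state_dim=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(state_dim)  # infers the flattened size

    def forward(self, obs):
        return self.head(self.features(obs))

class Policy(nn.Module):
    """pi: learned state -> logits over discrete actions."""
    def __init__(self, state_dim=2, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)

phi, pi = Encoder(), Policy()
o_t = torch.rand(1, 3, 224, 224)   # raw 224x224 RGB observation
s_t = phi(o_t)                     # s_t = phi(o_t), learned by SRL
a_t = pi(s_t).argmax(dim=-1)       # a_t = pi(s_t), learned by RL
```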
Environment Details
The simulated environments run at 250 FPS on an 8-core machine, which makes it possible to train an RL agent for 1 million steps in only 1 h (or to generate 20k samples in less than 2 min).
A ground truth state is defined in each scenario:
- The absolute robot position in static scenarios
- The relative position (w.r.t. the target) in moving goal scenarios
Images are 224x224 pixels. Navigation datasets use 4 discrete actions (right, left, forward, backward); robot arms use one additional action (down).
All CNN policies normalize the input image by dividing it by 255.
Observations are not stacked.
When learning from SRL states, the states are normalized using a running mean/std.
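For concreteness, here is a small sketch of the two normalization schemes (an illustration, not the toolbox's exact code):

```python
# Sketch of the two normalization schemes described above (illustrative only).
import numpy as np

def normalize_pixels(obs):
    """CNN policies: scale raw uint8 pixels to [0, 1] by dividing by 255."""
    return obs.astype(np.float32) / 255.0

class RunningMeanStd:
    """SRL states: normalize with a running estimate of the mean and std."""
    def __init__(self, shape, epsilon=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon

    def update(self, batch):
        # Parallel (Chan et al.) update of the running mean and variance.
        batch_mean, batch_var = batch.mean(axis=0), batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta ** 2 * self.count * batch_count / total) / total
        self.count = total

    def normalize(self, states):
        return (states - self.mean) / np.sqrt(self.var + 1e-8)
```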
The reported reinforcement learning metrics are the returned rewards averaged over 5 policies, each trained independently with the same RL algorithm but a different seed.
Evaluation of Learned State Representations
- Qualitative evaluation: the perceived utility of the state representation, assessed with visualization tools
- Metrics:
  - KNN-MSE: a low KNN-MSE means that a neighbor in the ground truth is still a neighbor in the learned representation, and thus local coherence is preserved (see the sketch after this list)
  - Correlation:
    - GTC (Ground Truth Correlation): the correlation between the learned states and the ground-truth states
    - GTCmean: the mean of the GTC values
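As an illustration, KNN-MSE can be computed roughly as in the sketch below: for each sample, look up its k nearest neighbors in the learned state space and measure the mean squared error between the corresponding ground-truth states. The exact aggregation may differ from the toolbox's implementation.

```python
# Rough KNN-MSE sketch (the toolbox's exact aggregation may differ).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_mse(learned_states, gt_states, k=5):
    """Low values mean that neighbors in the learned space are also close
    in the ground-truth space, i.e. local coherence is preserved."""
    knn = NearestNeighbors(n_neighbors=k + 1).fit(learned_states)
    _, idx = knn.kneighbors(learned_states)        # idx[:, 0] is the sample itself
    neighbor_gt = gt_states[idx[:, 1:]]            # (n, k, gt_dim)
    errors = ((neighbor_gt - gt_states[:, None, :]) ** 2).mean(axis=(1, 2))
    return errors.mean()
```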
RL and ES Algorithms
- A2C: A synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C).
- ACER: Sample Efficient Actor-Critic with Experience Replay
- ACKTR: Actor Critic using Kronecker-Factored Trust Region
- ARS: Augmented Random Search
- CMA-ES: Covariance Matrix Adaptation Evolution Strategy
- DDPG: Deep Deterministic Policy Gradients
- DeepQ: DQN and variants (Double, Dueling, prioritized experience replay)
- PPO1: Proximal Policy Optimization (MPI Implementation)
- PPO2: Proximal Policy Optimization (GPU Implementation)
- SAC: Soft Actor Critic
- TRPO: Trust Region Policy Optimization (MPI Implementation)
Training
Before starting an RL experiment, make sure that a visdom server is running, unless you deactivate visualization with --no-vis.
Launch visdom server:
python -m visdom.server
Train an agent:
python -m rl_baselines.train --algo rl_algo --env env1 --log-dir logs/ --srl-model raw_pixels --num-timesteps 10000
Continuous Actions
Continuous actions have been implemented for DDPG, PPO2, ARS, CMA-ES, SAC and the random agent. To use continuous actions in the position space:
python -m rl_baselines.train --algo ppo2 --log-dir logs/ -c
To use continuous actions in the joint space:
python -m rl_baselines.train --algo ppo2 --log-dir logs/ -c -joints
Multiple Trainings
Train an agent multiple times on multiple environments, using different methods:
python -m rl_baselines.pipeline --algo ppo2 --log-dir logs/ --env env1 env2 [...] --srl-model model1 model2 [...]
python -m rl_baselines.pipeline --algo ppo2 --log-dir logs/ --srl-model vae ground_truth --random-target --num-cpu 4 --num-iteration 15
Enjoy the Trained Agent
To load a trained agent and see the result:
python -m replay.enjoy_baselines --log-dir path/to/trained/agent/ --render
Add Your Own RL Algorithm
- Create a class that inherits rl_baselines.base_classes.BaseRLObject and implements your algorithm
- Add your class to the registered_rl dictionary in rl_baselines/registry.py, using this format: NAME: (CLASS, ALGO_TYPE, [ACTION_TYPE]) (see the sketch below)
- Now you can call your algorithm using --algo NAME with train.py or pipeline.py
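A hypothetical registration could look like the sketch below; MyAlgoModel is a placeholder class, and the AlgoType/ActionType values are assumptions to check against rl_baselines/registry.py.

```python
# my_algo.py -- skeleton of the class to register (implement the abstract
# methods required by BaseRLObject; they are defined by the base class).
from rl_baselines.base_classes import BaseRLObject

class MyAlgoModel(BaseRLObject):
    """Custom RL algorithm wrapped in the toolbox's BaseRLObject interface."""
    ...

# In rl_baselines/registry.py, following the NAME: (CLASS, ALGO_TYPE, [ACTION_TYPE])
# format (the AlgoType/ActionType values below are assumptions; check the
# registry file for the exact enum names):
# registered_rl["MY_ALGO"] = (MyAlgoModel, AlgoType.REINFORCEMENT_LEARNING,
#                             [ActionType.DISCRETE])
```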
Hyperparameter Search
The S-RL Toolbox allows hyperparameter search for the implemented RL algorithms, using Hyperband or Hyperopt:
python -m rl_baselines.hyperparam_search --optimizer hyperband --algo ppo2 --env MobileRobotGymEnv-v0 --srl-model ground_truth
Available Environments
- Kuka arm
- Mobile robot
- Racing car
- Baxter
- Robobo
- Omnidirectional Robot
To test the environment:
python -m environments.dataset_generator --no-record-data --display
To record data (i.e. generate a dataset) from the environment for training an SRL model, using random actions:
python -m environments.dataset_generator --num-cpu 4 --name folder_name
Add a Custom Environment
- Create a class that inherits environments.srl_env.SRLGymEnv and implements your environment
- Add this code to the same file as the class declaration
- Add your class to the registered_env dictionary in environments/registry.py, using this format: NAME: (CLASS, SUPER_CLASS, PLOT_TYPE, THREAD_TYPE) (see the sketch below)
- Add the name of the environment to config/srl_models.yaml, with the location of the saved model for each SRL model (can point to a dummy location, but must be defined)
- Now you can call your environment using --env NAME with train.py, pipeline.py or dataset_generator.py
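A hypothetical registration could look like the sketch below; the class name and the PLOT_TYPE/THREAD_TYPE values are placeholders to check against environments/registry.py.

```python
# my_robot_env.py -- skeleton of the environment class (hypothetical name).
from environments.srl_env import SRLGymEnv

class MyRobotGymEnv(SRLGymEnv):
    """Custom environment implementing the SRLGymEnv interface."""
    ...

# In environments/registry.py, following the
# NAME: (CLASS, SUPER_CLASS, PLOT_TYPE, THREAD_TYPE) format (the plot/thread
# values below are assumptions; check the registry file for the real enums):
# registered_env["MyRobotGymEnv-v0"] = (MyRobotGymEnv, SRLGymEnv,
#                                       PlotType.PLOT_2D, ThreadingType.PROCESS)

# config/srl_models.yaml then needs a "MyRobotGymEnv-v0" entry mapping each
# SRL model name to a (possibly dummy) saved-model path.
```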
S-RL Models
- Look at the SRL Repo to learn how to train a state representation model
- Then you must edit config/srl_models.yaml and set the right path to use the learned state representations
To train a reinforcement learning agent on a specific SRL model:
python -m rl_baselines.train --algo ppo2 --log-dir logs/ --srl-model model_name
Available SRL models
The available state representation models are:
- ground_truth: Hand engineered features (e.g., robot position + target position for mobile robot env)
- raw_pixels: Learning a policy in an end-to-end manner, directly from pixels to actions.
- supervised: A model trained with Ground Truth states as targets in a supervised setting.
- autoencoder: an autoencoder from the raw pixels
- vae: a variational autoencoder from the raw pixels
- inverse: an inverse dynamics model
- forward: a forward dynamics model
- srl_combination: a model combining several losses (e.g. vae + forward + inverse…) for SRL
- pca: PCA applied to the raw pixels
- robotic_priors: robotic priors model (see “Learning State Representations with Robotic Priors”)
- multi_view_srl: an SRL model using views from multiple cameras as input, with any of the above losses (e.g. triplet loss and others)
- joints: the arm’s joint angles (Kuka environments only)
- joints_position: the arm’s x, y, z position and joint angles (Kuka environments only)
Add a Custom SRL Model
If your SRL model is a characteristic of the environment (position, angles, …):
- Add the name of the model to the registered_srl dictionary in state_representation/registry.py
- Modify getSRLState(self, observation) in the environments to return the data you want for this model (see the sketch below)
- Now you can call your SRL model using --srl-model NAME with train.py or pipeline.py
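For example, a getSRLState override for a hypothetical "robot_and_target_pos" model could look like this (the helper methods are placeholders, not the toolbox's actual API):

```python
# Inside your environment class -- hypothetical feature extraction for a
# custom environment-characteristic SRL model named "robot_and_target_pos".
import numpy as np

def getSRLState(self, observation):
    """Return the features this SRL model feeds to the policy instead of pixels."""
    # getRobotPos() / getTargetPos() are placeholder helpers for illustration.
    return np.concatenate([self.getRobotPos(), self.getTargetPos()])
```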
Otherwise, for SRL models that are external to the environment (supervised, autoencoder, …):
- Add your SRL model, which inherits SRLBaseClass, to the function state_representation.models.loadSRLModel
- Add the name of the model to the registered_srl dictionary in state_representation/registry.py
- Add the name of the model to config/srl_models.yaml, with the location of the saved model for each environment (can point to a dummy location, but must be defined)
- Now you can call your SRL model using --srl-model NAME with train.py or pipeline.py
SRL Zoo
A collection of State Representation Learning (SRL) methods for Reinforcement Learning, written using PyTorch.
Available Methods
- SRL with Robotic Priors + extensions (stereo-vision, additional priors)
- Denoising Autoencoder (DAE)
- Variational Autoencoder (VAE) and beta-VAE
- PCA
- Supervised Learning
- Forward, Inverse Models
- Triplet Network (for stereo-vision only)
- Reward loss
- Combination and stacking of methods
- Random Features
Learning a State Representation
To learn a state representation, you need to enforce constraints on the representation using one or more losses.
All losses are defined in losses/losses.py. The available losses are listed below (the inverse and forward losses are sketched after the list):
- autoencoder: reconstruction loss, using current and next observation
- denoising autoencoder (dae): same as for the autoencoder, except that the model reconstructs inputs from noisy observations containing a random zero-pixel mask
- vae: (beta-)VAE loss (reconstruction + Kullback-Leibler divergence loss)
- inverse: predict the action given current and next state
- forward: predict the next state given current state and taken action
- reward: predict the reward (positive or not) given current and next state
- priors: robotic priors losses (see “Learning State Representations with Robotic Priors”)
- triplet: triplet loss for multi-cam setting (see Multiple Cameras section)
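For example, the inverse and forward losses can be sketched in PyTorch as follows (an illustration of the idea, not the SRL Zoo's exact implementation):

```python
# Illustrative inverse/forward dynamics losses (not the SRL Zoo's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, n_actions = 3, 4
inverse_net = nn.Linear(2 * state_dim, n_actions)          # (s_t, s_{t+1}) -> action logits
forward_net = nn.Linear(state_dim + n_actions, state_dim)  # (s_t, a_t) -> s_{t+1}

def inverse_loss(state, next_state, action):
    """Predict the action taken between two consecutive learned states."""
    logits = inverse_net(torch.cat([state, next_state], dim=-1))
    return F.cross_entropy(logits, action)

def forward_loss(state, action, next_state):
    """Predict the next learned state from the current state and the action."""
    action_one_hot = F.one_hot(action, n_actions).float()
    pred_next = forward_net(torch.cat([state, action_one_hot], dim=-1))
    return F.mse_loss(pred_next, next_state)

# The total SRL loss can combine several terms, e.g.:
# loss = inverse_loss(s_t, s_t1, a_t) + forward_loss(s_t, a_t, s_t1)
```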
References
- Raffin, A. et al. (2018). S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning.
- Lesort, T. et al. (2018). State Representation Learning for Control: An Overview.