Learning To Swing Using Reinforcement Learning

In collaboration with Theo Panagiotopoulos

This is the final project for Dr. C. Karen Liu's Computer Animation class.

An off-the-shelf off-policy reinforcement learning method, Soft Actor-Critic (SAC), is used to train a simulated character to build up momentum while hanging from a pull-up bar.
The control policy is conditioned on a state consisting of the positions and velocities of all character joints. The policy outputs a desired delta position for each of the following DOFs: thigh, shin, toe, bicep, forearm, hand1&2, head.
The resulting target positions are fed to a PID controller, which runs at a higher frequency than the policy (the policy skips frames).
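
As a rough illustration of the frame-skipping idea, the sketch below holds one policy action fixed while a PID loop runs for several simulation steps. It is not the project's code: the `PID` class, the `control_interval` helper, the gains, `frame_skip=4`, `dt=0.01`, and the toy unit-inertia joint are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Scalar PID controller for one degree of freedom (illustrative gains)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target: float, current: float, dt: float) -> float:
        """Return a control torque that drives `current` toward `target`."""
        error = target - current
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def control_interval(pid, position, velocity, delta, frame_skip=4, dt=0.01):
    """Apply one policy action for `frame_skip` simulation steps.

    The policy output `delta` is added to the current position once to form
    the target; the PID loop then runs at the (faster) simulation rate.
    """
    target = position + delta
    for _ in range(frame_skip):
        torque = pid.step(target, position, dt)
        # Toy unit-inertia joint (an assumption), semi-implicit Euler update.
        velocity += torque * dt
        position += velocity * dt
    return position, velocity, target


pid = PID(kp=50.0, ki=0.0, kd=5.0)
position, velocity, target = control_interval(pid, 0.0, 0.0, delta=0.1)
```

With these values the joint moves part of the way toward the target within one control interval; the policy would then be queried again for the next delta.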
Since this is an idealized simulator setting, the left and right body parts are "mirrored": the left and right thighs, for example, are initialized in the same configuration and both receive the same control values.
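
One way to picture the mirroring is a small helper that expands the policy's action into per-joint targets for both sides. This is only a sketch: the joint names, the `left_`/`right_` prefixes, the reading of "hand1&2" as two separate DOFs, and the choice to treat the head as unpaired are assumptions, not details taken from the project.

```python
# Hypothetical DOF names, following the list given in the text.
PAIRED_DOFS = ["thigh", "shin", "toe", "bicep", "forearm", "hand1", "hand2"]
UNPAIRED_DOFS = ["head"]  # assumed to have no left/right counterpart


def mirror_action(action):
    """Expand one value per controlled DOF into targets for both body sides."""
    expected = len(PAIRED_DOFS) + len(UNPAIRED_DOFS)
    if len(action) != expected:
        raise ValueError(f"expected {expected} values, got {len(action)}")

    full = {}
    # Each paired DOF sends the same value to the left and right joint.
    for name, value in zip(PAIRED_DOFS, action):
        full[f"left_{name}"] = value
        full[f"right_{name}"] = value
    # Unpaired DOFs are passed through unchanged.
    for name, value in zip(UNPAIRED_DOFS, action[len(PAIRED_DOFS):]):
        full[name] = value
    return full


targets = mirror_action([0.1, 0.2, 0.0, -0.3, 0.4, 0.0, 0.0, 0.05])
```

Sharing values this way roughly halves the number of outputs the policy has to learn, at the cost of ruling out asymmetric motions.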
The reward is continuous and is given by the absolute value of the pelvis velocity around the z axis. The neural network consists of a single fully connected layer of 256 neurons.
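
The reward described above could be written as a one-line function. The sketch assumes the simulator exposes the pelvis angular velocity as an `(x, y, z)` tuple; the function name `swing_reward` and that input format are hypothetical.

```python
def swing_reward(pelvis_angular_velocity):
    """Return |omega_z| for a pelvis angular velocity given as (x, y, z)."""
    _, _, omega_z = pelvis_angular_velocity
    return abs(omega_z)


# Swinging forward or backward at the same rate earns the same reward.
forward = swing_reward((0.0, 0.0, 2.5))
backward = swing_reward((0.0, 0.0, -2.5))
```

Taking the absolute value means the character is rewarded for rotating fast in either direction, which fits a back-and-forth swing.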

Initial policy (0 mins of training):

Policy after 15 mins of training:

Policy after 20 mins of training: