Our initial covariance matrix is chosen by inspection. We want to allow for vast initial expansion of the search space, so we select large diagonal elements for this matrix. Now we have an initial guess, an initial covariance matrix, and a cost function. We are almost ready to begin searching for the minimum.
We note that our function evaluations are computationally expensive, due to the high precision time step required for realistic dynamic simulation and the actual time of each walk (20 second maximum). Due to time constraints we set a maximum number of function evaluations per simulation.
For our optimization process, we typically simulate about 500 times with a population of size 14. We repeat the simulation (replacing initial guess with current optimum) until no noticeable improvement is made.
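The restart procedure above can be sketched roughly as follows. This is a minimal illustration rather than our actual code: `run_once` is a hypothetical stand-in for one budgeted CMA-ES run (about 500 simulations, or roughly 35 generations at a population of 14), and the toy quadratic cost and the `min_improvement` threshold are assumptions made so the example runs on its own.

```python
import random


def optimize_with_restarts(run_once, x0, min_improvement=1e-3, max_restarts=20):
    """Rerun a budgeted optimizer from its own best result until gains stall.

    run_once(x) -> (best_x, best_cost) is a hypothetical stand-in for one
    CMA-ES run of roughly 500 simulations.
    """
    best_x, best_cost = run_once(x0)
    runs = 1
    while runs < max_restarts:
        cand_x, cand_cost = run_once(best_x)
        runs += 1
        improvement = best_cost - cand_cost
        if cand_cost < best_cost:
            best_x, best_cost = cand_x, cand_cost
        if improvement < min_improvement:
            break  # no noticeable improvement: stop restarting
    return best_x, best_cost, runs


def make_toy_run(evals=500, step=0.5, seed=0):
    """Build a toy run_once: random local search on a quadratic cost."""
    rng = random.Random(seed)

    def cost(x):
        return sum(v * v for v in x)

    def run_once(x):
        best_x, best_cost = list(x), cost(x)
        for _ in range(evals):
            cand = [v + rng.gauss(0.0, step) for v in best_x]
            c = cost(cand)
            if c < best_cost:
                best_x, best_cost = cand, c
        return best_x, best_cost

    return run_once


final_x, final_cost, n_runs = optimize_with_restarts(make_toy_run(), [3.0, -2.0])
print(n_runs, final_cost)
```

Seeding each run with the previous optimum means a restart can only keep or lower the best cost found so far, at the price of another full evaluation budget.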
We notice some expected characteristics in our approach. A good deal of crashing occurs while we span the space, looking for promising regions. We progressively observe more successful simulations as we close in on promising areas of interest in subsequent iterations. Finally, we observe a good deal of crashing when we converge on what we determine as our nominal gait. The nominal gait is very unstable, which will be a topic of interest when we discuss robustness in part 2.
Results and Analysis:
Running 3000 simulations, we reach our optimal policy for reducing cost. The parameters converged to produce a gait with interesting characteristics. We will correlate this gait with a physical intuition for how the cost function should affect the gait.
First, it is helpful to look at what we start with. Here is our (provided) initial policy guess: Cost = 1070.
Now we think about how the cost function should affect new policies and the following gait tendencies:
We now consider the cost function as having three main components, the weights of which will drive our optimization. Minimizing the cost function drives our solution toward:
Crash resistance and stability are considered separately, because the crashing penalty is blind to desired gait tendencies, while the penalties on ground clearance, knee retraction, and torso pitch angle push the solution towards a stable gait by using outside information about ideal characteristics.
Here is our nominal gait: Cost = 248.
The gait shown has clear tendencies towards energy efficiency, shown by the shortened step length and decreased range of torso pitch deflection. Low speed and short step result in large drops in contact force. This allows for lower external forces on each foot, along with lower required torque to maintain the gait.
We observe a compromise in ground clearance. The foot passes concerningly close to the ground. Additionally, the shortened step leaves the model susceptible to crashing in the presence of perturbations. This will become relevant in Part 2, where we work towards finding policies robust to perturbations.
In Part 1 we determine a nominal gait, which consists of the policy that minimizes the cost function provided to us. We notice that the cost function drives the gait towards energy efficiency rather than stability.
By using CMA-ES we likely find a lower minimum than we would with other methods such as SQP, active set, and interior point. CMA-ES is generally able to avoid getting caught in local minima, which is important when operating on a rough objective (cost) function manifold such as this one.
CMA-ES (Covariance Matrix Adaptation - Evolution Strategy) is a popular method in the optimization community. It involves producing a population of evaluation points. Within one iteration, the fittest points from a population are selected, and a new set is created based on a Gaussian distribution fitted to the fittest points. As a dimension converges towards a locally optimal value, its standard deviation converges towards zero. When the diagonal elements of the covariance matrix have all converged to zero, a solution has been determined. This works well on non-smooth manifolds, where other methods may get caught on weaker local minima.
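The sample, select, and refit loop described above can be illustrated with a deliberately simplified sketch. It is not full CMA-ES: it keeps only a diagonal covariance and omits the evolution paths, rank-based weights, and step-size control of the real algorithm, so it is closer to a cross-entropy style update. The function name `diagonal_es`, the parent count, and the sphere test cost are assumptions chosen for illustration.

```python
import math
import random


def diagonal_es(cost, mean, sigma, pop_size=14, n_parents=7, max_evals=500, seed=0):
    """Simplified evolution strategy with a diagonal covariance.

    Not full CMA-ES: each generation simply refits the per-dimension mean
    and standard deviation to the fittest samples.
    """
    rng = random.Random(seed)
    mean, sigma = list(mean), list(sigma)
    dim = len(mean)
    best_x, best_cost = list(mean), cost(mean)
    evals = 1
    while evals + pop_size <= max_evals:
        # Sample a population from the current Gaussian.
        population = []
        for _ in range(pop_size):
            x = [rng.gauss(mean[i], sigma[i]) for i in range(dim)]
            population.append((cost(x), x))
        evals += pop_size
        # Rank by cost and remember the best point seen so far.
        population.sort(key=lambda pair: pair[0])
        if population[0][0] < best_cost:
            best_cost, best_x = population[0]
        parents = [x for _, x in population[:n_parents]]
        # Refit the Gaussian to the fittest points.
        mean = [sum(p[i] for p in parents) / n_parents for i in range(dim)]
        sigma = [
            math.sqrt(sum((p[i] - mean[i]) ** 2 for p in parents) / n_parents)
            for i in range(dim)
        ]
    return best_x, best_cost, sigma


def sphere(x):
    """Smooth toy cost with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_cost, final_sigma = diagonal_es(sphere, [3.0, -2.0], [2.0, 2.0])
print(best_cost, final_sigma)
```

On a smooth toy cost like this one, the per-dimension standard deviations typically shrink as the population concentrates, which mirrors the convergence behavior described above.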
Robust Gait Determination: (Part 2)
In part two our problem is very different. We are looking to find a policy which minimizes crashing under disturbances. In other words, we look to maximize stability, or robustness. As mentioned above, the nominal gait is clearly not robust. Low clearance leaves the gait susceptible to crashing due to a foot drag. Small step size restricts the volume of stable center of mass positions.
A New Cost Function
The cost function provided favors energy efficiency and stability. We observe in our nominal gait that energy-saving tendencies (reducing torque and external foot forces) leave our model susceptible to disturbances. Alternatively, a high energy consumption gait is more likely to wash out disturbances. We consider a model which moves quickly with its head forward. High torques and accelerations result in large contact forces. Now our model is constantly experiencing high contact forces and applying high torques to compensate. We would expect disturbances to be washed out if they are small in comparison to these contact forces and torques.
Therefore we structure the cost function to welcome this transition from low to high energy consumption, while maintaining stability. Here are the changes.
Cost Function Modifications:
Strategy for Modifications
From physical intuition and reading, our desired robust gait will have the following characteristics. Modifying the cost function allows our optimization to converge towards a policy which meets these criteria.
We are provided a simulator for an unperturbed 20 second simulation. Now we want to train our parameter set on white noise perturbations on the torso. The white noise involves a high frequency random sampling of a Gaussian. We center the Gaussian of our perturbations at zero and alter the standard deviation. Every ten thousandth of a second, we sample a new random perturbation magnitude for each direction (x, y, and z).
Quantitatively, the torso experiences a series of sequential 0.0001 second impulses with magnitude correlated to the standard deviation of our perturbation Gaussian. Because a separate randomized force is applied in the three directions, we get randomized orientation for free.
Our goal is to build a policy robust against large perturbations. To do this we train on smaller perturbations, and use the policies generated from these smaller perturbation simulations as the initial guess for larger perturbation policy optimization.
To be specific, a single policy optimization involves an initial policy, a perturbation variance, and our new cost function. We simulate 500 times. It is important to note that each simulation tests a different policy with a different sample of white noise perturbations. The magnitude of the perturbation stays the same based on Monte Carlo method assumptions, but the order in which the perturbations are applied varies each time. After 500 simulations, CMA-ES provides us the least cost policy.
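The white noise sampling described above might look like the following sketch. It assumes the perturbation level is the standard deviation of each force component in newtons, and the function name `white_noise_forces` is hypothetical; it is not the simulator's interface.

```python
import random

DT = 1e-4  # seconds between samples ("every ten thousandth of a second")


def white_noise_forces(duration_s, sigma, seed=None):
    """Sample one (Fx, Fy, Fz) torso force per time step from N(0, sigma)."""
    rng = random.Random(seed)
    n_steps = round(duration_s / DT)
    return [
        (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        for _ in range(n_steps)
    ]


# A short 0.1 second example; a full 20 second walk needs 200,000 samples.
forces = white_noise_forces(duration_s=0.1, sigma=10.0, seed=1)
print(len(forces), forces[0])
```

Because the three components are drawn independently, the direction of each force vector is random without any extra sampling step.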
A policy may have been successful during the optimization function evaluation, but this does not mean it will not crash on a separate white noise perturbation of equal magnitude. Therefore we validate the dependability of the policies by how often they succeed given a certain perturbation magnitude.
By the method described above, we develop walking policies by training on a constant perturbation white noise variance. We increase the variance for the next optimization procedure, with the intent of developing a more robust gait.
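The staged training described above can be sketched as a simple loop in which each optimization starts from the policy found at the previous noise level. Here `optimize` is a hypothetical stand-in for one 500-simulation CMA-ES run, and the toy version below only records which levels a policy has been trained on.

```python
def train_curriculum(optimize, initial_policy, noise_levels):
    """Train at increasing noise levels, seeding each run with the last result.

    optimize(policy, level) -> policy is a hypothetical stand-in for one
    budgeted CMA-ES run at the given white noise level.
    """
    policies = {}
    policy = initial_policy
    for level in noise_levels:
        policy = optimize(policy, level)
        policies[level] = policy  # this is "P<level>" in the matrix notation
    return policies


def toy_optimize(policy, level):
    # Placeholder: just records which levels the policy has been trained on.
    return policy + (level,)


trained = train_curriculum(toy_optimize, initial_policy=(), noise_levels=[1, 5, 10, 15])
print(trained[10])  # prints (1, 5, 10)
```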
Below is a performance matrix indicating how each policy performs on a set of simulations at a specific white noise variance. A policy trained on variance N is denoted PN in the matrix (columns). Each row presents a white noise variance N (VN) which was simulated on each policy 25 times. We output the success rate of each policy at each white noise variance. Success is simply defined as not falling over.
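The way such a matrix is filled in can be sketched as follows. The `simulate` callable is a hypothetical stand-in for one perturbed 20 second walk that returns True when the walker stays upright; the toy simulator and its "robustness" numbers are placeholders and do not reproduce our results.

```python
import random


def success_rate(simulate, policy, noise_level, trials=25):
    """Fraction of trials in which the walker does not fall."""
    successes = sum(1 for _ in range(trials) if simulate(policy, noise_level))
    return successes / trials


def performance_matrix(simulate, policies, noise_levels, trials=25):
    """Rows are test noise levels (VN); columns are policies (PN)."""
    return {
        level: {
            name: success_rate(simulate, policy, level, trials)
            for name, policy in policies.items()
        }
        for level in noise_levels
    }


rng = random.Random(0)


def toy_simulate(policy, noise_level):
    # Placeholder model: each policy is a single made-up "robustness" number.
    p_success = 1.0 if noise_level == 0 else min(1.0, policy / noise_level)
    return rng.random() < p_success


policies = {"P1": 1.0, "P5": 5.0, "P10": 10.0}
matrix = performance_matrix(toy_simulate, policies, noise_levels=[0, 5, 10, 25])
for level, row in matrix.items():
    print(f"V{level}", row)
```

With 25 trials per cell, every entry is a multiple of 4%, so small differences between neighboring cells should be read with that resolution in mind.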
White Noise Performance Matrix: New Cost Function
We get some really interesting behavior from the policies we determined. As we move to the right across the matrix, we observe an improvement in policy success rate until P17. We arbitrarily set a success rate of 90% as the threshold for a perturbation noise level that a policy can "handle". By this metric, we state that P10 can successfully handle white noise perturbations of magnitude 10N. Policy 15 is also the most successful for 25N perturbations, with a success rate of 44%.
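The 90% criterion can be written down as a small helper. The success rates in the example are made-up placeholders for illustration, not values from our matrix, and the name `max_handled_level` is hypothetical.

```python
def max_handled_level(success_by_level, threshold=0.90):
    """Largest perturbation level whose success rate meets the threshold."""
    handled = [
        level for level, rate in success_by_level.items() if rate >= threshold
    ]
    return max(handled) if handled else None


# Placeholder column of a performance matrix (25 trials per entry).
example_column = {0: 1.00, 5: 0.96, 10: 0.92, 15: 0.64, 20: 0.20}
print(max_handled_level(example_column))  # prints 10
```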
A nice feature of our policy improvement is that we increase performance for higher variance perturbations while maintaining higher success rate at low variances. This makes our gait truly robust. Some gaits may be able to handle high variance white noise perturbations, but fall over when unperturbed!
Now we observe the results derived for the same optimization procedure, but with the original cost function. We expect this to yield a far lower performance.
White Noise Performance Matrix: Original Cost Function
Comparing the two performance matrices, it is clear that the new cost function allows the optimizer to produce more robust policies.
Here we extend our method for policy optimization to a new form of perturbation, used in this paper. We apply a perturbation of magnitude N 12 separate times, each lasting 0.4 seconds, ensuring that none of the time intervals overlap. We apply the force on the torso in the horizontal plane, but randomize the orientation of the force. Holding our optimization procedure constant, we again compare the new and old cost functions.
12 Impulse Performance Matrix: New Cost Function
12 Impulse Performance Matrix: Original Cost Function
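One way to generate such a perturbation schedule is sketched below. The text only fixes the count, duration, magnitude, and random horizontal orientation; drawing the start times uniformly over the 20 second walk and rejecting overlapping draws is an assumption, as is the function name `impulse_schedule`.

```python
import math
import random


def impulse_schedule(magnitude, n_impulses=12, width_s=0.4, sim_s=20.0, seed=None):
    """Return sorted (start, end, Fx, Fy) tuples with non-overlapping windows."""
    rng = random.Random(seed)
    starts = []
    while len(starts) < n_impulses:
        t = rng.uniform(0.0, sim_s - width_s)
        # Accept a start time only if its window overlaps no earlier window.
        if all(abs(t - s) >= width_s for s in starts):
            starts.append(t)
    schedule = []
    for t in sorted(starts):
        angle = rng.uniform(0.0, 2.0 * math.pi)  # random horizontal direction
        fx = magnitude * math.cos(angle)
        fy = magnitude * math.sin(angle)
        schedule.append((t, t + width_s, fx, fy))
    return schedule


schedule = impulse_schedule(magnitude=10.0, seed=0)
for start, end, fx, fy in schedule[:3]:
    print(f"{start:6.2f}-{end:6.2f} s  F = ({fx:6.2f}, {fy:6.2f}) N")
```

The 12 windows occupy 4.8 of the 20 seconds, so rejection sampling finds a valid schedule quickly.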
Some interesting trends develop between the data. The new cost function allows for meaningful progress to be made when training on policies with larger perturbations. That said, why is the P1 for the ORIGINAL COST FUNCTION so robust against white noise? Also the zero columns must be explained. All of these points will be addressed in the Analysis.
The original cost function sees high performance initially, but after training on about 1N perturbations we are not able to determine any useful policies during subsequent training. We believe these subsequent zero columns are produced by the presence of significant weights on the torque and force components of the cost. After a certain perturbation level, it is clear that you must use more energy to keep from falling. Because the optimizer overvalues the energy terms, you may end up with a solution that is very unlikely to succeed.
Conversely, these torque and force considerations are effectively eliminated in the optimization procedure for the new cost function. This factor, along with other gait modification suggestions and weight changes, allows the walker to explore policies which accumulate high power consumption.
That said, why is the P1 for the original cost so robust against white noise? It is possible that some components of gait stability were overlooked by us, such as the head speed or pitch angle. We changed the desired value for each but never increased the weight. It is possible that these should have been weighted more highly, with different offsets.
This suggests to us that a more robust policy than P15 does exist. In the Appendix, we list a series of methods we attempted to handle higher magnitude perturbations (all of which failed).
Another consideration is that we are restricted by time. If we were to increase the number of function evaluations by even a small scaling factor, better solutions would likely be found.
A few comments on technical specifics:
We were able to wrap the simulation framework into MATLAB. To do this we wrote a cost function which turns an array of parameter values into the P0 file, then we use the system command to run the simulate executable with the new parameter file. This was nice because we could use the CMA-ES code we used in Assignment One.
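We did this in MATLAB; the same wrapping pattern can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the one-value-per-line file layout is not the real P0 format, the command passed in `command` is hypothetical, and we assume the simulator prints the cost as the last token on standard output.

```python
import subprocess
from pathlib import Path


def write_parameter_file(params, path):
    """Write one parameter per line (assumed layout, not the real P0 format)."""
    Path(path).write_text("".join(f"{value:.10g}\n" for value in params))


def simulated_cost(params, command, param_path="p0_candidate.txt"):
    """Write params, run the simulator, and parse a cost from its output.

    `command` is a hypothetical simulator invocation, for example
    ["./simulate", "p0_candidate.txt"]. We assume the cost is the last
    whitespace-separated token printed on stdout.
    """
    write_parameter_file(params, param_path)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return float(result.stdout.split()[-1])
```

An optimizer can then treat `simulated_cost` as an ordinary function of the parameter array, which is what lets existing CMA-ES code be reused unchanged.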
Unsuccessful Attempts to Build More Robust Gaits:
Unattempted but Potentially Successful Ideas:
(RABBIT - A 5-Link Walker)
Dynamic Optimization: Assignment 2
Quan Nguyen, Brian Bittner
For Assignment 2, our goal is to determine nominal and robust gaits for a five link walker. We are provided its dynamics and an extensive framework for numerical simulation and animation. Our desire in Part One is to determine a gait for the walker which minimizes the cost function, also provided to us. We call this the walker's nominal gait.
Nominal Gait Determination (Part 1)
To determine a gait which minimizes the cost function, we optimize a set of parameters which make up our gait policy. These parameters include the following:
There are additional parameters made available for policy optimization, including controller gains and initial launch characteristics, but for now we focus on the above described set.
Now we have a set of gait parameters, and an initial guess which accumulates a cost of about 1070 during a 20 second simulation. We will seek to minimize this cost using CMA-ES. We hope that by using this optimization technique, we will avoid local cost minima which arise due to a rough function evaluation manifold.
To elaborate, the 21 parameters we provide induce a nonlinear evaluation function of medium-sized dimensionality. Small changes in one parameter can crash the robot, quickly spiking the cost. Because of this manifold roughness, a local minimum can be extremely far from the global one. Here is a look at why CMA-ES works particularly well. (From Assignment One)