Simulation:

Our initial covariance matrix is chosen by inspection. We want to allow for vast initial expansion of the search space, so we select large diagonal elements for this matrix. We now have an initial guess, an initial covariance matrix, and a cost function, and are almost ready to begin searching for the minimum.

We note that our function evaluations are computationally expensive, due to the high precision time step required for realistic dynamic simulation and the actual time of each walk (20 second maximum). Due to time constraints we set a maximum number of function evaluations per simulation.

For our optimization process, we typically simulate about 500 times with a population of size 14. We repeat the simulation (replacing initial guess with current optimum) until no noticeable improvement is made.
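
The restart procedure described above can be sketched as follows. This is a minimal Python illustration, not the code used for the assignment: `run_optimizer`, `min_improvement`, and `max_restarts` are hypothetical names, and the threshold for "no noticeable improvement" is an assumption.

```python
from typing import Callable, List, Tuple

Policy = List[float]


def optimize_with_restarts(
    run_optimizer: Callable[[Policy], Tuple[Policy, float]],
    initial_guess: Policy,
    initial_cost: float,
    min_improvement: float = 1.0,  # assumed threshold for "noticeable"
    max_restarts: int = 10,        # assumed safety cap on restarts
) -> Tuple[Policy, float]:
    """Rerun an optimizer from its own best result until it stops improving."""
    best_policy, best_cost = list(initial_guess), initial_cost
    for _ in range(max_restarts):
        # One run here stands for one budgeted optimization (about 500 simulations).
        candidate, cost = run_optimizer(best_policy)
        improvement = best_cost - cost
        if cost < best_cost:
            best_policy, best_cost = list(candidate), cost
        if improvement < min_improvement:
            break  # no noticeable improvement: stop restarting
    return best_policy, best_cost
```

Each call to `run_optimizer` is expected to respect its own cap on function evaluations, so the total budget is bounded by that cap times `max_restarts`.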

We notice some expected characteristics in our approach. A good deal of crashing occurs while we span the space, looking for promising regions. We progressively observe more successful simulations as we close in on promising areas in subsequent iterations. Finally, we observe a good deal of crashing when we converge on what we take to be our nominal gait. The nominal gait is very unstable, which will be a topic of interest when we discuss robustness in Part 2.

Results and Analysis:

Running 3000 simulations, we reach our optimal policy for reducing cost. The parameters converged to produce a gait with interesting characteristics. We will relate this gait to a physical intuition for how the cost function should affect it.

First, it is helpful to look at what we start with. Here is our (provided) initial policy guess. Cost = 1070.

Now we think about how the cost function should affect new policies and the following gait tendencies:

- A high crash penalty guides our optimization to iterate on sets of stable gaits.
- Penalties on torque and foot forces drive us toward an energy efficient gait.
- This will likely

- Penalties on ground clearance, knee retraction, and torso pitch angle provide guides towards a stable trajectory. A compromise between this stability and energy efficiency will have to be made.

We now consider the cost function as having three main components, the weights of which will drive our optimization. Minimizing the cost function drives our solution toward:

- WeightA * Crash Resistance
- WeightB * Energy Efficiency
- WeightC * Stability

Crash resistance and stability are considered separately, because the crashing penalty is blind to desired gait tendencies, while the penalties on ground clearance, knee retraction, and torso pitch angle push the solution towards a stable gait by using outside information about ideal characteristics.

Here is our nominal gait: Cost = 248.

The gait shown has clear tendencies towards energy efficiency, shown by the shortened step length and decreased range of torso pitch deflection. Low speed and short step result in large drops in contact force. This allows for lower external forces on each foot, along with lower required torque to maintain the gait.

We observe a compromise in ground clearance. The foot passes concerningly close to the ground. Additionally, the shortened step leaves the model susceptible to crashing in the presence of perturbations. This will become relevant in Part 2, where we work towards finding policies robust to perturbations.

Conclusion:

In Part 1 we determine a nominal gait, which consists of the policy that minimizes the cost function provided to us. We notice that the cost function drives the gait towards energy efficiency rather than stability.

By using CMA-ES we likely find a lower minimum than other methods such as SQP, active set, and interior point. CMA-ES is generally able to avoid getting caught in local minima, which is important when operating on a rough objective (cost) function manifold such as this one.

CMA-ES (Covariance Matrix Adaptation Evolution Strategy) is a popular new method in the optimization community. It works by producing a population of evaluation points. Within one iteration, the fittest points of the population are selected, and a new population is sampled from a Gaussian distribution fitted to those fittest points. As a dimension converges towards a locally optimal value, its standard deviation converges towards zero. When the diagonal elements of the covariance matrix have all converged to zero, a solution has been determined. This works well on non-smooth manifolds, where other methods may get caught in weaker local minima.
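
To make the sample, select, and refit loop concrete, here is a minimal Python sketch. It is a deliberately simplified evolution strategy with a diagonal covariance, not full CMA-ES: it omits the evolution paths, step-size control, and full covariance update of the real algorithm. The function name and the elite size of 7 are assumptions; the population size of 14 follows the text.

```python
import random
from typing import Callable, List, Tuple


def simple_evolution_strategy(
    cost: Callable[[List[float]], float],
    mean: List[float],
    sigma: List[float],
    population: int = 14,
    elite: int = 7,        # assumed number of "fittest" points kept
    iterations: int = 50,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Simplified diagonal-covariance ES (NOT full CMA-ES)."""
    rng = random.Random(seed)
    mean, sigma = list(mean), list(sigma)
    best, best_cost = list(mean), cost(mean)
    for _ in range(iterations):
        # Sample a population from the current Gaussian.
        samples = [
            [rng.gauss(m, s) for m, s in zip(mean, sigma)]
            for _ in range(population)
        ]
        scored = sorted(((cost(x), x) for x in samples), key=lambda pair: pair[0])
        if scored[0][0] < best_cost:
            best_cost, best = scored[0][0], list(scored[0][1])
        # Refit the mean and per-dimension standard deviation to the fittest points.
        fittest = [x for _, x in scored[:elite]]
        for d in range(len(mean)):
            values = [x[d] for x in fittest]
            mean[d] = sum(values) / elite
            sigma[d] = (sum((v - mean[d]) ** 2 for v in values) / elite) ** 0.5
    return best, best_cost
```

As the fittest points cluster, the refitted standard deviations shrink, which is the convergence behavior described above.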

Robust Gait Determination: (Part 2)

In Part 2 our problem is very different. We are looking to find a policy which minimizes crashing under disturbances. In other words, we look to maximize stability, or robustness. As mentioned above, the nominal gait is clearly not robust. Low clearance leaves the gait susceptible to crashing due to foot drag. Small step size restricts the volume of stable center of mass positions.

A New Cost Function

The cost function provided favors energy efficiency and stability. We observe in our nominal gait that energy-saving tendencies (reducing torque and external foot forces) leave our model susceptible to disturbances. Alternatively, a high energy consumption gait is more likely to wash out disturbances. We consider a model which moves quickly with its head forward. High torques and accelerations result in large contact forces. Now our model constantly experiences high contact forces and applies high torques to compensate. We would expect disturbances to be washed out if they are small in comparison to these contact forces and torques.

Therefore we structure the cost function to welcome this transition from low to high energy consumption, while maintaining stability. Here are the changes.

Cost Function Modifications:

- Motor Torque Consumption * 0.0001
- Knee Retraction * 100
- Head Movement (Desired head movement scaled up by 1.3)
- Inadequate Foot-Ground Clearance * 100
- Normal Force on Foot (x direction) * 0.0001
- Normal Force on Foot (z direction) * 0.0001
- Torso Pitch Angle (Desired pitch shifted 30 degrees forward.)
- Early Crash Time * 100
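
As a rough illustration of how these scale factors could be applied, the Python sketch below multiplies each raw cost term by its factor before summing. The term names and the `modified_cost` helper are hypothetical; only the numeric factors come from the list above. Head movement and torso pitch are changed through their desired values rather than their weights, so they keep a scale of 1 here.

```python
from typing import Dict

# Scale factors quoted in the list above; unlisted terms keep a scale of 1.
COST_SCALES: Dict[str, float] = {
    "motor_torque": 0.0001,
    "knee_retraction": 100.0,
    "foot_clearance": 100.0,
    "foot_force_x": 0.0001,
    "foot_force_z": 0.0001,
    "early_crash": 100.0,
}

# Changes to desired values (not weights), also taken from the list above.
DESIRED_HEAD_MOVEMENT_SCALE = 1.3
DESIRED_PITCH_SHIFT_DEG = 30.0


def modified_cost(terms: Dict[str, float]) -> float:
    """Sum raw cost terms after applying the modified scale factors."""
    return sum(COST_SCALES.get(name, 1.0) * value for name, value in terms.items())
```

With these factors, a torque term of 10000 contributes about 1 to the total, while a knee retraction term of 0.5 contributes 50, which shows how strongly the energy terms are suppressed relative to the stability terms.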

Strategy for Modifications

From physical intuition and reading, our desired robust gait will have the following characteristics. Modifying the cost function allows our optimization to converge towards a policy which meets these criteria.

- Large Step
  - This is likely the most important characteristic of our robust gait. When the center of mass (COM) of the walker falls outside the space controlled by the feet, the walker cannot maintain its gait. A wider stance or longer step allows for a larger region of controllability, so perturbations are less likely to knock the COM out of control.

- High Energy Consumption
  - Large contact forces and internal torques produce what we will term a high energy gait. With these larger forces in play, a perturbation is much more likely to have negligible effect. If we can find a gait which is high energy and follows general stable walking guidelines (clearance, head jerk, knee retraction) we should see a great increase in stability from the nominal gait.
  - We multiply all energy penalties by 0.0001 to effectively eliminate their influence on the final gait. We do not remove the penalties entirely, as they provide more information to the optimizer. This could allow for searching in more areas of interest along the manifold.

- High Speed
  - High speed complements our high energy goal. We increase the desired head speed by 30% to help allow for this transition.

- Head Forward
  - In walking, the torso pitch is correlated with step length and speed. By pushing the desired pitch forward by 30 degrees, we hope to see increases in speed and step length.

- Stable Trajectory Characteristics
  - We multiply our clearance, knee retraction, and crash penalties by 100. Our gait should be high energy, but it can't fall over either!

Simulate Perturbations:

We are provided a simulator for an unperturbed 20 second simulation. Now we want to train our policies under white noise perturbations on the torso. The white noise is a high-frequency random sampling of a Gaussian. We center the Gaussian of our perturbations at zero and alter the standard deviation. Every 0.0001 seconds, we sample a new random perturbation magnitude for each direction (x, y, and z).

Quantitatively, the torso experiences a series of sequential 0.0001 second impulses whose magnitudes scale with the standard deviation of our perturbation Gaussian. Because a separate randomized force is applied in each of the three directions, we get randomized orientation for free.

Our goal is to build a policy robust against large perturbations. To do this we train on smaller perturbations, and use the policies generated from these smaller perturbation simulations as the initial guess for larger perturbation policy optimization.

To be specific, a single policy optimization involves an initial policy, a perturbation variance, and our new cost function. We simulate 500 times. It is important to note that each simulation tests a different policy against a different realization of the white noise perturbations. Following Monte Carlo method assumptions, the magnitude of the perturbation stays the same, but the order in which the sampled forces are applied varies each time. After 500 simulations, CMA-ES provides us with the least-cost policy.

A policy may have been successful during the optimization function evaluation, but this does not mean it will not crash on a separate white noise perturbation of equal magnitude. Therefore we will validate the dependability of the policies by how often they succeed at a given perturbation magnitude.
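
The sampling scheme can be sketched in Python as below. This is an illustration under stated assumptions, not the simulator's code: the function name `white_noise_forces` and its arguments are hypothetical, and how the forces are fed into the simulator is left out.

```python
import random
from typing import Iterator, Optional, Tuple


def white_noise_forces(
    sigma: float,
    duration: float = 20.0,   # length of one walk in seconds
    dt: float = 0.0001,       # a new sample every 0.0001 seconds
    seed: Optional[int] = None,
) -> Iterator[Tuple[float, float, float]]:
    """Yield one zero-mean Gaussian force sample (x, y, z) per time step."""
    rng = random.Random(seed)
    steps = round(duration / dt)
    for _ in range(steps):
        # Independent draws per axis give a randomized force orientation.
        yield (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
```

A generator is used because a full 20 second walk needs 200,000 samples per axis, and the simulator only needs one sample at a time. Leaving `seed` unset gives a different realization on every walk.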

Results:

By the method described above, we develop walking policies by training at a constant white noise perturbation variance. We increase the variance for the next optimization procedure, with the intent of developing a more robust gait.

Below is a performance matrix indicating how each policy performs on a set of simulations at a specific white noise variance. A policy trained on variance N is denoted PN in the matrix (columns). Each row presents a white noise variance N (VN) which was simulated on each policy 25 times. We output the success rate of each policy at each white noise variance. Success is simply defined as not falling over.
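
The bookkeeping behind such a matrix can be sketched as follows. The `survives` callable is a hypothetical stand-in for one perturbed 20 second walk that returns True when the walker does not fall; the rows, columns, and 25 trials per cell follow the description above.

```python
import random
from typing import Callable, Dict, List, Sequence


def success_rate_matrix(
    policies: Dict[str, Sequence[float]],
    sigmas: Sequence[float],
    survives: Callable[[Sequence[float], float, random.Random], bool],
    trials: int = 25,
    seed: int = 0,
) -> List[List[float]]:
    """Rows are white noise levels, columns are policies, entries are success rates."""
    rng = random.Random(seed)
    matrix = []
    for sigma in sigmas:                     # one row per noise level (VN)
        row = []
        for policy in policies.values():     # one column per policy (PN)
            wins = sum(survives(policy, sigma, rng) for _ in range(trials))
            row.append(wins / trials)
        matrix.append(row)
    return matrix
```

With only 25 trials per cell, each success rate moves in steps of 4%, so small differences between neighboring cells should be read with caution.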

**White Noise Performance Matrix: New Cost Function**

We get some really interesting behavior from the policies we determined. As we move to the right across the matrix we observe an improvement in policy success rate until P17. **We arbitrarily define a perturbation level as one we can "handle" when the success rate is at least 90%. By this metric, we state that P10 can successfully handle white noise perturbations of magnitude 10N.** Policy 15 is also the most successful for 25N perturbations, with a success rate of 44%.

A nice feature of our policy improvement is that we increase performance for higher variance perturbations while maintaining higher success rate at low variances. This makes our gait truly robust. Some gaits may be able to handle high variance white noise perturbations, but fall over when unperturbed!

Now we observe the results derived for the same optimization procedure, but with the original cost function. We expect this to yield a far lower performance.

**White Noise Performance Matrix: Original Cost Function**

Comparing the two performance matrices, it is clear that the new cost function allows the optimizer to produce more robust policies.

Additional Testing:

Here we extend our method for policy optimization to a new form of perturbation, used in this paper. We apply a perturbation of magnitude N 12 separate times, each lasting 0.4 seconds, ensuring that none of the time intervals overlap. We apply the force on the torso in the horizontal plane, but randomize the orientation of the force. Holding our optimization procedure constant, we again compare the new and old cost functions.

**12 Impulse Performance Matrix: New Cost Function**

**12 Impulse Performance Matrix: Original Cost Function**
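
One way to generate such a schedule is sketched below in Python. The text does not say how the intervals were chosen, so the placement method here is an assumption: it spreads the unused time as random gaps between the impulses, which guarantees that they never overlap. The function name and the two horizontal force components are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple

Impulse = Tuple[float, float, float]  # (start time, force along axis 1, force along axis 2)


def schedule_impulses(
    magnitude: float,
    count: int = 12,
    width: float = 0.4,
    duration: float = 20.0,
    seed: Optional[int] = None,
) -> List[Impulse]:
    """Place non-overlapping impulses with random horizontal orientation."""
    rng = random.Random(seed)
    free_time = duration - count * width
    if free_time < 0:
        raise ValueError("impulses do not fit in the simulation window")
    # Sorted offsets in the free time, shifted by the widths already placed.
    offsets = sorted(rng.uniform(0.0, free_time) for _ in range(count))
    impulses = []
    for i, offset in enumerate(offsets):
        start = offset + i * width
        angle = rng.uniform(0.0, 2.0 * math.pi)
        impulses.append(
            (start, magnitude * math.cos(angle), magnitude * math.sin(angle))
        )
    return impulses
```

Because each start time is shifted by the total width of the impulses before it, consecutive starts are always at least 0.4 seconds apart, and the last impulse ends no later than 20 seconds.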

Some interesting trends develop between the data. The new cost function allows for meaningful progress to be made when training on policies with larger perturbations. That said, why is the P1 for the ORIGINAL COST FUNCTION so robust against white noise? Also the zero columns must be explained. All of these points will be addressed in the Analysis.

Analysis:

The original cost function sees high performance initially, but after training on about 1N perturbations we are not able to determine any useful policies during subsequent training. We believe these subsequent zero columns are produced by the presence of significant weights on the torque and force components of the cost. Beyond a certain perturbation level, it is clear that the walker must use more energy to keep from falling. Because the optimizer overvalues the energy terms, it may end up with a solution that is very unlikely to succeed.

Conversely, these torque and force considerations are effectively eliminated in the policy optimization with the new cost function. This factor, along with the other gait modification suggestions and weight changes, allows the walker to explore policies with high power consumption.

That said, why is the P1 for the original cost so robust against white noise? It is possible that some components of gait stability were overlooked by us, such as the head speed or pitch angle. We changed the desired value for each but never increased the weight. It is possible that these should have been weighted more highly, with different offsets.

This suggests to us that a more robust policy than P15 does exist. In the Appendix, we list a series of methods we attempted in order to handle higher magnitude perturbations (all of which failed).

Another consideration is that we are restricted by time. If we were to increase the number of function evaluations by even a small scaling factor, better solutions would likely be found.

Additional Info:

A few comments on technical specifics:

Implementation:

We were able to wrap the simulation framework into MATLAB. To do this we wrote a cost function which turns an array of parameter values into the P0 file, then we use the system command to run the simulate executable with the new parameter file. This was nice because we could use the CMA-ES code we used in Assignment One.
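
The same wrapping pattern is sketched below in Python rather than MATLAB, purely as an illustration. The parameter file layout (one value per line), passing the file path as the last argument, and reading the cost from the last line of output are all assumptions, not the actual interface of the provided simulator. `DEMO_COMMAND` is a stand-in process so the wrapper can be tried without the real executable.

```python
import os
import subprocess
import sys
import tempfile
from typing import List, Sequence

# Stand-in for the real simulator: ignores the parameter file, prints a fixed cost.
DEMO_COMMAND: List[str] = [sys.executable, "-c", "print(42.0)"]


def simulate_cost(params: Sequence[float], command: Sequence[str]) -> float:
    """Write params to a temporary file, run the simulator on it, return its cost."""
    # Assumed file layout: one parameter value per line.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("\n".join(repr(float(p)) for p in params) + "\n")
        path = handle.name
    try:
        result = subprocess.run(
            list(command) + [path], capture_output=True, text=True, check=True
        )
    finally:
        os.remove(path)
    # Assumption: the cost is the last line the simulator prints.
    return float(result.stdout.strip().splitlines()[-1])
```

Wrapping the executable as a plain function of the parameter array is what lets a generic optimizer treat the whole simulation as a black-box cost.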

Unsuccessful Attempts to Build More Robust Gaits:

- Push the pitch angle forward (a lot)
  - An interesting phenomenon occurs when a leg swings forward and the torso tilts back. The deviation from the desired pitch and the P term of the torso angle controller are high enough that the torso swings forward too fast. The front leg has not touched the ground yet, so the COM flies ahead of the foot contact points.

- Increase the Head Speed
  - This was thought to be highly correlated with the total speed of the gait. This may be the case, but the instability caused by constant acceleration of the walker's high COM makes it a detriment to stability after a point.

- Decrease Swing Time
  - Decreasing swing time was thought to speed up the robot. What resulted was a tendency towards swings that looked more like high kicks, delaying the approach towards the stance phase of the gait.

- Introduce Stance Time as a Separate Parameter
  - This resulted in a lot of useless chaos. It seemed like stance and swing time should not be identical, but for this implementation it seems they need to be.

Unattempted but Potentially Successful Ideas:

- Introduce more parameters, such as the gains for the PID controllers.
  - Adjusting these PID terms would likely prevent the issues caused by the forward-tilted pitch angle. A decrease in the P term would likely make the gait more stable.

- Place a negative cost on energy consumption and cap the number of function evaluations.
  - We know that a robust gait will be high in energy consumption. We could direct our optimization in this direction by using this negative cost. This gives an optimization problem with no global minimum, so a cap on the number of function evaluations will be needed.
  - This may not directly find a robust solution, but may be useful for producing a very useful initial guess.

**(RABBIT - A 5-Link Walker)**

Dynamic Optimization: Assignment 2

Quan Nguyen, Brian Bittner

For Assignment 2, our goal is to determine nominal and robust gaits for a five link walker. We are provided its dynamics and an extensive framework for numerical simulation and animation. Our desire in Part One is to determine a gait for the walker which minimizes the cost function, also provided to us. We call this the walker's nominal gait.

Cost Function

- Motor Torque Consumption
- Knee Retraction
- Head Movement
- Inadequate Foot-Ground Clearance
- Normal Force on Foot (x direction)
- Normal Force on Foot (z direction)
- Torso Pitch Angle
- Early Crash Time

Nominal Gait Determination (Part 1)

To determine a gait which minimizes the cost function, we will optimize over a set of parameters which make up our gait policy. These parameters include the following:

- Initial Thrust
- Swing Time
- Knots Containing Position, Velocity, and Acceleration
  - Knots are components of splines; they are the points at which the polynomial fits are smoothly connected. These provide a strong guide for the gait trajectory.
- Desired Torso Pitch
- Stance Torque
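
To illustrate how knots carrying position, velocity, and acceleration can define a smooth trajectory, here is a minimal Python sketch of a single polynomial segment between two knots. It assumes quintic segments, a common choice when all three quantities are specified at both ends; the order used by the provided framework is not stated here, and the function names are hypothetical.

```python
from typing import List, Tuple

Knot = Tuple[float, float, float]  # (position, velocity, acceleration)


def quintic_coefficients(start: Knot, end: Knot, T: float) -> List[float]:
    """Coefficients c0..c5 of p(t) = sum(c_k * t**k) matching both knots on [0, T]."""
    p0, v0, a0 = start
    p1, v1, a1 = end
    return [
        p0,
        v0,
        a0 / 2.0,
        (20 * (p1 - p0) - (8 * v1 + 12 * v0) * T - (3 * a0 - a1) * T**2) / (2 * T**3),
        (30 * (p0 - p1) + (14 * v1 + 16 * v0) * T + (3 * a0 - 2 * a1) * T**2) / (2 * T**4),
        (12 * (p1 - p0) - 6 * (v1 + v0) * T - (a0 - a1) * T**2) / (2 * T**5),
    ]


def evaluate(coeffs: List[float], t: float) -> float:
    """Evaluate the polynomial at time t."""
    return sum(c * t**k for k, c in enumerate(coeffs))
```

Because neighboring segments share the position, velocity, and acceleration at their common knot, the joined trajectory stays smooth there, which is why a few knots are enough to guide the whole gait.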

There are additional parameters made available for policy optimization, including controller gains and initial launch characteristics, but for now we focus on the above described set.

Optimization Method

Now we have a set of gait parameters, and an initial guess which accumulates a cost of about 1070 during a 20 second simulation. We will seek to minimize this cost using CMA-ES. We hope that by using this optimization technique, we will avoid local cost minima which arise due to a rough function evaluation manifold.

To elaborate, the 21 parameters we provide induce a nonlinear evaluation function of medium dimensionality. Small changes in one parameter can crash the robot, quickly spiking the cost. Because of this manifold roughness, a local minimum can be extremely far from the global minimum. Here is a look at why CMA-ES works particularly well. (From Assignment One)