Why I Default to Classical Control Before Touching RL

Here's something that might sound counterintuitive coming from someone who works in robot learning: my default when starting a new manipulation task is not reinforcement learning. It's model predictive control (MPC) or a well-tuned PID loop. And I've found that this habit consistently makes me faster and my systems more reliable.
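
For concreteness, this is the sort of baseline I mean. It's a minimal textbook PID sketch in Python, not code from any particular robot; the gains, output clamp, and control rate are all placeholder values you'd tune per joint on real hardware.

```python
class PID:
    """Textbook PID with output clamping. All gains are placeholders."""

    def __init__(self, kp, ki, kd, output_limit=None):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.output_limit = output_limit
        self._integral = 0.0
        self._prev_error = None

    def update(self, error, dt):
        # Integrate the error for the I term, difference it for the D term.
        self._integral += error * dt
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / dt
        self._prev_error = error
        u = self.kp * error + self.ki * self._integral + self.kd * derivative
        # Clamp to actuator limits; production code would add anti-windup too.
        if self.output_limit is not None:
            u = max(-self.output_limit, min(self.output_limit, u))
        return u

# Example: joint position control at 100 Hz with made-up gains.
ctrl = PID(kp=2.0, ki=0.1, kd=0.05, output_limit=1.0)
command = ctrl.update(error=0.3, dt=0.01)
```

Twenty lines, no training run, and you can reason about every term when it misbehaves.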

The reason is simple — classical control gives you a baseline you can trust. When you're working on a new task, there are already enough unknowns: the hardware, the environment, the edge cases you haven't encountered yet. Adding an RL agent on top of all that means you're debugging everything simultaneously. A classical baseline lets you isolate the hard parts. If your MPC-controlled arm can complete the task 70% of the time, you now know what the remaining 30% looks like. That's the problem you're actually solving.

In my experience, classical control also survives handoff better. When a system moves from one robot to another, or from one shift to another with different lighting, a well-tuned controller often outperforms a learned policy: the controller's robustness was designed in, while the policy was never forced to acquire it. RL policies can be brittle in ways that only show up weeks later in production.

That said, there are tasks where classical control genuinely cannot compete: anything that requires semantic understanding, operates in unstructured environments, or must adapt continuously to changing conditions. These are the right places for RL or imitation learning. But I've seen too many projects burn weeks on training pipelines for tasks that a classical planner with good perception could have solved in three days.

My process now: build the classical baseline first, measure its failure modes precisely, then decide if the improvement from a learned policy is worth the engineering cost. Usually, the answer tells you exactly where to spend your time.
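
To make the middle step concrete: "measure its failure modes precisely" mostly means counting labeled outcomes. A minimal sketch, assuming a hypothetical run_episode() hook that executes one trial of the classical baseline and returns an outcome label (the labels here are invented):

```python
from collections import Counter

def profile_baseline(run_episode, n_trials=100):
    """Tally outcome labels from repeated trials of the classical baseline."""
    outcomes = Counter(run_episode() for _ in range(n_trials))
    print(f"success rate: {outcomes['success'] / n_trials:.0%}")
    # Everything that isn't a success is a failure mode worth naming.
    for label, count in outcomes.most_common():
        if label != "success":
            print(f"  {label}: {count}/{n_trials}")
    return outcomes
```

The failure labels, not the headline success rate, are what the decision in the last step hinges on.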