Systematic Evaluation of Initial States and Exploration–Exploitation Strategies in PID Auto-Tuning:
A Framework‑Driven Approach Applied to Mobile Robots

University of Michigan–Dearborn

Abstract

PID controllers are ubiquitous in robotics for their simplicity and effectiveness. While Bayesian Optimization (BO) and Differential Evolution (DE) automate gain selection, the influence of initial PID gains and the exploration–exploitation (E–E) balance remains poorly understood. We propose a unified framework that systematically crosses multiple initial-state hypotheses with three E–E strategies — balanced, exploration‑heavy, and exploitation‑heavy — and evaluates them on two mobile‑robot platforms. Results demonstrate that a balanced E–E policy seeded with well‑chosen initial gains produces the fastest convergence (≤1.1 s settling) and 100 % reliability, while DE offers unmatched robustness across all scenarios.


Framework Overview

Figure 1. Block diagram of the proposed PID auto‑tuning framework, illustrating how the Configurations Generator pairs initial PID states with exploration–exploitation levels before the Trials Executer tunes them with BO / DE.

The framework consists of two cooperating modules:

  1. Configurations Generator — forms the Cartesian product of predefined initial PID states and E–E levels, yielding 6 unique configurations per optimizer (a minimal sketch follows this list).
  2. Trials Executer — sequentially runs each configuration on the target robot, closes the optimization loop, and logs performance metrics.
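For concreteness, the sketch below shows one way the Configurations Generator could form these pairings in Python. The gain values and level names are illustrative assumptions, not the values used in the experiments.

# Sketch of the Configurations Generator: cross the predefined initial PID
# states with the three E-E levels (the values below are assumed examples).
from itertools import product

INITIAL_STATES = {
    "state_1": {"Kp": 2.0, "Ki": 0.1, "Kd": 0.05},   # assumed: high P, low I/D
    "state_2": {"Kp": 0.5, "Ki": 0.5, "Kd": 0.50},   # assumed: moderate gains
}
EE_LEVELS = ["balanced", "exploration_heavy", "exploitation_heavy"]

def generate_configurations(optimizer_type):
    """Return the 6 (initial state x E-E level) configurations for one optimizer."""
    return [
        {"optimizer": optimizer_type, "initial_state": name, "ee_level": level}
        for name, level in product(INITIAL_STATES, EE_LEVELS)
    ]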

Methodology

Figure 2. Trials Executer workflow for one configuration (trial).

Algorithm 1 – Trials Executer Pseudocode

// Inputs: config, optimizer_type (BO / DE), ST_threshold, constraints
initialize optimizer(config.initial_state, config.ee_level)
best_ST ← ∞
repeat
    (Kp, Ki, Kd) ← optimizer.suggest()
    (ST, RT, OS) ← run_rotation(Kp, Ki, Kd)   // Execute 90° turn
    if (RT, OS) within constraints then
        optimizer.update(ST)                  // reward: settling time is the cost
        if ST < best_ST then
            best ← (Kp, Ki, Kd); best_ST ← ST
    else
        optimizer.penalize()                  // discourage constraint violations
until (ST ≤ ST_threshold and (RT, OS) within constraints) or max_iter reached
return best, best_ST

The executor stops early when the settling‑time (ST) objective is achieved, drastically reducing iteration count for promising configurations.
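As a concrete illustration, the sketch below implements the same loop in Python for the BO case using scikit-optimize's ask/tell interface. Here run_rotation is a toy stand‑in for the real 90° turn, and the search bounds, constraint limits, threshold, and penalty value are assumptions rather than the settings used in the study.

from skopt import Optimizer

BOUNDS = [(0.0, 10.0), (0.0, 2.0), (0.0, 2.0)]  # (Kp, Ki, Kd) search space (assumed)
ST_THRESHOLD = 1.1                              # target settling time in seconds (assumed)
MAX_ITER = 50
PENALTY = 10.0                                  # large cost that discourages violations

def run_rotation(kp, ki, kd):
    """Toy stand-in for executing a 90-degree turn; returns (ST, RT, OS)."""
    st = 1.0 + 0.2 * abs(kp - 3.0)              # illustrative surrogate response only
    return st, 0.5 * st, 5.0 * ki

def within_constraints(rt, os_):
    return rt < 2.0 and os_ < 20.0              # assumed rise-time / overshoot limits

def execute_trial():
    opt = Optimizer(dimensions=BOUNDS, base_estimator="GP", acq_func="EI")
    best_gains, best_st = None, float("inf")
    for _ in range(MAX_ITER):
        kp, ki, kd = opt.ask()                  # optimizer.suggest()
        st, rt, os_ = run_rotation(kp, ki, kd)
        feasible = within_constraints(rt, os_)
        opt.tell([kp, ki, kd], st if feasible else PENALTY)  # reward or penalize
        if feasible and st < best_st:
            best_gains, best_st = (kp, ki, kd), st
        if feasible and st <= ST_THRESHOLD:
            break                               # early stop: ST objective achieved
    return best_gains, best_st

The DE case follows the same pattern, with the differential-evolution search proposing candidate gains in place of the Gaussian-process surrogate.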


Experimental Results

We ran 12 distinct trial configurations (2 initial states × 3 E–E levels × 2 optimizers) per robot, 24 across the two platforms, repeating each 10 times to support statistical comparison. Table 1 summarises the best‑run metrics; Figures 3 and 4 visualise the settling‑time distributions.

Table 1. Best‑run metrics for each configuration: robot, E–E configuration, optimizer, settling time ST (ms), convergence rate (%), rise time RT (ms), overshoot OS (%), and iteration count.
Figure 3. Settling‑time KDEs for the omnidirectional robot across E–E levels (BO in blue, DE in red).
Figure 4. Settling‑time KDEs for the differential‑drive robot across E–E levels.

Discussion

RQ1 – Exploration vs. Exploitation

  • A balanced policy consistently delivered the fastest settling times and 100 % convergence on both platforms.
  • Exploration‑heavy settings occasionally discovered lower‑cost regions but required extra iterations to refine.
  • Exploitation‑heavy settings favoured DE, which capitalised on good initial gains without global search overhead (one plausible mapping of the three E–E levels onto optimizer hyperparameters is sketched below).
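One plausible way to realize the three E–E levels is through the optimizers' standard hyperparameters, e.g. the Expected Improvement parameter xi in scikit-optimize and the mutation/recombination rates in SciPy's differential evolution. The mapping below is an assumption for illustration; the study's exact parameterization is not reproduced here.

# Hedged sketch: one plausible mapping from E-E level to optimizer hyperparameters.
EE_SETTINGS = {
    "BO": {   # larger xi biases Expected Improvement toward exploration
        "exploration_heavy":  {"acq_func_kwargs": {"xi": 0.10}},
        "balanced":           {"acq_func_kwargs": {"xi": 0.01}},
        "exploitation_heavy": {"acq_func_kwargs": {"xi": 0.001}},
    },
    "DE": {   # higher mutation explores more; higher recombination converges faster
        "exploration_heavy":  {"mutation": (0.8, 1.5), "recombination": 0.3},
        "balanced":           {"mutation": (0.5, 1.0), "recombination": 0.7},
        "exploitation_heavy": {"mutation": (0.2, 0.5), "recombination": 0.9},
    },
}

These settings would be passed to skopt.Optimizer(..., acq_func="EI", acq_func_kwargs=...) and scipy.optimize.differential_evolution(..., mutation=..., recombination=...), respectively.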

RQ2 – Impact of Initial Gains

  • Initial State 1 (high P, low I/D) accelerated convergence on the differential‑drive robot by ~3 % relative to State 2.
  • The omnidirectional platform proved less sensitive to initial gains, owing to its holonomic kinematics; a sketch of how an initial state can seed each optimizer follows below.
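Algorithm 1 initializes the optimizer from config.initial_state without detailing how. One plausible realization is sketched below; the helper names, population size, and spread are assumptions.

# Hedged sketch: seeding BO and DE with a chosen initial PID state.
import numpy as np
from scipy.optimize import differential_evolution

def seed_bo(opt, initial_state, evaluate):
    """Evaluate the initial gains once and report the result to a skopt Optimizer,
    so its surrogate model starts from that point rather than from a blank prior."""
    x0 = [initial_state["Kp"], initial_state["Ki"], initial_state["Kd"]]
    opt.tell(x0, evaluate(*x0))

def seeded_de(objective, bounds, initial_state, pop_size=15, spread=0.1):
    """Run SciPy's DE with an initial population sampled around the initial gains."""
    x0 = np.array([initial_state["Kp"], initial_state["Ki"], initial_state["Kd"]])
    lo, hi = np.array(bounds, dtype=float).T
    init = np.clip(x0 + spread * (hi - lo) * np.random.randn(pop_size, 3), lo, hi)
    return differential_evolution(objective, bounds, init=init)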

RQ3 – BO vs. DE

  • BO reached sub‑1.2 s settling in as few as three iterations but exhibited occasional non‑convergence when exploration was restricted.
  • DE achieved perfect reliability across 240 runs, albeit sometimes requiring >50 iterations to equal BO’s best times.

Conclusion

This study underscores the intertwined roles of initial PID gains and exploration–exploitation strategy in auto‑tuning. A deliberately balanced search seeded with informed gains yields rapid, reliable performance, while DE remains a fail‑safe optimizer under tough conditions. Future work will fuse BO's sample‑efficiency with DE's robustness and extend the framework to multi‑objective tasks.


Acknowledgements

This work was supported in part by the National Science Foundation (NSF) CRII: CPS program under grant number 2347294.