Research News
Improved Reinforcement Learning Approach Proposed for Cost Function Optimization in Multi-objective FCS-MPC of PMSM Drives
Editor: LIU Jia | Jun 22, 2026

Model predictive control (MPC) has become a popular strategy for electrical drive systems due to its flexible cost function design and fast dynamic response. The performance of finite-control-set MPC (FCS-MPC) heavily depends on the proper tuning of weighting factors in the cost function, especially when multiple conflicting objectives must be balanced. Trial-and-error or heuristic tuning methods are time-consuming, lack adaptability, and often fail to maintain optimal performance under varying operating conditions.
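Here is a minimal Python sketch of one FCS-MPC step for a two-level inverter, showing where the weighting factors enter. It assumes a predictive torque control formulation and is not taken from the study. Torque error is the primary term, a flux term is weighted by `lam_psi`, and a switching term weighted by `lam_sw` counts commutating inverter legs as a proxy for switching frequency. It also assumes forward-Euler prediction of the dq currents and made-up machine parameters. All names and values here are hypothetical.

```python
import itertools
import math

# Hypothetical machine and inverter parameters (illustrative, not from the study).
RS, LD, LQ = 0.5, 5e-3, 5e-3   # stator resistance [ohm], d/q inductances [H]
PSI_F, P = 0.1, 4              # permanent-magnet flux linkage [Wb], pole pairs
TS, VDC = 50e-6, 300.0         # sampling period [s], DC-link voltage [V]
T_NOM, PSI_NOM = 5.0, 0.1      # normalization bases for torque [N*m] and flux [Wb]

# The 8 switching states (Sa, Sb, Sc) of a two-level voltage-source inverter.
STATES = list(itertools.product((0, 1), repeat=3))


def inverter_voltage_dq(state, theta):
    """Switching state -> dq voltages (amplitude-invariant Clarke, then Park)."""
    sa, sb, sc = state
    v_alpha = VDC / 3.0 * (2 * sa - sb - sc)
    v_beta = VDC / math.sqrt(3.0) * (sb - sc)
    vd = v_alpha * math.cos(theta) + v_beta * math.sin(theta)
    vq = -v_alpha * math.sin(theta) + v_beta * math.cos(theta)
    return vd, vq


def predict(i_d, i_q, vd, vq, omega_e):
    """Forward-Euler step of the dq current model; returns (torque, |stator flux|)."""
    i_d_next = i_d + TS / LD * (vd - RS * i_d + omega_e * LQ * i_q)
    i_q_next = i_q + TS / LQ * (vq - RS * i_q - omega_e * (LD * i_d + PSI_F))
    torque = 1.5 * P * (PSI_F * i_q_next + (LD - LQ) * i_d_next * i_q_next)
    psi = math.hypot(LD * i_d_next + PSI_F, LQ * i_q_next)
    return torque, psi


def cost(state, prev_state, i_d, i_q, theta, omega_e, t_ref, psi_ref, lam_psi, lam_sw):
    """Torque error + lam_psi * flux error + lam_sw * number of commutating legs."""
    vd, vq = inverter_voltage_dq(state, theta)
    torque, psi = predict(i_d, i_q, vd, vq, omega_e)
    n_switch = sum(a != b for a, b in zip(state, prev_state))
    return (abs(t_ref - torque) / T_NOM
            + lam_psi * abs(psi_ref - psi) / PSI_NOM
            + lam_sw * n_switch)


def fcs_mpc_step(prev_state, i_d, i_q, theta, omega_e, t_ref, psi_ref, lam_psi, lam_sw):
    """Enumerate the finite control set and return the state with the lowest cost."""
    return min(STATES, key=lambda s: cost(s, prev_state, i_d, i_q, theta, omega_e,
                                          t_ref, psi_ref, lam_psi, lam_sw))
```

Raising `lam_sw` trades tracking accuracy for fewer commutations. With a large enough value, the controller simply keeps the previous switching state. This sensitivity of the selected vector to the weights is what makes manual tuning tedious.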

In a study published in IEEE Transactions on Industrial Electronics, a research team led by Prof. WANG Fengxiang from the Fujian Institute of Research on the Structure of Matter of the Chinese Academy of Sciences, together with international collaborators, proposed an improved reinforcement learning (RL) framework that automatically optimizes the cost function weighting factors in multi-objective FCS-MPC for permanent magnet synchronous motor (PMSM) drives.

The researchers formulated weighting-factor tuning as a continuous-action RL task. They designed a customized cost function that incorporates multiple conflicting objectives together with the weighting parameters to be optimized. To overcome training instability and overestimation bias in value estimates, they adopted the twin delayed deep deterministic policy gradient (TD3) algorithm. TD3 employs dual critic networks, delayed policy updates, and target policy smoothing, which significantly improve learning stability and convergence.
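The three TD3 mechanisms can be sketched without any deep-learning framework. This illustrates the critic target and the update schedule only; it is not the authors' code. `actor_target`, `q1_target`, and `q2_target` are hypothetical callables standing in for the target networks. The default hyperparameters are commonly used TD3 defaults, not the study's settings.

```python
import random


def td3_target(reward, next_state, done, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, a_low=-1.0, a_high=1.0):
    """Clipped double-Q critic target with target policy smoothing.

    actor_target(s) -> list of floats; q1_target(s, a), q2_target(s, a) -> float.
    """
    # Target policy smoothing: perturb the target action with clipped Gaussian noise,
    # then keep it inside the valid action range.
    smoothed = []
    for a in actor_target(next_state):
        eps = max(-noise_clip, min(noise_clip, random.gauss(0.0, noise_std)))
        smoothed.append(max(a_low, min(a_high, a + eps)))
    # Twin critics: use the smaller of the two estimates to curb overestimation bias.
    q_next = min(q1_target(next_state, smoothed), q2_target(next_state, smoothed))
    # No bootstrapping past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * q_next


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: actor and target networks move once per `policy_delay` critic updates."""
    return step % policy_delay == 0


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of a flat parameter list for the target networks."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online_params, target_params)]
```

Each critic is regressed toward the same `td3_target` value. The actor and the target networks are refreshed only when `should_update_actor` is true, so the value estimates can settle before the policy moves.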

The RL agent observed key system states, including the dq-axis currents, tracking errors, and normalized torque and flux deviations. It output two continuous actions representing the weighting factors for flux and switching frequency. A performance-oriented reward function penalized deviations from the torque and flux references. The agent interacted with a high-fidelity PMSM simulation environment under diverse speed and load conditions to learn an optimal policy that balances multiple control objectives without requiring prior expert knowledge or manual intervention.
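One plausible encoding of the agent's interface is sketched below. Several details are assumptions for illustration, because the article does not specify them:

- the normalization bases
- the weighting-factor ranges
- treating the tracking error as a speed error
- the equal reward weights

```python
# Hypothetical normalization bases and weighting-factor ranges (not given in the article).
I_NOM, W_NOM, T_NOM, PSI_NOM = 10.0, 157.0, 5.0, 0.1
LAM_PSI_RANGE = (0.0, 5.0)    # assumed bounds for the flux weighting factor
LAM_SW_RANGE = (0.0, 0.05)    # assumed bounds for the switching-frequency weighting factor


def observation(i_d, i_q, speed_err, t_ref, torque, psi_ref, psi):
    """Normalized state: dq currents, an (assumed) speed tracking error, torque and flux deviations."""
    return [
        i_d / I_NOM,
        i_q / I_NOM,
        speed_err / W_NOM,
        (t_ref - torque) / T_NOM,
        (psi_ref - psi) / PSI_NOM,
    ]


def scale_action(a, lo, hi):
    """Map an actor output in [-1, 1] (e.g. from a tanh layer) linearly onto [lo, hi]."""
    a = max(-1.0, min(1.0, a))
    return lo + 0.5 * (a + 1.0) * (hi - lo)


def weighting_factors(action):
    """Two continuous actions -> (flux weight, switching-frequency weight)."""
    return (scale_action(action[0], *LAM_PSI_RANGE),
            scale_action(action[1], *LAM_SW_RANGE))


def reward(t_ref, torque, psi_ref, psi, w_t=1.0, w_psi=1.0):
    """Penalize normalized torque and flux deviations; the best possible reward is zero."""
    return -(w_t * abs(t_ref - torque) / T_NOM + w_psi * abs(psi_ref - psi) / PSI_NOM)
```

Bounding each action to a fixed range keeps both weights non-negative, whatever the actor outputs. This also keeps the MPC cost function well defined during exploration.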

The proposed TD3-based adaptive weighting algorithm eliminates manual tuning, adapts to varying operating conditions in real time, and achieves a balanced trade-off among multiple conflicting control objectives. It is also computationally efficient for online deployment. The learned policy requires only lightweight forward passes of the neural network, and the training is performed offline, once, on a standard GPU.
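To give a rough sense of scale for a lightweight forward pass, the sketch below runs a small multilayer-perceptron actor in pure Python and counts its multiply-accumulate operations. The layer sizes, activations, and random weights are hypothetical placeholders for a trained policy. The article does not state the network architecture.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Randomly initialized weights standing in for a trained actor (hypothetical sizes)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers


def actor_forward(layers, x):
    """ReLU hidden layers and a tanh output layer, so actions lie in [-1, 1]."""
    for k, (w, b) in enumerate(layers):
        z = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        is_last = k == len(layers) - 1
        x = [math.tanh(v) for v in z] if is_last else [max(0.0, v) for v in z]
    return x


def multiply_accumulates(sizes):
    """Multiply-accumulate operations per forward pass (biases and activations excluded)."""
    return sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
```

Assume five inputs, two hidden layers of 64 units, and two outputs (sizes `[5, 64, 64, 2]`). The count is then 5·64 + 64·64 + 64·2 = 4,544 multiply-accumulates per call. This is a fixed, modest workload, while the expensive training happens once offline.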

This study provides a practical solution for multi-objective MPC in PMSM drives.