{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T17:37:53Z","timestamp":1771522673703,"version":"3.50.1"},"reference-count":38,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2019,11,2]],"date-time":"2019-11-02T00:00:00Z","timestamp":1572652800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Shaanxi Province Key Research and Development Program of China","award":["2018GY-187"],"award-info":[{"award-number":["2018GY-187"]}]},{"DOI":"10.13039\/501100004750","name":"Aeronautical Science Foundation of China","doi-asserted-by":"publisher","award":["2016ZC53022"],"award-info":[{"award-number":["2016ZC53022"]}],"id":[{"id":"10.13039\/501100004750","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Seed Foundation of Innovation and Creation for Graduate Students in Northwestern Polytechnical University","award":["ZZ2018169"],"award-info":[{"award-number":["ZZ2018169"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Multi-robot confrontation on physics-based simulators is a complex and time-consuming task, but simulators are required to evaluate the performance of advanced algorithms. Recently, a few advanced algorithms have been able to handle considerably complex levels in robot confrontation systems in which agents face multiple opponents. Meanwhile, current confrontation decision-making systems suffer from difficulties in optimization and generalization. In this paper, fuzzy reinforcement learning (RL) and curriculum transfer learning are applied to micromanagement in a robot confrontation system. Firstly, an improved Q-learning algorithm in the semi-Markov decision process is designed to train the agent, and an efficient RL model is defined to avoid the curse of dimensionality. Secondly, a multi-agent RL algorithm with parameter sharing is proposed to train the agents, using a neural network with adaptive momentum acceleration as a function approximator to estimate the state-action value function. A fuzzy logic method is then used to regulate the learning rate of RL. Thirdly, a curriculum transfer learning method is used to extend the RL model to more difficult scenarios, which ensures the generalization of the decision-making system. The experimental results show that the proposed method is effective.<\/jats:p>","DOI":"10.3390\/info10110341","type":"journal-article","created":{"date-parts":[[2019,11,4]],"date-time":"2019-11-04T04:13:08Z","timestamp":1572840788000},"page":"341","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Fuzzy Reinforcement Learning and Curriculum Transfer Learning for Micromanagement in Multi-Robot Confrontation"],"prefix":"10.3390","volume":"10","author":[{"given":"Chunyang","family":"Hu","sequence":"first","affiliation":[{"name":"School of Computer Engineering, Hubei University of Arts and Science, Xiangyang 441053, China"}]},{"given":"Meng","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,11,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1030","DOI":"10.1016\/j.ijforecast.2014.08.008","article-title":"Electricity price forecasting: A review of the state-of-the-art with a look into the future","volume":"4","author":"Weron","year":"2014","journal-title":"Int. J. Forecast."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1109\/TCIAIG.2017.2766218","article-title":"Generator of Feasible and Engaging Levels for Angry Birds","volume":"10","author":"Ferreira","year":"2017","journal-title":"IEEE Trans. Games"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2166","DOI":"10.1016\/j.measurement.2012.05.030","article-title":"A computer simulation platform for the estimation of measurement uncertainties in dimensional X-ray computed tomography","volume":"45","author":"Hiller","year":"2012","journal-title":"Measurement"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1126\/science.aam6960","article-title":"Deepstack: Expert level artificial intelligence in heads-up no-limit poker","volume":"356","author":"Moravik","year":"2017","journal-title":"Science"},{"key":"ref_6","first-page":"1","article-title":"Distributed Static and Dynamic Circumnavigation Control with Arbitrary Spacings for a Heterogeneous Multi-robot System","volume":"4","author":"Yao","year":"2018","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1109\/TCIAIG.2013.2286295","article-title":"Survey of Real-Time Strategy Game AI Research and Competition in StarCraft","volume":"5","author":"Ontanon","year":"2013","journal-title":"IEEE Trans. Comput. Intell. AI Games"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1109\/TCIAIG.2015.2487743","article-title":"Multiscale Bayesian Modeling for RTS Games: An Application to StarCraft AI","volume":"8","author":"Synnaeve","year":"2016","journal-title":"IEEE Trans. Comput. Intell. AI Games"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1109\/TNN.2004.842673","article-title":"Reinforcement Learning: An Introduction","volume":"16","author":"Thrun","year":"2005","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1109\/TII.2016.2617464","article-title":"Decoupled Visual Servoing With Fuzzy Q-Learning","volume":"14","author":"Shi","year":"2018","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1016\/j.ins.2018.01.032","article-title":"An adaptive Decision-making Method with Fuzzy Bayesian Reinforcement Learning for Robot Soccer","volume":"436","author":"Shi","year":"2018","journal-title":"Inform. Sci."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Xu, M., Shi, H., and Wang, Y. (2018, January 6\u20138). Play games using Reinforcement Learning and Artificial Neural Networks with Experience Replay. Proceedings of the 2018 IEEE\/ACIS 17th International Conference on Computer and Information Science (ICIS), Singapore.","DOI":"10.1109\/ICIS.2018.8466428"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"301","DOI":"10.3233\/AIC-140599","article-title":"Modeling and forecasting of electricity spot-prices: Computational intelligence vs. classical econometrics","volume":"3","author":"Cincotti","year":"2014","journal-title":"AI Commun."},{"key":"ref_14","first-page":"1573","article-title":"RLPy: A value-function-based reinforcement learning framework for education and research","volume":"16","author":"Geramifard","year":"2015","journal-title":"J. Mach. Learn. Res."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2401","DOI":"10.1109\/TCYB.2015.2477810","article-title":"Optimal Output-Feedback Control of Unknown Continuous-Time Linear Systems Using Off-policy Reinforcement Learning","volume":"46","author":"Modares","year":"2016","journal-title":"IEEE Trans. Cybern."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1143","DOI":"10.1137\/S0363012901385691","article-title":"Actor-critic algorithms","volume":"42","author":"Konda","year":"2003","journal-title":"Siam J. Control Optim."},{"key":"ref_17","unstructured":"Patel, P.G., Carver, N., and Rahimi, S. (2011, January 18\u201321). Tuning computer gaming agents using Q-learning. Proceedings of the Computer Science and Information Systems (FedCSIS), Szczecin, Poland."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1109\/TETCI.2018.2823329","article-title":"StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning","volume":"3","author":"Shao","year":"2018","journal-title":"IEEE Trans. Emerg. Top. Comput. Intell."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1016\/j.neucom.2012.03.013","article-title":"New adaptive momentum algorithm for split-complex recurrent neural networks","volume":"93","author":"Xu","year":"2012","journal-title":"Neurocomputing"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"Lecun","year":"2015","journal-title":"Nature"},{"key":"ref_21","unstructured":"Peng, P., Yuan, Q., Wen, Y., Yang, Y., Tang, Z., Long, H., and Wang, J. (2017). Multi-agent bidirectionally-coordinated nets for learning to play StarCraft combat games. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A Survey on Transfer Learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1016\/j.jcss.2013.06.012","article-title":"Building a reputation-based bootstrapping mechanism for newcomers in collaborative alert systems","volume":"80","author":"Gil","year":"2014","journal-title":"J. Comput. Syst. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1214\/ss\/1177011077","article-title":"Simulated Annealing","volume":"8","author":"Bertsimas","year":"1993","journal-title":"Stat. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"37173","DOI":"10.1109\/ACCESS.2018.2847048","article-title":"A Sample Aggregation Approach to Experiences Replay of Dyna-Q Learning","volume":"6","author":"Shi","year":"2018","journal-title":"IEEE Access"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Shi, H., Xu, M., and Hwang, K. (2019). A Fuzzy Adaptive Approach to Decoupled Visual Servoing for a Wheeled Mobile Robot. IEEE Trans. Fuzzy Syst.","DOI":"10.1109\/TFUZZ.2019.2931219"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"8376","DOI":"10.1109\/ACCESS.2018.2808266","article-title":"An Adaptive Strategy Selection Method with Reinforcement Learning for Robotic Soccer Games","volume":"6","author":"Shi","year":"2018","journal-title":"IEEE Access"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Choi, S.Y., Le, T., Nguyen, Q., Layek, M.A., Lee, S., and Chung, T. (2019). Toward Self-Driving Bicycles Using State-of-the-Art Deep Reinforcement Learning Algorithms. Symmetry, 11.","DOI":"10.3390\/sym11020290"},{"key":"ref_29","first-page":"2266","article-title":"RTP-Q: A Reinforcement Learning System with Time Constraints Exploration Planning for Accelerating the Learning Rate","volume":"82","author":"Zhao","year":"1999","journal-title":"IEICE Trans. Fundam. Electron. Commun. Comput. Sci."},{"key":"ref_30","unstructured":"Basyigit, A.I., Ulu, C., and Guzelkaya, M. (2014, January 25\u201327). A New Fuzzy Time Series Model Using Triangular and Trapezoidal Membership Functions. Proceedings of the International Work-Conference On Time Series, Granada, Spain."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1109\/TNNLS.2012.2223824","article-title":"Observer-based adaptive neural network control for nonlinear stochastic systems with time delay","volume":"24","author":"Zhou","year":"2012","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1007\/s10710-014-9224-2","article-title":"Evolving Robocode tanks for Evo Robocode","volume":"15","author":"Harper","year":"2014","journal-title":"Genet. Program. Evol. Mach."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1007\/s10846-008-9299-1","article-title":"Unified Behavior Framework for Reactive Robot Control","volume":"55","author":"Woolley","year":"2009","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1023\/A:1013689704352","article-title":"Finite-time Analysis of the Multiarmed Bandit Problem","volume":"47","author":"Auer","year":"2002","journal-title":"Mach. Learn."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Shi, H., and Xu, M. (2019). A Multiple Attribute Decision-Making Approach to Reinforcement Learning. IEEE Trans. Cogn. Dev. Syst.","DOI":"10.1109\/TCDS.2019.2924724"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Shi, H., and Xu, M. (2018, January 18\u201320). A Data Classification Method Using Genetic Algorithm and K-Means Algorithm with Optimizing Initial Cluster Center. Proceedings of the 2018 IEEE International Conference on Computer and Communication Engineering Technology (CCET), Beijing, China.","DOI":"10.1109\/CCET.2018.8542173"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Xu, M., Shi, H., Jiang, K., Wang, L., and Li, X.A. (2019, January 4\u20137). Fuzzy Approach to Visual Servoing with A Bagging Method for Wheeled Mobile Robot. Proceedings of the 2019 IEEE International Conference on Mechatronics and Automation, Tianjin, China.","DOI":"10.1109\/ICMA.2019.8816420"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Shi, H., Xu, M., Hwang, K., and Cai, B.Y. (2019). Behavior Fusion for Deep Reinforcement Learning. ISA Trans.","DOI":"10.1016\/j.isatra.2019.08.054"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/11\/341\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:31:27Z","timestamp":1760189487000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/11\/341"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,2]]},"references-count":38,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2019,11]]}},"alternative-id":["info10110341"],"URL":"https:\/\/doi.org\/10.3390\/info10110341","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,2]]}}}