Decision making in autonomous systems is particularly challenging in unknown, changing, and complex environments, where a complete a priori representation cannot be provided. The representation must instead be built from the system's interactions with its environment. To illustrate the problem, we consider decentralized road traffic control, in which a control device of the distributed infrastructure locally regulates traffic by sending recommendation messages to connected vehicles. We propose an approach that combines, without prior domain knowledge, a set of existing traditional unsupervised learning methods that collaborate as a population of agents to build an efficient representation. This study addresses the main scientific issues such a system must face in order to learn efficiently. Our approach follows a constructivist learning perspective, in which a population of agents collectively builds a representation that dynamically combines discretization processes.
Keywords: constructivist learning, decision-making, control