Reinforcement learning with internal expectation for the random neural network

Date

2000-10-01

Author

Halıcı, Uğur

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The reinforcement learning scheme proposed in Halici (1997) (Halici, U., 1997. Journal of Biosystems 40 (1/2), 83-91) for the random neural network (Gelenbe, E., 1989b. Neural Computation 1 (4), 502-510) is based on reward and performs well for stationary environments. However, when the environment is not stationary, it gets stuck on the previously learned action, and extinction is not possible. In this paper, the reinforcement learning scheme is extended by introducing a weight update rule that takes the internal expectation of reinforcement into account. With the proposed scheme, the system behaves as in learning with reward when the reward for the learned action is not below the internal expectation; otherwise it behaves as in learning with punishment, so that other possibilities can be explored. This scheme makes extinction possible while still converging well to the most rewarding action.
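
The switching behaviour described in the abstract can be illustrated with a minimal sketch. This is not the paper's random neural network update rule, which the abstract does not give. It assumes a plain list of per-action preference weights, an internal expectation kept as an exponential running average of past reinforcement, and hypothetical parameters `lr` and `beta`; all names are illustrative only.

```python
def update(weights, expectation, action, reward, lr=0.1, beta=0.2):
    """One illustrative learning step (hypothetical; not the paper's RNN rule).

    weights     -- per-action preference weights (assumed representation)
    expectation -- internal expectation of reinforcement (assumed to be
                   an exponential running average of past rewards)
    action      -- index of the action that was just taken
    reward      -- reinforcement received for that action
    lr, beta    -- hypothetical learning rate and averaging factor
    """
    new_weights = list(weights)  # copy so the caller's list is not mutated
    if reward >= expectation:
        # Reward is not below the expectation: behave as in learning
        # with reward and strengthen the chosen action.
        new_weights[action] += lr * reward
    else:
        # Reward fell below the expectation: behave as in learning with
        # punishment and weaken the chosen action, so that other actions
        # can be explored. The weight is kept strictly positive.
        shortfall = expectation - reward
        new_weights[action] = max(new_weights[action] - lr * shortfall, 1e-3)
    # Move the internal expectation toward the latest reinforcement.
    new_expectation = (1 - beta) * expectation + beta * reward
    return new_weights, new_expectation


if __name__ == "__main__":
    w, e = [1.0, 1.0], 0.5
    # Action 0 is rewarded at first, then the environment changes.
    for r in [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]:
        w, e = update(w, e, action=0, reward=r)
        print(w, e)
```

In this sketch, once the reward for action 0 drops to zero, the punishment branch fires because the expectation built up during the rewarded phase is still high. A purely reward-based rule would leave the weight for action 0 untouched in that situation, which corresponds to the lack of extinction that the abstract describes.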

Subject Keywords

Management Science and Operations Research, Modelling and Simulation, Information Systems and Management

URI

https://hdl.handle.net/11511/42604

Journal

European Journal Of Operational Research

DOI

https://doi.org/10.1016/s0377-2217(99)00479-8

Collections

Department of Electrical and Electronics Engineering, Article

Suggestions


Reinforcement learning with internal expectation in the random neural networks for cascaded decisions Halıcı, Uğur (Elsevier BV, 2001-10-16) The reinforcement learning scheme proposed in Halici (J. Biosystems 40 (1997) 83) for the random neural network (RNN) (Neural Computation 1 (1989) 502) is based on reward and performs well for stationary environments. However, when the environment is not stationary it suffers from getting stuck to the previously learned action and extinction is not possible. To overcome the problem, the reinforcement scheme is extended in Halici (Eur. J. Oper. Res., 126(2000) 288) by introducing a new weight update rule (E-...
Multi-objective integer programming: A general approach for generating all non-dominated solutions Oezlen, Melih; Azizoğlu, Meral (Elsevier BV, 2009-11-16) In this paper we develop a general approach to generate all non-dominated solutions of the multi-objective integer programming (MOIP) Problem. Our approach, which is based on the identification of objective efficiency ranges, is an improvement over classical epsilon-constraint method. Objective efficiency ranges are identified by solving simpler MOIP problems with fewer objectives. We first provide the classical epsilon-constraint method on the bi-objective integer programming problem for the sake of comple...
A NEW HEURISTIC APPROACH FOR THE MULTIITEM DYNAMIC LOT-SIZING PROBLEM KIRCA, O; KOKTEN, M (Elsevier BV, 1994-06-09) In this paper a framework for a new heuristic approach for solving the single level multi-item capacitated dynamic lot sizing problem is presented. The approach uses an iterative item-by-item strategy for generating solutions to the problem. In each iteration a set of items are scheduled over the planning horizon and the procedure terminates when all items are scheduled. An algorithm that implements this approach is developed in which in each iteration a single item is selected and scheduled over the planni...
Approximate queueing models for capacitated multi-stage inventory systems under base-stock control Avşar, Zeynep Müge (Elsevier BV, 2014-07-01) A queueing analysis is presented for base-stock controlled multi-stage production-inventory systems with capacity constraints. The exact queueing model is approximated by replacing some state-dependent conditional probabilities (that are used to express the transition rates) by constants. Two recursive algorithms (each with several variants) are developed for analysis of the steady-state performance. It is analytically shown that one of these algorithms is equivalent to the existing approximations given in ...
THE SCHEDULING OF ACTIVITIES TO MAXIMIZE THE NET PRESENT VALUE OF PROJECTS - COMMENT SEPIL, C (Elsevier BV, 1994-02-24) In a recent paper, Elmaghraby and Herroelen have presented an algorithm to maximize the present value of a project. Here, with the help of an example, it is shown that the algorithm may not find the optimal solution.

Citation Formats

U. Halıcı, “Reinforcement learning with internal expectation for the random neural network,” European Journal Of Operational Research, pp. 288–307, 2000, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/42604.