Computation of the Learning Rate
Using a search algorithm, the zero of the function d(W, C) = I - ξ has to be calculated [22], where ξ denotes a lower bound of the cost function to be minimized. This bound has to lie between the actual error I and the minimal achievable error Imin. In order to obtain a learning-rate rule analogous to Newton's method, the difference d(·) is expanded in a Taylor series at the point W0, C0 and truncated after the first-order term.
One possible solution to this requirement is
This solution for Δwi,j and Δck,l provides a structure in which the second term can be interpreted as a computable learning rate [23].
The bound ξ can be determined as follows [23]: by systematically decreasing the bound, the cost function is driven to a minimum and then kept there during the subsequent training steps.
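The update equations referenced above are not reproduced in this excerpt. The following sketch is therefore only a minimal illustration, assuming that the first-order Taylor expansion of d(W, C) = I - ξ leads to a Polyak-type step size η = (I - ξ)/‖∇I‖²; the bound-decrease rule and all parameter names are illustrative assumptions, not the authors' formulas.

```python
import numpy as np

def learning_rate(loss, grad, xi, eps=1e-12):
    """Polyak-type step size from the first-order expansion of
    d(W, C) = I - xi around the current weights:
        eta = (I - xi) / ||grad I||^2   (assumed form)."""
    return max(loss - xi, 0.0) / (np.sum(grad * grad) + eps)

def training_step(weights, grad, loss, xi):
    """One gradient step using the computed learning rate."""
    return weights - learning_rate(loss, grad, xi) * grad

def decrease_bound(xi, loss, shrink=0.9):
    """Systematically decrease the lower bound xi once the cost has
    settled close to it, so that training keeps driving the cost down
    (illustrative rule; assumes xi > 0)."""
    return shrink * xi if loss < 1.1 * xi else xi
```

In this reading, ξ is initialized between the current error I and Imin and tightened whenever the cost approaches it, which matches the systematic decrease of the bound described above.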
Initialization by Linearization
In order to compute initial values for the weight and output matrices, the recurrent neural network description is linearized [22]. The nonlinear part of the state-space description (11), (12)
has to assume the linear form
at the stationary point x0. This leads to
for the elements of the weight matrix W*. Considering the stationary point x0 = 0, this method results in W* = W. Therefore, if a linear model of the nonlinear process under investigation exists in state-space form,
giving an exact representation of the system behavior at the point of zero energy x0 = u0 = 0, then the neural network with the following choice of weight and output matrices represents a good approximation of the system at this stationary point [22].
Starting from this initialization, further adaptation of these matrices during learning leads to substantially improved modeling results.
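The concrete choice of the weight and output matrices is given in the original text as an equation that is not reproduced in this excerpt. As a hedged illustration only, the sketch below assumes a recurrent network of the form x(k+1) = W tanh(x(k)) + B u(k), y(k) = C_net tanh(x(k)); since tanh has unit slope at the origin, linearization at the stationary point x0 = 0 matches a linear state-space model x(k+1) = A x(k) + B u(k), y(k) = C x(k) when the network matrices are initialized from A and C. All symbols and function names here are assumptions for illustration.

```python
import numpy as np

def init_from_linear_model(A, C_out):
    """Initialize recurrent-network weight and output matrices from a linear
    state-space model x(k+1) = A x(k) + B u(k), y(k) = C x(k).

    Assumption: network state equation x(k+1) = W tanh(x(k)) + B u(k) and
    output y(k) = C_net tanh(x(k)); because d/dx tanh(x) = 1 at x = 0, the
    linearization at the stationary point x0 = 0 matches the linear model
    when W = A and C_net = C."""
    W_init = np.array(A, dtype=float).copy()
    C_init = np.array(C_out, dtype=float).copy()
    return W_init, C_init

# Illustrative usage with a hypothetical second-order linear model.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
W0, C0 = init_from_linear_model(A, C)
```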
Since, for real processes, residuals generally cannot be generated in a fully structured and robust way, more sophisticated residual evaluation techniques have to be applied. Different faults may have similar effects on the residuals but must be attributed to their different causes; conversely, varying magnitudes of the same fault should be recognized as a single fault cause [23]. These problems can be solved by advanced classification methods such as fuzzy logic or neural networks.
In order to apply neural networks to residual evaluation, residuals first of all have to exist (Figure 14) [23]. They can be generated either by another neural network, as described before, or by one of the analytical methods such as observers or parameter estimation.
Before the neural network can evaluate these residuals, it has to be trained for this task. For this purpose, a residual database and a corresponding fault-signature database have to exist.
Figure 14 General scheme for off-line neural net training and on-line residual evaluation.
After training is completed, the neural network can be applied to on-line residual evaluation, deciding whether or not a fault has occurred and isolating its probable cause.
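As an illustration of this general scheme, the sketch below trains a simple classifier off-line on a residual database and the corresponding fault-signature database and then applies it on-line to single residual vectors. The chapter proposes a neural classifier for this task (the RCE net introduced next); the single-layer softmax model, the assumption that class 0 encodes the fault-free case, and all names below are placeholders.

```python
import numpy as np

def train_residual_classifier(R, F, lr=0.1, epochs=500):
    """Off-line training: R is a residual database (n_samples x n_residuals),
    F the corresponding fault-signature database (n_samples x n_classes,
    one-hot rows). Minimal single-layer softmax classifier."""
    W = np.zeros((R.shape[1], F.shape[1]))
    b = np.zeros(F.shape[1])
    for _ in range(epochs):
        z = R @ W + b
        z -= z.max(axis=1, keepdims=True)                   # numerical stability
        p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        grad = (p - F) / len(R)                             # cross-entropy gradient
        W -= lr * (R.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def evaluate_residual(r, W, b):
    """On-line evaluation of one residual vector r. Assuming class 0 of the
    signature database encodes the fault-free case, a fault is reported
    whenever another class receives the highest probability."""
    z = r @ W + b
    p = np.exp(z - z.max())
    p /= p.sum()
    cause = int(np.argmax(p))
    return cause != 0, cause, p
```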
The Restricted Coulomb Energy network (RCE net) was presented in 1982 by Cooper, Reilly, and Elbaum [37]. Its main advantages are a simple and lucid architecture combined with a rapid learning algorithm [19]. While other neural nets have a fixed number of neurons, the RCE net adds new neurons depending on the complexity of the underlying problem.
Figure 15 Hidden layer neuron of RCE network.
The RCE net consists of three layers: an input layer, an internal layer, and an output layer. The input layer is fully connected with the internal layer, which contains as many neurons as there are input pattern vectors to be classified. The neurons of the internal layer are described by their input function, their transfer function, and their output function. Since each cell of the internal layer is connected to every cell of the input layer, all internal cells simultaneously receive the same pattern vector. This pattern vector is compared to the weight vector of each neuron by means of a distance metric (Figure 15) [20, 23]. The weight vector is a characteristic of the respective cell and is not changed during training. The distance is compared to a threshold, which decides whether or not the neuron fires.
The output layer is sparsely connected to the internal layer; each internal-layer neuron projects its output to only one output-layer cell. The number of output cells is given by the number of classes to be separated. The output-layer neurons perform a logical OR on their inputs, so that an output cell fires as soon as at least one of its internal cells is firing.
The training of the RCE net is performed in a supervised manner, i.e., for each training input pattern a binary output pattern is given that reflects the categories to which this input belongs. Learning can be described by two distinct mechanisms: adaptation of thresholds and addition of new cells [23].
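As a compact illustration of both mechanisms, the sketch below implements a minimal RCE-style classifier under common assumptions: Euclidean distance as the metric, a fixed initial threshold for newly created cells, thresholds shrunk whenever a cell fires for a pattern of a foreign class, and a new cell added when no cell of the correct class fires. Parameter names and default values are illustrative, not taken from the chapter.

```python
import numpy as np

class RCENet:
    """Minimal RCE-style classifier: each hidden cell stores a prototype
    (weight vector), a firing threshold, and the output class it is wired to."""

    def __init__(self, initial_threshold=0.5):
        self.prototypes = []   # weight vectors of the hidden cells
        self.thresholds = []   # firing radii
        self.classes = []      # output cell each hidden cell projects to
        self.r0 = initial_threshold

    def _fires(self, x):
        """Indices of all hidden cells whose distance to x lies below threshold."""
        return [i for i, (w, r) in enumerate(zip(self.prototypes, self.thresholds))
                if np.linalg.norm(x - w) < r]

    def predict(self, x):
        """Output cells perform a logical OR over their firing hidden cells."""
        return sorted({self.classes[i] for i in self._fires(x)})

    def train_pattern(self, x, label):
        fired = self._fires(x)
        # Mechanism 1: shrink thresholds of cells that fire for a wrong class.
        for i in fired:
            if self.classes[i] != label:
                self.thresholds[i] = np.linalg.norm(x - self.prototypes[i])
        # Mechanism 2: add a new cell if no cell of the correct class fired.
        if not any(self.classes[i] == label for i in fired):
            self.prototypes.append(np.asarray(x, dtype=float))
            self.thresholds.append(self.r0)
            self.classes.append(label)

    def fit(self, X, y, epochs=3):
        for _ in range(epochs):
            for x, label in zip(X, y):
                self.train_pattern(x, label)

# Illustrative usage on two classes of residual vectors.
net = RCENet(initial_threshold=0.5)
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y = [0, 0, 1, 1]
net.fit(X, y)
print(net.predict(np.array([0.05, 0.02])))   # expected: [0]
```

Note how the number of hidden cells is not fixed in advance: cells are created as the training data require, which is the growth behavior described above.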