Fusion of Neural Networks, Fuzzy Systems and Genetic Algorithms: Industrial Applications
by Lakhmi C. Jain; N.M. Martin
CRC Press LLC
ISBN: 0849398045   Pub Date: 11/01/98
  



where kp and kf are two positive constants and u(k) represents an estimate d^(k) of the actual vector d(k). This function expresses the noncausal character typical of the coarticulatory phenomenon, whereby the mouth configuration at a given time depends not only on past information (the phones just pronounced) but also on future information (the phones the speaker is about to utter).

Since the vectors x(k) are a discrete representation of the entire articulatory structure (vocal cords excluded), the function G(·) must act as a filter charged with processing and aggregating this complex information to extract only those parameters which describe the external appearance of the mouth.

The Time-Delay Neural Network (TDNN), first proposed by Waibel and subsequently used with success in the field of phonetic recognition, is naturally suited to the solution of this problem. In contrast to a conventional neuron, which computes its response from the weighted sum of the current inputs, a TDNN neuron extends the sum to a finite number of past inputs (the neuron delay, or memory). In this way the output provided by a given layer depends on the outputs of the previous layers computed over an extended domain (in our case, the time domain) of input values. The particular structure of a TDNN also allows the classical back-propagation algorithm to be extended and its complexity to be optimized.
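
As a minimal sketch of this mechanism (written here in Python/NumPy for illustration; the function names, the sigmoid activation and the array shapes are assumptions, not the authors' implementation), a single TDNN unit that sums over the current input and D delayed inputs can be written as:

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Sketch of one TDNN unit: its response at time t is a nonlinear function of a
# weighted sum taken over the current input vector and D delayed input vectors.
def tdnn_unit(x_seq, weights, bias, D):
    # x_seq: (T, n_in) input sequence; weights: (D + 1, n_in); returns (T - D,) outputs
    outputs = []
    for t in range(D, x_seq.shape[0]):         # the first valid output needs D past inputs
        window = x_seq[t - D:t + 1]            # current input plus the D stored past inputs
        s = np.sum(weights * window) + bias    # weighted sum extended over the whole window
        outputs.append(sigmoid(s))
    return np.array(outputs)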

The elementary unit of such a network is the classical perceptron, in which the weighted sum includes not only the current input pattern but also a certain number D of past inputs, as shown in Figure 12. The implemented function is
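
(the displayed formula is not reproduced in this excerpt; the form below is a reconstruction consistent with the description above, written in assumed notation, with N the number of input components, w_{i,d} the weight applied to the i-th input component delayed by d samples, \theta the neuron threshold and F(\cdot) a sigmoidal activation)

y(t) = F\!\left( \sum_{i=1}^{N} \sum_{d=0}^{D} w_{i,d}\, x_i(t-d) - \theta \right)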

As shown in Figure 13, a multi-layer perceptron network is composed of a pyramid of these elementary units, which provides enhanced temporal characteristics. The first hidden layer concentrates the temporal information coming from D(1) input patterns. The subsequent layers collect information from temporal windows of increasing size.

It is worth noting that, in this way, fewer weights are used than would be necessary to cover a temporal window of the same extension with a classical multi-layer perceptron network.

Moreover, the lower layers correlate only information that is close in time, while information coming from an extended temporal domain is integrated by the higher layers of the network. At any given time, the network output (in our architecture, a single neuron) depends on the temporal pyramid developed in the previous layers.
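
To make this pyramid concrete, the following sketch (illustrative only, building on the hypothetical tdnn_unit above; names and shapes are assumptions) stacks a hidden layer of TDNN units and a single output unit, so that the final output integrates D1 + D2 + 1 consecutive input vectors:

import numpy as np

# Hypothetical two-layer pyramid assembled from the tdnn_unit sketch above:
# a hidden layer of several TDNN units with memory D1 feeds a single output
# unit with memory D2.
def tdnn_two_layer(x_seq, hidden_weights, hidden_biases, out_weights, out_bias, D1, D2):
    # hidden_weights: list of (D1 + 1, n_in) arrays, one per hidden unit
    hidden_seq = np.stack(
        [tdnn_unit(x_seq, w, b, D1) for w, b in zip(hidden_weights, hidden_biases)],
        axis=1)                                # (T - D1, n_hidden) hidden-layer sequence
    # out_weights: (D2 + 1, n_hidden); the output unit spans D2 + 1 hidden vectors
    return tdnn_unit(hidden_seq, out_weights, out_bias, D2)   # (T - D1 - D2,) outputs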

When the network is fed with patterns representative of a dynamic process, the generated output patterns can be put in correspondence with a distinct dynamic process correlated with the input one. The difference between the sequence of patterns generated by the TDNN and the “target” output sequence is used to train the network through the back-propagation algorithm.
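
A minimal sketch of such a comparison (an assumed form, not the book's code; the alignment of the targets with the valid network outputs is also an assumption) could be:

import numpy as np

# The sequence generated by the TDNN is compared with the target output
# sequence; the resulting error would drive back-propagation weight updates.
def sequence_error(y_pred, y_target):
    y_target = np.asarray(y_target)[-len(y_pred):]   # align targets with the valid outputs
    return np.mean((np.asarray(y_pred) - y_target) ** 2)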


Figure 12  Scheme of the TDNN perceptron (1063-6528/95$04.00 © 1995 IEEE).

Counter l refers to the layer which contains the currently examined neuron; in the following, l = 0 will represent the input to the network (which coincides with the pattern it must learn) and l = 1, 2, ..., L-1 the subsequent hidden layers, with L being the total number of layers. Each neuron has a finite-size buffer used to store the D(l) past inputs; the nonlinear operation performed by the neuron is similar to that performed by classical perceptrons, with the difference that the weighted sum is extended to all the stored inputs.


Figure 13  Example of a TDNN architecture composed of one hidden layer with 4 TDNN perceptrons. The memory size of the hidden layer is 3, meaning that its outputs integrate 3 consecutive cepstrum vectors. The memory size of the output layer, with one single TDNN perceptron, is 4. The final output of the network results from the integration of 6 input vectors (a). In this example the pattern-target delay DT is 3 (b) (1063-6528/95$04.00 © 1995 IEEE).

If the status of the j-th neuron belonging to the l-th layer at time t is defined as

the neuron output can be expressed as
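
(Neither display is reproduced in this excerpt; a standard formulation consistent with the surrounding description, written here in assumed notation, defines the status as the weighted sum over all stored inputs and the output as its nonlinear transform:)

s_j^{(l)}(t) = \sum_{i} \sum_{d=0}^{D(l)} w_{ji,d}^{(l)}\, x_i^{(l-1)}(t-d) - \theta_j^{(l)}, \qquad x_j^{(l)}(t) = F\!\left( s_j^{(l)}(t) \right)

where x_i^{(l-1)}(t-d) is the output of the i-th neuron of layer l-1 delayed by d samples, w_{ji,d}^{(l)} the corresponding weight, \theta_j^{(l)} the threshold and F(\cdot) the sigmoidal activation.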

Let us examine a simple TDNN with one single hidden layer and describe how the information introduced into the input layer is propagated up to the output layer.


Figure 14  Information flow in a TDNN. The structure has been exploded in time to highlight the extension of the time window (memory) processed by the network (1051-8215/97$10.00 © 1997 IEEE).

If D(1) denotes the buffer size of the neurons in the hidden layer, D(1)+1 input patterns are necessary to produce the first valid output value; in other words, the time window “seen” by the hidden layer includes D(1)+1 time instants. If D(2) is the buffer size of the neurons in the output layer, D(2)+1 vectors must be provided by the previous layer before the network is able to produce its first valid output. The time window “seen” by the output layer therefore extends over D(1)+D(2)+1 time instants.
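
For instance, with the values of Figure 13 (a hidden-layer memory of 3 consecutive vectors and an output-layer memory of 4, i.e., D(1) = 2 and D(2) = 3 under this convention), the overall window amounts to D(1)+D(2)+1 = 2+3+1 = 6 input vectors, in agreement with the caption of Figure 13.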

From these considerations we can imagine a time-exploded representation of the information flow through the network as shown in Figure 14, where the parameter mD(l), indicating the number of past time instants for which the output from the l-th layer affects the network output at any given time, is defined as
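
(The defining display is not reproduced in this excerpt; one plausible reading, stated here purely as an assumption, is mD(l) = D(l+1) + D(l+2) + ... + D(L-1), i.e., the sum of the buffer sizes of the layers above the l-th one.)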

Further, we can notice how the information associated with the input patterns is progressively integrated through the following layers: the neurons in the hidden layer combine data coming from groups of three consecutive input patterns, highly correlated with one another, and arrange suitable information for the subsequent layer, which therefore operates on an abstract representation of the same data.

The network is thus naturally inclined to detect dynamic information, such as that associated with transients between stationary states or with sequence dynamics, which actually represents a cue of key importance in any articulatory analysis.

During the learning and verification phases, the TDNN outputs can be compared, sample by sample, either with the exactly synchronized target sequence or with an anticipated version of it, obtained by applying a generic shift DT into the future, limited to the interval [1, mD(L)]. This means that each instantaneous TDNN output can be made as similar as possible to any anticipated target sample included in the time window which defines the system memory or, in other words, to any past target sample among those which affect its current value.
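
As a hedged illustration of this delayed comparison (names and alignment choices are assumptions, not taken from the original), the anticipated target can be obtained by simply shifting the target sequence DT samples into the future before computing the error:

import numpy as np

# Each TDNN output at time t is compared with the target sample at time t + DT,
# with DT restricted to the memory window of the network.
def delayed_target_error(y_pred, y_target, DT):
    y_future = np.asarray(y_target)[DT:]           # anticipated target sequence
    n = min(len(y_pred), len(y_future))            # overlapping portion of the two sequences
    return np.mean((np.asarray(y_pred)[:n] - y_future[:n]) ** 2)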



Copyright © CRC Press LLC