NOTE: The following materials are presented for timely
dissemination of academic and technical work. Copyright and all other rights
therein are reserved by the authors and/or other copyright holders. Personal
use of the following materials is permitted; however, people using
the materials or information are expected to adhere to the terms and
constraints invoked by the related copyright.
Improved Learning Algorithms for Mixture of Experts in Multiclass
Classification
ABSTRACT
Mixture of experts (ME) is a modular neural network architecture
for supervised learning. A double-loop Expectation-Maximization
(EM) algorithm was introduced for adjusting the parameters of the ME
architecture, with the iteratively reweighted least squares (IRLS)
algorithm performing the maximization in the inner loop (Jordan &
Jacobs, 1994). However, it has been reported in the literature that the IRLS
algorithm is unstable, and that the ME architecture trained by the EM
algorithm, with the IRLS algorithm in the inner loop, often performs
poorly in multiclass classification. In this paper, the reason
for this instability is explored. We find that, due to an incorrect
assumption of parameter independence implicitly imposed in multiclass
classification, an incomplete Hessian matrix is used in that IRLS algorithm.
Based on this finding, we apply the Newton-Raphson method, which adopts the
exact Hessian matrix, to the inner loop of the EM algorithm in the case of
multiclass classification. Moreover, to tackle the expensive computation of
the Hessian matrix and its inverse, we propose an approximation to the
Newton-Raphson algorithm based on a so-called generalized Bernoulli density.
The Newton-Raphson algorithm and its approximation have been applied to
synthetic, benchmark, and real-world multiclass classification tasks. For
comparison, the IRLS algorithm and a quasi-Newton algorithm, BFGS, have
also been applied to the same tasks. Simulation results show that
the proposed learning algorithms avoid the instability problem and enable
the ME architecture to achieve good performance in multiclass
classification. In particular, our approximation algorithm leads to fast
learning. In addition, the limitation of our approximation algorithm is
also investigated empirically in this paper.
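To make the central point of the abstract concrete, here is a minimal sketch (not the paper's implementation) of one Newton-Raphson step for a multinomial logistic (softmax) model, using the exact Hessian. The exact Hessian includes the cross-class blocks p_k p_j x x^T; an IRLS variant that treats the class parameters as independent keeps only the diagonal blocks, which is the incomplete Hessian discussed above. The small ridge term is an assumption added here only because the softmax parameterization makes the full Hessian singular.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def newton_step(W, X, Y, ridge=1e-6):
    """One Newton-Raphson step for multinomial logistic regression.

    Uses the exact Hessian, whose (k, j) block is
        sum_n p_nk * (delta_kj - p_nj) * x_n x_n^T,
    i.e. it keeps the cross-class blocks that an incomplete-Hessian
    IRLS variant (assuming class-wise parameter independence) drops.
    W is d x K, X is N x d, Y is N x K one-hot targets.
    """
    N, d = X.shape
    K = W.shape[1]
    P = softmax(X @ W)                       # N x K predicted probabilities
    G = X.T @ (P - Y)                        # d x K gradient, one column per class
    g = G.reshape(-1, order="F")             # stack the class columns into (d*K,)
    H = np.zeros((d * K, d * K))
    for n in range(N):
        p = P[n]
        # exact per-sample Hessian: (diag(p) - p p^T) kron (x x^T)
        A = np.diag(p) - np.outer(p, p)
        H += np.kron(A, np.outer(X[n], X[n]))
    # softmax is over-parameterized, so H is singular without regularization
    H += ridge * np.eye(d * K)
    w_new = W.reshape(-1, order="F") - np.linalg.solve(H, g)
    return w_new.reshape(d, K, order="F")
```

The incomplete-Hessian variant corresponds to zeroing the off-diagonal blocks of A above (keeping only `np.diag(p * (1 - p))`), which is exactly the simplification whose consequences the paper analyzes.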
Click nn99.pdf for full text.