The problem of designing cost functions to estimate a posteriori probabilities in multiclass problems is addressed in this paper. We establish necessary and sufficient conditions that these costs must satisfy in one-class one-output networks whose outputs are consistent with probability laws. We focus our attention on a particular subset of the corresponding cost functions: those which satisfy two properties of common interest, symmetry and separability (well-known cost functions, such as the quadratic cost or the cross entropy, are particular cases in this subset). Finally, we present a universal stochastic gradient learning rule for single-layer networks, in the sense of minimizing a general version of these cost functions for a wide family of nonlinear activation functions.

}, keywords = {Cost functions, Estimation, Functions, Learning algorithms, Multiclass problems, Neural networks, Pattern recognition, Probability, Problem solving, Random processes, Stochastic gradient learning rule}, issn = {10459227}, doi = {10.1109/72.761724}, url = {http://www.scopus.com/inward/record.url?eid=2-s2.0-0032643080\&partnerID=40\&md5=d528195bd6ec84531e59ddd2ececcd46}, author = {Jes{\'u}s Cid-Sueiro and J I Arribas and S Urban-Munoz and A R Figueiras-Vidal} } @conference {410, title = {Neural networks to estimate ML multi-class constrained conditional probability density functions}, booktitle = {Proceedings of the International Joint Conference on Neural Networks}, year = {1999}, publisher = {IEEE, United States}, organization = {IEEE, United States}, address = {Washington, DC, USA}, abstract = {In this paper, a new algorithm, the Joint Network and Data Density Estimation (JNDDE), is proposed to estimate the {\textquoteleft}a posteriori{\textquoteright} probabilities of the targets with neural networks in multiple-class problems. It is based on the estimation of conditional density functions for each class with some restrictions or constraints imposed by the classifier structure and the use of Bayes rule to force the a posteriori probabilities at the output of the network, known here as an implicit set. The method is applied to train perceptrons by means of Gaussian mixture inputs, as a particular example for the Generalized Softmax Perceptron (GSP) network. The method has the advantage of providing a clear distinction between the network architecture and the model of the data constraints, giving network parameters or weights on one side and data over parameters on the other. MLE stochastic gradient based rules are obtained for JNDDE. This algorithm can be applied to hybrid labeled and unlabeled learning in a natural fashion.

}, keywords = {Generalized softmax perceptron (GSP) network, Joint network and data density estimation (JNDDE), Mathematical models, Maximum likelihood estimation, Neural networks, Probability density function, Random processes}, doi = {10.1109/IJCNN.1999.831174}, url = {http://www.scopus.com/inward/record.url?eid=2-s2.0-0033326060\&partnerID=40\&md5=bb38c144dac0872f3a467dc12170e6b6}, author = {J I Arribas and Jes{\'u}s Cid-Sueiro and T Adali and A R Figueiras-Vidal} }