A Bernoulli mixture model for word categorisation

González, J.; Juan, A.; Dupont, Pierre; Vidal, E.; Casacuberta, F.

DIAL.pr - BOREAL

González, J. [Universidad politécnica de Valencia]; Juan, A. [Universidad politécnica de Valencia]; Dupont, Pierre [Université Jean Monnet]; Vidal, E. [Universidad politécnica de Valencia]; Casacuberta, F. [Universidad politécnica de Valencia]

The problem of word categorisation is formulated as one of unsupervised mixture modelling, in which Bernoulli distributions capture contextual information. We detail how the free parameters of the mixture model can be estimated through an EM procedure. A deterministic word-to-class mapping is then derived from this model using a hierarchical clustering algorithm.

Categorisation plays an important role in language modelling: it reduces the number of free parameters to be estimated and makes it easy to enlarge the task vocabulary without retraining. In this paper, we address the word-class selection problem with an unsupervised method that uses the contextual information of the words in the training set together with an adequate distance measure.

This paper describes a technique for building a hierarchical word structure with an efficient agglomerative hierarchical clustering algorithm, in a syntax-constrained task. Assigning words to categories then becomes straightforward, since cutting this structure at any level yields a partition of the vocabulary into categories. We call the algorithm efficient because it uses min-heaps to avoid an exhaustive nearest-neighbour search for each sample. Methods for encoding each word by the words that usually surround it in the task sentences are described, and experiments were carried out to tune some essential representation-dependent and algorithm-dependent parameters. Finally, subjectively good results were achieved; we call them subjective because the only way to evaluate them is to inspect the resulting structure and grade it.
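
A minimal sketch of the modelling step, assuming NumPy and a hypothetical context encoding (the abstract does not give the exact codification): each vocabulary word becomes a binary vector whose bits record which words occur immediately to its left or right in the training sentences, and a K-component multivariate Bernoulli mixture is fitted to these vectors by EM. The names `context_vectors` and `bernoulli_mixture_em` are illustrative only. The paper derives its deterministic word-to-class mapping with hierarchical clustering; the `argmax` over posteriors in the last line is only a quick stand-in.

```python
import numpy as np

def context_vectors(sentences, vocab):
    """Hypothetical encoding: bit d of word w is 1 if vocab[d] occurs immediately
    to the left of w (first V bits) or to the right of w (last V bits)."""
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    X = np.zeros((V, 2 * V))
    for s in sentences:
        for t, w in enumerate(s):
            if t > 0:
                X[index[w], index[s[t - 1]]] = 1.0
            if t + 1 < len(s):
                X[index[w], V + index[s[t + 1]]] = 1.0
    return X

def bernoulli_mixture_em(X, K, n_iter=100, eps=1e-6, seed=0):
    """Fit a K-component multivariate Bernoulli mixture to the binary rows of X by EM.
    Returns mixing proportions pi (K,), bit probabilities p (K, D) and
    posterior responsibilities R (N, K) from the last E-step."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    p = rng.uniform(0.25, 0.75, size=(K, D))  # random start breaks symmetry
    for _ in range(n_iter):
        # E-step: log pi_k + sum_d [x_d log p_kd + (1 - x_d) log(1 - p_kd)]
        log_joint = np.log(pi) + X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        R = np.exp(log_joint)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: weighted counts give new proportions and bit probabilities
        Nk = R.sum(axis=0) + eps  # eps avoids 0/0 for an empty component
        pi = Nk / Nk.sum()
        p = np.clip((R.T @ X) / Nk[:, None], eps, 1 - eps)
    return pi, p, R

# Toy usage: "show"/"list" and "flights"/"fares" have identical contexts,
# so each pair receives the same posteriors (up to rounding).
sentences = [["show", "me", "flights"], ["show", "me", "fares"], ["list", "me", "flights"]]
vocab = sorted({w for s in sentences for w in s})
X = context_vectors(sentences, vocab)
pi, p, R = bernoulli_mixture_em(X, K=2)
print({w: int(k) for w, k in zip(vocab, R.argmax(axis=1))})  # stand-in hard mapping
```

Working in the log domain and subtracting the per-row maximum before exponentiating keeps the E-step stable when vectors are long, which is the usual situation with a realistic vocabulary.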

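The agglomerative step can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes Hamming distance between binary context vectors and average linkage (the abstract only mentions "an adequate distance measure"), and it uses a single global min-heap with lazy deletion of stale entries, one common way to avoid rescanning all pairs for the nearest neighbour at each merge; the paper's exact use of min-heaps may differ. The names `hamming`, `agglomerative` and `cut` are hypothetical.

```python
import heapq
import itertools

def hamming(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def agglomerative(vectors, dist=hamming):
    """Average-linkage agglomerative clustering driven by one global min-heap.
    Returns the merge history [(a, b, distance, new_id), ...] and cluster members."""
    n = len(vectors)
    size = {i: 1 for i in range(n)}
    members = {i: [i] for i in range(n)}
    D = {}  # D[(a, b)] with a < b: current distance between clusters a and b
    for i, j in itertools.combinations(range(n), 2):
        D[(i, j)] = dist(vectors[i], vectors[j])
    heap = [(d, i, j) for (i, j), d in D.items()]
    heapq.heapify(heap)
    active, next_id, merges = set(range(n)), n, []
    while len(active) > 1:
        d, i, j = heapq.heappop(heap)
        if i not in active or j not in active:
            continue  # stale entry: a cluster in this pair was already merged
        active -= {i, j}
        new, next_id = next_id, next_id + 1
        size[new] = size[i] + size[j]
        members[new] = members[i] + members[j]
        for k in active:
            # Lance-Williams update for average linkage
            dik = D[(min(i, k), max(i, k))]
            djk = D[(min(j, k), max(j, k))]
            D[(k, new)] = (size[i] * dik + size[j] * djk) / size[new]  # k < new
            heapq.heappush(heap, (D[(k, new)], k, new))
        active.add(new)
        merges.append((i, j, d, new))
    return merges, members

def cut(merges, members, n, k):
    """Break the hierarchy into k categories by replaying only the first n - k merges."""
    clusters = set(range(n))
    for i, j, _, new in merges[: n - k]:
        clusters -= {i, j}
        clusters.add(new)
    return sorted(sorted(members[c]) for c in clusters)

vecs = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
merges, members = agglomerative(vecs)
print(cut(merges, members, len(vecs), 2))  # → [[0, 1], [2, 3]]
```

Because every merge is recorded, the same hierarchy can be cut into any number of categories without re-clustering, which is the property the abstract relies on when it says the structure can be broken wherever one wants.
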
metadata

Document type	Conference paper (oral presentation, with selection committee)
Access type	Restricted access
Publication date	2001
Language	English
Conference	"Proceedings of Simposium Nacional de Reconocimiento de Formas y Análisis de Imágenes", Benicàssim (Spain) (14/05/2001 to 18/05/2001)
Peer reviewed	yes
Affiliations	Universidad politécnica de Valencia - Instituto tecnologico de informatica; Université Jean Monnet - Eurise; UCL - SST/ICTM/INGI - Pôle en ingénierie informatique
Keywords	1162
Links	http://biblio.info.ucl.ac.be/2001/272740.pfd http://hdl.handle.net/2078.1/108944[Handle]

Bibliographic reference	González, J.; Juan, A.; Dupont, Pierre; Vidal, E.; Casacuberta, F. A Bernoulli mixture model for word categorisation. Proceedings of Simposium Nacional de Reconocimiento de Formas y Análisis de Imágenes (Benicàssim (Spain), 14/05/2001 to 18/05/2001).
Permanent URL	http://hdl.handle.net/2078.1/108944
