Archive for August, 2017


Advanced Methodologies for Bayesian Networks

August 22, 2017

The 3rd Workshop on Advanced Methodologies for Bayesian Networks is being run in Kyoto September 20-22, 2017.  I’ll be talking about our (with François Petitjean, Nayyar Zaidi and Geoff Webb) recent work with Bayesian Network Classifiers:

Backoff methods for estimating parameters of a Bayesian network

Various authors have highlighted inadequacies of BDeu type scores and this problem is shared in parameter estimation. Basically, Laplace estimates work poorly, at least because setting the prior concentration is challenging. In 1997, Freidman et al suggested a simple backoff approach for Bayesian network classifiers (BNCs). Backoff methods dominate in in n-gram language models, with modified Kneser-Ney smoothing, being the best known, and a Bayesian variant exists in the form of Pitman-Yor process language models from Teh in 2006. In this talk we will present some results on using backoff methods for Bayes network classifiers and Bayesian networks generally. For BNCs at least, the improvements are dramatic and alleviate some of the issues of choosing too dense a network.

Its built on the amazing Chordalysis system of François Petitjean, and the code is available as HierarchicalDirichletProcessEstimation.  Boy, Nayyar and François really can do good empirical work!


Visiting and talks at HSE, Moscow

August 20, 2017
Visiting Dimtry Vetrov’s International Lab of Deep Learning and Bayesian Methods at the distinguished Higher School of Economics in Moscow from 11-15th September 2017.  What a great combination, Bayesian methods and deep learning!

The HSE group at Izia’s Grill, 13/09/17

Left to right are our host Prof Dimtry Vetrov, me, Iliya Tolstikhin, Novi Quadrianto, Maurizio Filippone and our cordinator Nadia Chirkova.  We four in the middle were the invited speakers for the workshop, Mini-Workshop: Stochastic Processes and Probabilistic Models in Machine Learning.  The invited talks by these guys where absolutely first class and the high quality of the Moscow area speakers made for a fascinating afternoon too.
Giving two talks, a tutorial one:   Introduction to Dirichlet Processes and their use, at the workshop.
Assuming the attendee has knowledge of the Poisson, Gamma, multinomial and Dirichlet distributions, this talk will present the basic ideas and theory to understand and use the Dirichlet process and its close relatives, the Pitman-Yor process and the gamma process.  We will first look at some motivating examples.  Then we will look at the non-hierarchical versions of the processes, which are basically infinite parameter vectors.  These have a number of handy properties and have simple, elegant marginal and posterior inference.  Finally, we will look at the hierarchical versions of these processes.  These are fundamentally different.  To understand the hierarchical version we will briefly review some aspects of stochastic process theory and additive distributions.  The hierarchical versions becomes Dirichlet and Gamma distributions (the process part disappears) but the techniques developed for the non-hierarchical process models can be borrowed to develop good algorithms, since the Dirichlet and Gamma are challenging when placed hierarchically.  Slides are here.
And one to the Faculty of Computer Science:  Learning on networks of distributions for discrete data.  The HSE announcement is here.
I will motivate the talk by reviewing some state of the art models for problems like matrix factorisation models for link prediction and tweet clustering.  Then I will review the classes of distributions that can be strung together in networks to generate discrete data.  This allows a rich class of models that, in its simplest form covers things like Poisson matrix factorisation, Latent Dirichlet allocation, and Stochastic block models, but more generally covers complex hierarchical models on network and text data.  The distributions covered includes so-called non-parametric distributions such as the Gamma process.  Accompanying these are a set of collapsing and augmentation techniques that are used to generate fast Gibbs samplers for many models in this class. To complete this picture, turning complex network models into fast Gibbs samplers, I will illustrate our recent methods of doing matrix factorisation with side information (e.g., GloVe word embeddings), done for link prediction, for instance, for citation networks.

MDSS Seminar Series: Doing Bayesian Text Analysis

August 4, 2017

Giving a talk to the Monash Data Science Society on August 28th.  Details here.  Its a historical perspective and motivational talk about doing text and document analysis.  Slides are here.