Archive for the ‘talks’ Category


Notes on Determinantal Point Processes

September 11, 2017

I’m giving a tutorial on these amazing processes while in Moscow.  The source “book” for this is of course Alex Kulesza and Ben Taskar’s “Determinantal Point Processes for Machine Learning”, Foundations and Trends® in Machine Learning: Vol. 5: No. 2–3, pp 123-286, 2012.

If you have an undergraduate degree in mathematics with loads of multi-linear algebra and real analysis, this stuff really is music for the mind.  The connections and results are very cool.  In my view these guys don’t spend enough time in their intro on Gram matrices, which really are the starting point for everything.  In their online video tutorials they got this right, and led with these results.

There are also a few interesting connections they didn’t mention.  Anyway, I did some additional lecture notes to give some of the key results mentioned in the long article and elsewhere that didn’t make their tutorial slides.


Advanced Methodologies for Bayesian Networks

August 22, 2017

The 3rd Workshop on Advanced Methodologies for Bayesian Networks was run in Kyoto September 20-22, 2017. The workshop was well organised, and the talks were great. Really good invited talks by great speakers!

I’ll be talking about our (with François Petitjean, Nayyar Zaidi and Geoff Webb) recent work with Bayesian Network Classifiers:

Backoff methods for estimating parameters of a Bayesian network

Various authors have highlighted inadequacies of BDeu type scores and this problem is shared in parameter estimation. Basically, Laplace estimates work poorly, at least because setting the prior concentration is challenging. In 1997, Friedman et al. suggested a simple backoff approach for Bayesian network classifiers (BNCs). Backoff methods dominate in n-gram language models, with modified Kneser-Ney smoothing being the best known, and a Bayesian variant exists in the form of Pitman-Yor process language models from Teh in 2006. In this talk we will present some results on using backoff methods for Bayesian network classifiers and Bayesian networks generally. For BNCs at least, the improvements are dramatic and alleviate some of the issues of choosing too dense a network.
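To make the contrast between Laplace estimates and backoff concrete, here is a minimal sketch. It is not the method from the talk or from the cited papers: it is a single-level illustration in which a sparse parent configuration borrows strength from the pooled (marginal) estimate. The function names, the counts and the `strength` parameter are all assumptions made for the example.

```python
from collections import Counter

def laplace_estimate(counts, values, alpha=1.0):
    """Add-alpha (Laplace) estimate: every value gets the same pseudo-count."""
    total = sum(counts.get(v, 0) for v in values)
    return {v: (counts.get(v, 0) + alpha) / (total + alpha * len(values))
            for v in values}

def backoff_estimate(counts, values, parent_probs, strength=2.0):
    """Back off to a coarser estimate: pseudo-counts are strength * parent_probs[v]."""
    total = sum(counts.get(v, 0) for v in values)
    return {v: (counts.get(v, 0) + strength * parent_probs[v]) / (total + strength)
            for v in values}

values = ["a", "b", "c"]
# Hypothetical counts pooled over all parent configurations.
marginal_counts = Counter({"a": 90, "b": 9, "c": 1})
# Hypothetical counts for one rarely seen parent configuration.
context_counts = Counter({"a": 2})

marginal = laplace_estimate(marginal_counts, values)
flat = laplace_estimate(context_counts, values)
backed_off = backoff_estimate(context_counts, values, marginal)
```

With so little data in the context, the Laplace estimate treats the unseen values "b" and "c" identically, while the backed-off estimate keeps the ordering seen in the pooled data. Choosing `strength` is the same prior-concentration problem in miniature, which is part of what the hierarchical methods in the talk address.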

Slides are at the AMBN site, here.  Note I spent a bit of time embellishing my slides with some fabulous historical Japanese artwork!

Software for the system is built on the amazing Chordalysis system of François Petitjean, and the code is available as HierarchicalDirichletProcessEstimation.  Boy, Nayyar and François really can do good empirical work!


Visiting and talks at HSE, Moscow

August 20, 2017
Visiting Dmitry Vetrov’s International Lab of Deep Learning and Bayesian Methods at the distinguished Higher School of Economics in Moscow from 11-15th September 2017.  What a great combination, Bayesian methods and deep learning!

The HSE group at Izia’s Grill, 13/09/17

Left to right are our host Prof Dmitry Vetrov, me, Iliya Tolstikhin, Novi Quadrianto, Maurizio Filippone and our coordinator Nadia Chirkova.  We four in the middle were the invited speakers for the workshop, Mini-Workshop: Stochastic Processes and Probabilistic Models in Machine Learning.  The invited talks by these guys were absolutely first class and the high quality of the Moscow area speakers made for a fascinating afternoon too.

Giving two talks.  The first is a tutorial at the workshop:  Introduction to Dirichlet Processes and their use.

Assuming the attendee has knowledge of the Poisson, Gamma, multinomial and Dirichlet distributions, this talk will present the basic ideas and theory to understand and use the Dirichlet process and its close relatives, the Pitman-Yor process and the gamma process.  We will first look at some motivating examples.  Then we will look at the non-hierarchical versions of the processes, which are basically infinite parameter vectors.  These have a number of handy properties and have simple, elegant marginal and posterior inference.  Finally, we will look at the hierarchical versions of these processes.  These are fundamentally different.  To understand the hierarchical version we will briefly review some aspects of stochastic process theory and additive distributions.  The hierarchical versions become Dirichlet and Gamma distributions (the process part disappears) but the techniques developed for the non-hierarchical process models can be borrowed to develop good algorithms, since the Dirichlet and Gamma are challenging when placed hierarchically.  Slides are here.

The second is to the Faculty of Computer Science:  Learning on networks of distributions for discrete data.  The HSE announcement is here.

I will motivate the talk by reviewing some state of the art models for problems like matrix factorisation models for link prediction and tweet clustering.  Then I will review the classes of distributions that can be strung together in networks to generate discrete data.  This allows a rich class of models that, in its simplest form, covers things like Poisson matrix factorisation, Latent Dirichlet allocation, and Stochastic block models, but more generally covers complex hierarchical models on network and text data.  The distributions covered include so-called non-parametric distributions such as the Gamma process.  Accompanying these is a set of collapsing and augmentation techniques that are used to generate fast Gibbs samplers for many models in this class. To complete this picture, turning complex network models into fast Gibbs samplers, I will illustrate our recent methods of doing matrix factorisation with side information (e.g., GloVe word embeddings), done for link prediction, for instance, for citation networks.


MDSS Seminar Series: Doing Bayesian Text Analysis

August 4, 2017

Giving a talk to the Monash Data Science Society on August 28th.  Details here.  It’s a historical perspective and motivational talk about doing text and document analysis.  Slides are here.


Lectures: Learning with Graphical Models

July 15, 2017

I’m giving a series of lectures this semester combining graphical models and some elements of nonparametric statistics.  The intent is to build up to the theory of discrete matrix factorisation and its many variations. The lectures start on 27th July and are mostly given weekly.  Weekly details are given in the calendar too.  The slides are on the Monash share drive under “Wray’s Slides” so if you are at Monash, do a search on Google drive to find them.  If you cannot find them, email me for access.

OK lectures over as of 24/10/2017!  Have some other things to prepare.

Variational Algorithms and Expectation-Maximisation, Lecture 6, 19/10/17, Wray Buntine

This week picks up material not covered last lecture.  For exponential family distributions, working with the mean of Gibbs samples sometimes corresponds to another algorithm called Expectation-Maximisation. We will look at this in terms of the Kullback-Leibler versions of variational algorithms. The general theory is quite involved, so we will work through it with some examples, like variational auto-encoders, Gaussian mixture models, and extensions to LDA.
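As a rough illustration of the "mean instead of sample" idea, here is a minimal sketch of Expectation-Maximisation for a two-component Gaussian mixture in one dimension. Where a Gibbs sampler would sample each point's component assignment, the E-step keeps the expected assignment (the responsibility). The synthetic data, the initialisation at the data extremes and the name `em_gmm_1d` are assumptions for the example, not material from the lecture.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture; returns parameters and log-likelihood trace."""
    # Crude initialisation (an assumption): means at the data extremes.
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    weight = np.array([0.5, 0.5])
    log_liks = []
    for _ in range(n_iter):
        # E-step: expected assignments, where Gibbs would sample them.
        dens = (weight * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                / np.sqrt(2.0 * np.pi * var))
        log_liks.append(np.log(dens.sum(axis=1)).sum())
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood updates.
        nk = resp.sum(axis=0)
        weight = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return weight, mu, var, log_liks

# Synthetic data: 300 points around -2 and 700 points around 3.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])
weight, mu, var, log_liks = em_gmm_1d(x)
```

The log-likelihood trace should never decrease from one iteration to the next, which is the standard sanity check for an EM implementation.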

No lectures this week, 12th October, as I will be catching up on reviews and completing a journal article. Next week we’ll work through some examples of variational algorithms, including LDA with a HDP, a model whose VA theory has been thoroughly botched up historically.

Gibbs Sampling, Variational Algorithms and Expectation-Maximisation, Lecture 5, 05/10/17, Wray Buntine

Gibbs sampling is the simplest of the Markov chain Monte Carlo methods, and the easiest to understand. For computer scientists, it is closely related to local search. We will look at the basic justification of Gibbs sampling and see examples of its variations: block Gibbs, augmentation and collapsing. Clever use of these techniques can dramatically improve performance. This gives a rich class of algorithms that, for smaller data sets at least, addresses most problems in learning. For exponential family distributions, taking the mean instead of sampling sometimes corresponds to another algorithm called Expectation-Maximisation. We will look at this in terms of the Kullback-Leibler versions of variational algorithms. We will look at the general theory and some examples, like variational auto-encoders and Gaussian mixture models.
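For readers who have not seen one, here is a minimal sketch of a Gibbs sampler on a toy target: a standard bivariate normal with correlation rho, where each coordinate is resampled in turn from its conditional given the other. The target, the burn-in length and the function name are assumptions chosen for illustration, not examples from the lecture.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = np.empty((n_samples, 2))
    for t in range(burn_in + n_samples):
        # x | y ~ N(rho * y, 1 - rho^2), then y | x ~ N(rho * x, 1 - rho^2)
        x = rng.normal(rho * y, sd)
        y = rng.normal(rho * x, sd)
        if t >= burn_in:
            samples[t - burn_in] = (x, y)
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
est_rho = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
```

Each update only moves along one axis, which is where the local search analogy comes from. As rho approaches 1 the moves get smaller and mixing slows, which is the kind of problem blocking and collapsing are meant to fix.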

ASIDE: Determinantal Point Processes, one-off lecture, 28/09/17, Wray Buntine

Representing objects with feature vectors lets us measure similarity using dot products.  Using this notion, the determinantal point process (DPP) can be introduced as a distribution over objects maximising diversity.  In this tutorial we will explore the DPP with the help of the visual analogies developed by Kulesza and Taskar in their tutorials and their 120 page Foundations and Trends article “Determinantal Point Processes for Machine Learning.” Topics covered are interpretations and definitions, probability operations such as marginalising and conditioning, and sampling.  The tutorial makes great use of the knowledge of matrices and determinants.
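A small sketch of the dot-product view, using the L-ensemble form of the DPP in which a subset Y has probability det(L_Y) / det(L + I), with L the Gram matrix of the feature vectors. The three feature vectors below are hypothetical, chosen so that items 0 and 1 are nearly parallel and item 2 is orthogonal to item 0.

```python
import itertools
import numpy as np

def l_ensemble_prob(L, subset):
    """P(Y = subset) = det(L_Y) / det(L + I) for an L-ensemble DPP."""
    idx = list(subset)
    if idx:
        num = np.linalg.det(L[np.ix_(idx, idx)])
    else:
        num = 1.0  # determinant of the empty matrix is 1 by convention
    return num / np.linalg.det(L + np.eye(L.shape[0]))

# Hypothetical unit feature vectors for three items.
B = np.array([[1.0, 0.0],
              [0.96, 0.28],
              [0.0, 1.0]])
L = B @ B.T  # Gram matrix: entries are dot products between items

p_similar = l_ensemble_prob(L, (0, 1))  # nearly parallel pair
p_diverse = l_ensemble_prob(L, (0, 2))  # orthogonal pair

# Probabilities over all 2^3 subsets should sum to one.
total = sum(l_ensemble_prob(L, s)
            for r in range(4)
            for s in itertools.combinations(range(3), r))
```

The determinant of a 2x2 Gram submatrix is the squared area spanned by the two feature vectors, so the nearly parallel pair gets far less probability than the orthogonal pair. That is the sense in which the DPP favours diverse sets.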

No lectures the following two weeks, 14th and 21st September, as I will be on travel.

Basic Distributions and Poisson Processes, Lecture 4, 07/09/17, Wray Buntine

We review the standard discrete distributions, relationships, properties and conjugate distributions.  This includes deriving the Poisson distribution as an infinitely divisible distribution on natural numbers with a fixed rate.  Then we introduce Poisson point processes as a model of stochastic processes.  We show how they behave in both the discrete and continuous case, and how they have both constructive and axiomatic definitions.  The same definitions can be extended to any infinitely divisible distributions, so we use this to introduce the gamma process.  We illustrate Bayesian operations for the gamma process: data likelihoods, conditioning on discrete evidence and marginalising.
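Infinite divisibility of the Poisson can be checked numerically: the sum of n independent Poisson counts with rate λ/n has exactly the Poisson distribution with rate λ. The sketch below convolves the probability mass function with itself and compares it to the direct one; the rate, the number of parts and the truncation point `k_max` are arbitrary choices for the example.

```python
import math

def poisson_pmf(rate, k_max):
    """Poisson probabilities for counts 0..k_max."""
    return [math.exp(-rate) * rate ** k / math.factorial(k)
            for k in range(k_max + 1)]

def convolve(p, q):
    """pmf of the sum of two independent counts, for counts 0..len(p)-1."""
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(len(p))]

rate, parts, k_max = 3.0, 4, 20
piece = poisson_pmf(rate / parts, k_max)

# Add up `parts` independent pieces, each with rate / parts.
total = piece
for _ in range(parts - 1):
    total = convolve(total, piece)

whole = poisson_pmf(rate, k_max)
max_gap = max(abs(a - b) for a, b in zip(total, whole))
```

Truncating at `k_max` loses nothing here, because the probability of a sum equal to k only involves terms with counts up to k. The same divide-and-sum property is what lets the rate be spread over arbitrarily fine regions in a Poisson point process.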

Directed and Undirected Independence Models, Lecture 3, 31/08/17, Wray Buntine

We will develop the basic properties and results for directed and undirected graphical models.  This includes testing for independence, developing the corresponding functional form, and understanding probability operations such as marginalising and conditioning.  To complete this section, we will also investigate operations on clique trees, to illustrate the principles.  We will not do full graphical model inference.

Information and working with Independence, Lecture 2, 17/08/17, Wray Buntine

This will continue with information (entropy) left over from the previous lecture.  Then we will look at the definition of independence and some independence models, including their relationship with causality.  Basic directed and undirected models will be introduced.  Some example problems will be presented (simply) to tie these together:  simple bucket search, bandits, graph colouring and causal reasoning.

No lectures 03/08 (writing for ACML) and 10/08 (attending ICML).

Motivating Probability and Decision Models, Lecture 1, 27/07/17, Wray Buntine

This is an introduction to the motivation for using Bayesian methods, these days called “full probability modelling” by the cognoscenti, to avoid prior cultish associations and implications. We will look at modelling, causality, probability as frequency, and axiomatic underpinnings for reasoning, decisions, and belief. The importance of priors and computation forms the basis of this.

ICML 2017 paper: Leveraging Node Attributes for Incomplete Relational Data

May 19, 2017

Here is a paper with Ethan Zhao and Lan Du, both of Monash, that we’ll present in Sydney.

Relational data are usually highly incomplete in practice, which inspires us to leverage side information to improve the performance of community detection and link prediction. This paper presents a Bayesian probabilistic approach that incorporates various kinds of node attributes encoded in binary form in relational models with Poisson likelihood. Our method works flexibly with both directed and undirected relational networks. The inference can be done by efficient Gibbs sampling which leverages sparsity of both networks and node attributes. Extensive experiments show that our models achieve the state-of-the-art link prediction results, especially with highly incomplete relational data.

As usual, the reviews were entertaining, and there were some interesting results we didn’t get into the paper.  It’s always enlightening doing comparative experiments.


ALTA 2016 Tutorial: Simpler Non-parametric Bayesian Models

April 21, 2017

They recorded the tutorial I ran at ALTA late in 2016.

Part 1 and part 2 are up on YouTube, about an hour each.