I’m giving a series of lectures this semester combining graphical models and some elements of nonparametric statistics. The intent is to build up to the theory of discrete matrix factorisation and its many variations. The lectures start on 27th July and are mostly given weekly. Weekly details are given in the calendar too. The slides are on the Monash share drive under “Wray’s Slides”, so if you are at Monash, search Google Drive to find them. If you cannot find them, email me for access.

**OK lectures over as of 24/10/2017! Have some other things to prepare.**

**Variational Algorithms and Expectation-Maximisation**, Lecture 6, 19/10/17, Wray Buntine

This week picks up material not covered last lecture. For exponential family distributions, working with the mean of Gibbs samples sometimes corresponds to another algorithm called Expectation-Maximisation. We will look at this in terms of the Kullback-Leibler versions of variational algorithms. The general theory is quite involved, so we will work through it with some examples, like variational auto-encoders, Gaussian mixture models, and extensions to LDA.
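As a rough illustration of the Gaussian mixture example, here is a minimal sketch of Expectation-Maximisation for a two-component mixture in one dimension. It is not taken from the slides: the function names, the crude initialisation and the synthetic data are all assumptions made for this sketch.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihoods), where
    log_likelihoods[t] is evaluated at the parameters entering iteration t.
    """
    # Crude initialisation (an assumption of this sketch): put the means at
    # the extremes of the data and share the overall variance.
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    log_likelihoods = []

    for _ in range(iterations):
        # E-step: responsibilities p(component k | x, current parameters).
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([j / total for j in joint])
        log_likelihoods.append(log_lik)

        # M-step: re-estimate parameters from the expected sufficient statistics.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk

    return weights, means, variances, log_likelihoods


# Synthetic data, purely for illustration: two unit-variance clusters.
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
weights, means, variances, log_likelihoods = em_gmm_1d(data)
print("estimated means:", sorted(means))
```

The E-step replaces the sampled component assignment of a Gibbs sampler with its expectation, which is the sense in which "taking the mean" gives EM here. The recorded log-likelihoods should never decrease from one iteration to the next.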

*No lectures this week, 12th October*, as I will be catching up on reviews and completing a journal article. Next week we’ll work through some examples of variational algorithms, including LDA with a HDP, a model whose VA theory has been thoroughly botched up historically.

**Gibbs Sampling, Variational Algorithms and Expectation-Maximisation**, Lecture 5, 05/10/17, Wray Buntine

Gibbs sampling is the simplest of the Markov chain Monte Carlo (MCMC) methods, and the easiest to understand. For computer scientists, it is closely related to local search. We will look at the basic justification of Gibbs sampling and see examples of its variations: block Gibbs, augmentation and collapsing. Clever use of these techniques can dramatically improve performance. This gives a rich class of algorithms that, for smaller data sets at least, addresses most problems in learning. For exponential family distributions, taking the mean instead of sampling sometimes corresponds to another algorithm called Expectation-Maximisation. We will look at this in terms of the Kullback-Leibler versions of variational algorithms. We will look at the general theory and some examples, like variational auto-encoders and Gaussian mixture models.
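To make the basic idea concrete, here is a minimal sketch of a plain Gibbs sampler for a standard bivariate Gaussian with correlation `rho`, where each full conditional is itself a univariate Gaussian. The target distribution, function names and settings are assumptions chosen for illustration, not an example from the lecture.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a zero-mean, unit-variance bivariate Gaussian.

    Each sweep resamples x from p(x | y) and then y from p(y | x).
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # standard deviation of each conditional
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)  # y | x ~ N(rho * x, 1 - rho^2)
        if step >= burn_in:
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Empirical correlation between the two coordinates."""
    n = len(samples)
    mean_x = sum(s[0] for s in samples) / n
    mean_y = sum(s[1] for s in samples) / n
    cov = sum((s[0] - mean_x) * (s[1] - mean_y) for s in samples) / n
    var_x = sum((s[0] - mean_x) ** 2 for s in samples) / n
    var_y = sum((s[1] - mean_y) ** 2 for s in samples) / n
    return cov / math.sqrt(var_x * var_y)


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print("empirical correlation:", sample_correlation(samples))
```

Because each update only moves one coordinate, the chain mixes more slowly as `rho` approaches 1, which is one motivation for variations such as block Gibbs.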

**ASIDE: Determinantal Point Processes**, one off lecture, 28/09/17, Wray Buntine

Representing objects with feature vectors lets us measure similarity using dot products. Using this notion, the determinantal point process (DPP) can be introduced as a distribution over objects maximising diversity. In this tutorial we will explore the DPP with the help of the visual analogies developed by Kulesza and Taskar in their tutorials and their 120-page *Foundations and Trends* article “Determinantal Point Processes for Machine Learning.” Topics covered are interpretations and definitions, probability operations such as marginalising and conditioning, and sampling. The tutorial makes great use of the knowledge of matrices and determinants.
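A small numerical sketch may help fix the idea. It assumes the L-ensemble form of a DPP, where a subset Y has probability det(L_Y) / det(L + I) and L is the Gram matrix of dot products between feature vectors. The three feature vectors and the helper names are made up for illustration.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion; adequate for the tiny matrices here."""
    n = len(m)
    if n == 0:
        return 1.0  # determinant of the empty matrix
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def gram(features):
    """Kernel L with L[i][j] equal to the dot product of feature vectors i and j."""
    return [[sum(a * b for a, b in zip(u, v)) for v in features] for u in features]


def dpp_probability(L, subset):
    """P(Y = subset) = det(L_subset) / det(L + I) for an L-ensemble DPP."""
    n = len(L)
    sub = [[L[i][j] for j in subset] for i in subset]
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return det(sub) / det(L_plus_I)


# Hypothetical unit-length features: items 0 and 1 are similar, 0 and 2 are orthogonal.
features = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
L = gram(features)
for size in range(len(features) + 1):
    for subset in combinations(range(len(features)), size):
        print(subset, round(dpp_probability(L, subset), 4))
```

The pair of orthogonal items (0, 2) should come out more probable than the pair of similar items (0, 1), and the probabilities over all subsets sum to one.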

*No lectures the following two weeks, 14th and 21st September*, as I will be on travel.

**Basic Distributions and Poisson Processes**, Lecture 4, 07/09/17, Wray Buntine

We review the standard discrete distributions, relationships, properties and conjugate distributions. This includes deriving the Poisson distribution as an infinitely divisible distribution on natural numbers with a fixed rate. Then we introduce Poisson point processes as a model of stochastic processes. We show how they behave in both the discrete and continuous case, and how they have both constructive and axiomatic definitions. The same definitions can be extended to any infinitely divisible distributions, so we use this to introduce the gamma process. We illustrate Bayesian operations for the gamma process: data likelihoods, conditioning on discrete evidence and marginalising.
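Infinite divisibility can be checked numerically: the sum of n independent Poisson variables, each with rate λ/n, should again be Poisson with rate λ. The sketch below verifies this by convolving probability mass functions. The function names and the particular rate are assumptions chosen for illustration.

```python
import math


def poisson_pmf_list(rate, max_k):
    """Poisson probabilities P(X = k) for k = 0..max_k."""
    return [math.exp(-rate) * rate ** k / math.factorial(k) for k in range(max_k + 1)]


def convolve(p, q):
    """Truncated convolution of two pmfs on the natural numbers.

    Entry k is exact, since P(A + B = k) only involves indices 0..k.
    """
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(len(p))]


def sum_of_n_poissons(rate, n, max_k):
    """pmf of the sum of n independent Poisson(rate / n) variables, for k = 0..max_k."""
    piece = poisson_pmf_list(rate / n, max_k)
    total = piece
    for _ in range(n - 1):
        total = convolve(total, piece)
    return total


direct = poisson_pmf_list(3.0, 15)
divided = sum_of_n_poissons(3.0, 5, 15)
print("largest discrepancy:", max(abs(a - b) for a, b in zip(direct, divided)))
```

The same additivity over disjoint pieces is what lets the Poisson point process assign independent counts to disjoint regions.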

**Directed and Undirected Independence Models**, Lecture 3, 31/08/17, Wray Buntine

We will develop the basic properties and results for directed and undirected graphical models. This includes testing for independence, developing the corresponding functional form, and understanding probability operations such as marginalising and conditioning. To complete this section, we will also investigate operations on clique trees, to illustrate the principles. We will not do full graphical model inference.
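As a small worked example of testing for independence, the sketch below builds the joint distribution of a three-node chain A → B → C from its directed factorisation P(a) P(b|a) P(c|b), then checks by brute-force enumeration that A and C are independent given B but dependent marginally. The probability tables and function names are invented for illustration.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}


def joint(a, b, c):
    """Functional form implied by the directed graph A -> B -> C."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def prob(a=None, b=None, c=None):
    """Marginal probability of the given assignment; None means 'summed out'."""
    total = 0.0
    for av, bv, cv in product((0, 1), repeat=3):
        if a in (None, av) and b in (None, bv) and c in (None, cv):
            total += joint(av, bv, cv)
    return total


def max_conditional_violation():
    """Largest |P(a, c | b) - P(a | b) P(c | b)| over all assignments."""
    worst = 0.0
    for a, b, c in product((0, 1), repeat=3):
        pb = prob(b=b)
        lhs = prob(a=a, b=b, c=c) / pb
        rhs = (prob(a=a, b=b) / pb) * (prob(b=b, c=c) / pb)
        worst = max(worst, abs(lhs - rhs))
    return worst


def max_marginal_violation():
    """Largest |P(a, c) - P(a) P(c)| over all assignments."""
    return max(abs(prob(a=a, c=c) - prob(a=a) * prob(c=c))
               for a, c in product((0, 1), repeat=2))


print("A, C given B:", max_conditional_violation())
print("A, C marginally:", max_marginal_violation())
```

Enumeration like this scales exponentially with the number of variables, which is why graph-based tests and clique-tree operations matter in practice.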

**Information and working with Independence**, Lecture 2, 17/08/17, Wray Buntine

This will continue with information (entropy) left over from the previous lecture. Then we will look at the definition of independence and some independence models, including their relationship with causality. Basic directed and undirected models will be introduced. Some example problems will be presented (simply) to tie these together: simple bucket search, bandits, graph colouring and causal reasoning.
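The link between information and independence can be seen in a few lines: the mutual information of two variables is zero exactly when their joint distribution factorises. The sketch below computes entropy and mutual information in bits for small discrete tables. The tables and function names are assumptions for illustration.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as a 2-D table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Independent table: the outer product of marginals (0.4, 0.6) and (0.3, 0.7).
independent = [[0.12, 0.28], [0.18, 0.42]]
# Perfectly dependent table: the two variables are always equal.
copied = [[0.5, 0.0], [0.0, 0.5]]

print("H(fair coin) =", entropy([0.5, 0.5]))
print("I(independent) =", mutual_information(independent))
print("I(copied) =", mutual_information(copied))
```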

*No lectures 03/08 (writing for ACML) and 10/08 (attending ICML)*.

**Motivating Probability and Decision Models**, Lecture 1, 27/07/17, Wray Buntine

This is an introduction to the motivation for using Bayesian methods, these days called “full probability modelling” by the cognoscenti, to avoid prior cultish associations and implications. We will look at modelling, causality, probability as frequency, and axiomatic underpinnings for reasoning, decisions, and belief. The importance of priors and computation forms the basis of this.