The HSE group at Izia’s Grill, 13/09/17

Left to right are our host Prof Dmitry Vetrov, me, Iliya Tolstikhin, Novi Quadrianto, Maurizio Filippone and our coordinator Nadia Chirkova. We four in the middle were the invited speakers for the workshop,

Mini-Workshop: Stochastic Processes and Probabilistic Models in Machine Learning. The invited talks by these guys were absolutely first class, and the high quality of the Moscow area speakers made for a fascinating afternoon too.

Giving two talks, a tutorial one, **Introduction to Dirichlet Processes and their use**, at the workshop.

Assuming the attendee has knowledge of the Poisson, Gamma, multinomial and Dirichlet distributions, this talk will present the basic ideas and theory to understand and use the Dirichlet process and its close relatives, the Pitman-Yor process and the gamma process. We will first look at some motivating examples. Then we will look at the non-hierarchical versions of the processes, which are basically infinite parameter vectors. These have a number of handy properties and have simple, elegant marginal and posterior inference. Finally, we will look at the hierarchical versions of these processes. These are fundamentally different. To understand the hierarchical version we will briefly review some aspects of stochastic process theory and additive distributions. The hierarchical versions become Dirichlet and Gamma distributions (the process part disappears), but the techniques developed for the non-hierarchical process models can be borrowed to develop good algorithms, since the Dirichlet and Gamma are challenging when placed hierarchically.
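For readers who want something concrete, one standard way to see the marginal behaviour of the non-hierarchical Dirichlet and Pitman-Yor processes is the Chinese restaurant process. The sketch below is not taken from the talk or the slides; the function name `sample_crp` and its parameters are made up for illustration, and it only assumes the usual seating rule with a concentration and a discount parameter (discount 0 gives the Dirichlet process case).

```python
import random


def sample_crp(n, concentration, discount=0.0, rng=None):
    """Seat n customers with the Pitman-Yor Chinese restaurant process.

    Hypothetical helper for illustration. Requires concentration > 0 and
    0 <= discount < 1. Returns the list of table sizes.
    """
    rng = rng or random.Random()
    tables = []  # tables[k] = number of customers seated at table k
    for _ in range(n):
        # An existing table k is chosen with weight (size_k - discount);
        # a new table is opened with weight
        # (concentration + discount * number_of_tables).
        weights = [size - discount for size in tables]
        weights.append(concentration + discount * len(tables))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)  # open a new table
        else:
            tables[k] += 1  # join an existing table
    return tables


# Usage: compare the number of tables with and without a discount.
dp_tables = sample_crp(1000, concentration=1.0, rng=random.Random(0))
py_tables = sample_crp(1000, concentration=1.0, discount=0.5, rng=random.Random(0))
print(len(dp_tables), len(py_tables))
```

With a positive discount the number of occupied tables tends to grow much faster in the number of customers, which is one reason the Pitman-Yor process is often preferred for power-law data such as word counts.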

Slides are here.

And one to the Faculty of Computer Science:

**Learning on networks of distributions for discrete data.** The HSE announcement is here.

I will motivate the talk by reviewing some state-of-the-art models for problems like matrix factorisation models for link prediction and tweet clustering. Then I will review the classes of distributions that can be strung together in networks to generate discrete data. This allows a rich class of models that, in its simplest form, covers things like Poisson matrix factorisation, Latent Dirichlet allocation, and Stochastic block models, but more generally covers complex hierarchical models on network and text data. The distributions covered include so-called non-parametric distributions such as the Gamma process. Accompanying these is a set of collapsing and augmentation techniques that are used to generate fast Gibbs samplers for many models in this class. To complete this picture, turning complex network models into fast Gibbs samplers, I will illustrate our recent methods of doing matrix factorisation with side information (e.g., GloVe word embeddings), done for link prediction, for instance, for citation networks.
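To give a flavour of what an augmentation technique looks like in the simplest case, here is a minimal sketch of one Gibbs sweep for plain Poisson matrix factorisation with Gamma priors. It is the textbook construction, not the method from the talk: each observed count is split across latent components with a multinomial draw, after which the Gamma updates are conjugate. The function name `gibbs_sweep`, the shared prior parameters `a` and `b`, and the toy data are all assumptions for illustration.

```python
import random


def gibbs_sweep(Y, theta, phi, a=1.0, b=1.0, rng=random):
    """One Gibbs sweep for Y[i][j] ~ Poisson(sum_k theta[i][k] * phi[j][k]).

    Hypothetical sketch: theta and phi entries have Gamma(shape=a, rate=b)
    priors. Updates theta and phi in place and returns them.
    """
    I, J, K = len(Y), len(Y[0]), len(theta[0])
    row_counts = [[0] * K for _ in range(I)]
    col_counts = [[0] * K for _ in range(J)]
    # Augmentation: allocate each unit of Y[i][j] to a component k with
    # probability proportional to theta[i][k] * phi[j][k].
    for i in range(I):
        for j in range(J):
            if Y[i][j] == 0:
                continue
            w = [theta[i][k] * phi[j][k] for k in range(K)]
            for k in rng.choices(range(K), weights=w, k=Y[i][j]):
                row_counts[i][k] += 1
                col_counts[j][k] += 1
    # Conjugate Gamma updates given the latent counts.
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    for i in range(I):
        for k in range(K):
            rate = b + sum(phi[j][k] for j in range(J))
            theta[i][k] = rng.gammavariate(a + row_counts[i][k], 1.0 / rate)
    for j in range(J):
        for k in range(K):
            rate = b + sum(theta[i][k] for i in range(I))
            phi[j][k] = rng.gammavariate(a + col_counts[j][k], 1.0 / rate)
    return theta, phi


# Usage on a tiny made-up count matrix with K = 2 components.
rng = random.Random(0)
Y = [[3, 0, 1], [0, 2, 4]]
theta = [[rng.gammavariate(1.0, 1.0) for _ in range(2)] for _ in Y]
phi = [[rng.gammavariate(1.0, 1.0) for _ in range(2)] for _ in Y[0]]
for _ in range(50):
    gibbs_sweep(Y, theta, phi, rng=rng)
```

Collapsed samplers go a step further and integrate out `theta` and `phi` rather than sampling them; that step is not shown here.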