Archive for January, 2015

Talk given around Europe, Jan 2015

January 30, 2015

This talk (PDF slides here) is titled “Non-parametric Methods for Unsupervised Semantic Modelling” and is really a two-hour talk derived from the HKUST talk in December. I updated it continuously during January and gave it at Helsinki, IJS, UCL, Cambridge and Oxford. It contains my simplified version of Lancelot James’ excellent theory of generalised Indian Buffet Processes, “Poisson Latent Feature Calculus for Generalized Indian Buffet Processes,” which appeared on arXiv in 2014; my version explains things differently, with heuristics. The talk I gave at JSI (the Jožef Stefan Institute in Ljubljana) on 14th January 2015 was recorded. The group there, with Dunja Mladenić and Marko Grobelnik, is expert in areas like Data Science and Text Mining, but they’re not into Bayesian non-parametrics, so in this version of the talk I mostly avoided the statistical details and talked more about what we did and why. The talk is up on Video Lectures.

Abstract: This talk will cover some of our recent work in extending topic models to serve as tools in text mining and NLP (and hopefully, later, in IR) when some semantic analysis is required. In some sense our goals are akin to the use of Latent Semantic Analysis. The basic theoretical/algorithmic tool we have for this is non-parametric Bayesian methods for reasoning about hierarchies of probability vectors. The concepts will be introduced but not the statistical detail. Then I’ll present some of our KDD 2014 paper (Experiments with Non-parametric Topic Models), which currently gives the best-performing topic model on a number of metrics.
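(For orientation, since the abstract leaves it abstract: a “hierarchy of probability vectors” here means, roughly, that a parent probability vector serves as the mean of a Dirichlet or Pitman–Yor process from which its child vectors are drawn. A generic two-level sketch, not the exact construction used in the talk, is

\[
\vec{\mu} \sim \mathrm{Dirichlet}(\gamma), \qquad
\vec{\theta}_d \mid \vec{\mu} \sim \mathrm{PYP}(a, b, \vec{\mu}) \quad \text{for each document } d,
\]

so the per-document topic proportions \(\vec{\theta}_d\) share statistical strength through the parent \(\vec{\mu}\).)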

Latent IBP compound Dirichlet Allocation

January 30, 2015

I visited Ralf Herbrich’s new Amazon offices in Berlin on 8th January and chatted there about topic modelling. A pleasant result for me was meeting both Cédric Archambeau and Jan Gasthaus, who have both been active in my area.

With Cédric I discussed the Latent IBP compound Dirichlet Allocation model (in the long-awaited IEEE Trans. PAMI 2015 special issue on Bayesian non-parametrics; PDF on Cédric’s webpage), which combines a 3-parameter Indian Buffet Process with Dirichlets. This was joint work with Balaji Lakshminarayanan and Guillaume Bouchard.

Their work improves on the earlier paper by Williamson, Wang, Heller, and Blei (2010) on the Focused Topic Model, which struggled with using the IBP theory. I’d originally ignored Williamson et al.’s work because they only tested on toy data sets, despite it being such a great model. The marginal posterior for the document-topic indicator matrix is given in Archambeau et al.’s Equation (43), which they attribute to Teh and Görür (NIPS 2009), but it’s easily derived using Dirichlet marginals and Lancelot James’ general formulas for IBPs. This is the so-called IBP compound Dirichlet. From there, it’s easy to derive a collapsed Gibbs sampler mirroring the one for regular LDA. Theory and sampling of hyper-parameters for the 3-parameter IBP are described in my Helsinki talk and forthcoming tutorial.
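To make “mirroring regular LDA” concrete, here is a minimal sketch (in Python, my own illustration rather than Archambeau et al.’s code) of one sweep of the standard collapsed Gibbs sampler for plain LDA; roughly speaking, the IBP compound Dirichlet version swaps the document-side factor (n_dk + alpha) for the term coming from the marginal in their Equation (43).

    # A minimal sketch of one sweep of collapsed Gibbs sampling for plain LDA.
    # K topics, V word types; alpha and beta are symmetric Dirichlet parameters.
    import numpy as np

    def collapsed_gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, rng):
        """docs[d] is a list of word ids; z[d][i] is the topic of word i in doc d.
        n_dk, n_kw, n_k are the document-topic, topic-word and topic totals."""
        K, V = n_kw.shape
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                k = z[d][i]
                # remove this word's current assignment from the count tables
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # conditional posterior over topics with this word held out;
                # the (n_dk[d] + alpha) factor is what an IBP compound
                # Dirichlet prior would replace
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                # record the new assignment and restore the counts
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

Calling this repeatedly with rng = np.random.default_rng() gives the usual collapsed sampler; everything model-specific in the non-parametric variants lives in how that document-side factor is computed.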

This is a great paper with quality empirical work and the best results I’ve seen for non-parametric LDA (other than ours). Their implementation isn’t performance-tuned, so their timing figures are not that indicative, but they ran on non-trivial data sets, so it’s good enough.

A note for the curious: I ran our HCA code to replicate their experiments on the two larger data sets. Details are in the Helsinki talk. Basically:

  • They compared against earlier HDP-LDA implementations, so of course they beat them substantially.
  • Our version of HDP-LDA (without burstiness) works as well as their LIDA algorithm (their better variant, with IBPs on both the topic and word sides) and is substantially faster.
  • Our fully non-parametric LDA (DPs for documents, PYPs for words, no burstiness) beats their LIDA substantially.

So while we beat them with our superior collapsed Gibbs sampling, their results are impressive, and I’m excited by the possibility of trying their methods.

Aalto tutorial 19th January 2015

January 19, 2015

Slides are here. They are a bit preliminary! This is gradually being reworked for MLSS 2015 in Sydney this February.

I’ve added a section on the generalised Indian Buffet Process theory from Lancelot James, my own “simplified” version for computer scientists.

Non-reversible operators for MCMC

January 15, 2015

I’ve been visiting the University of Helsinki since Christmas, where Jukka Corander is based, a Bayesian statistician who works on variants of MCMC and pseudo-likelihood as ways of scaling up statistical computation. He showed me his 2006 Statistics and Computing paper, “Bayesian model learning based on a parallel MCMC strategy” (the PDF is around if you search), and I have to say I’m amazed. This is so important, as anyone who has tried MCMC in complex spaces will know. The reason for wanting non-reversible operators is:

  • proposal operators for moves like split-merge must propose *reasonable* alternatives, and therefore must be built as a non-trivial operator
    • e.g., a greedy search is used to build an initial split for the proposal
  • developing the corresponding reverse operator for such proposals is very hard

So Jukka’s group’s result is that reversible MCMC is not necessary: as long as the usual Metropolis-Hastings acceptance condition applies, the MCMC process still converges in the long run.
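For reference, the usual Metropolis-Hastings acceptance probability for a move from x to a proposed x′ drawn from a proposal kernel q is

\[
\alpha(x \to x') \;=\; \min\!\left(1,\; \frac{\pi(x')\,q(x \mid x')}{\pi(x)\,q(x' \mid x)}\right),
\]

and for a greedy split-merge proposal the painful term is the reverse kernel q(x | x′) in the numerator; that is exactly the reversibility burden referred to in the bullets above.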

Anyway, I can now build split-merge operators using MCMC without requiring crazy reversibility!