Latent IBP compound Dirichlet Allocation

January 30, 2015

I visited Ralf Herbrich’s new Amazon offices in Berlin on 8th January and chatted there on topic modelling.  A pleasant result for me was meeting Cédric Archambeau and Jan Gasthaus, both of whom have been active in my area.

With Cédric I discussed the Latent IBP compound Dirichlet Allocation model (in the long-awaited IEEE Trans PAMI 2015 special issue on Bayesian non-parametrics, PDF on Cédric’s webpage), which combines a 3-parameter Indian Buffet Process with Dirichlets.  This was joint work with Balaji Lakshminarayanan and Guillaume Bouchard.

Their work improves on the earlier paper by Williamson, Wang, Heller, and Blei (2010) on the Focused Topic Model, which struggled with using the IBP theory.  I’d originally ignored Williamson et al.’s work because they only tested on toy data sets, despite it being such a great model.  The marginal posterior for the document-topic indicator matrix is given in Archambeau et al.’s Equation (43), which they attribute to Teh and Görür (NIPS 2009), but it’s easily derived using Dirichlet marginals and Lancelot James’ general formulas for IBPs.  This is the so-called IBP compound Dirichlet.  From there, it’s easy to derive a collapsed Gibbs sampler mirroring regular LDA.  I describe the theory and the sampling of hyper-parameters for the 3-parameter IBP in my Helsinki talk and coming tutorial.
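To make the "IBP compound Dirichlet" idea concrete, here is a rough Python sketch of the generative side only: each document first selects a sparse subset of topics through a binary mask, then draws topic proportions from a Dirichlet restricted to the selected topics.  This is not the authors' code.  As a simplifying assumption it uses the standard finite beta-Bernoulli approximation to the one-parameter IBP rather than the 3-parameter IBP of the paper, and a symmetric Dirichlet concentration.  The names `sample_topic_popularities`, `sample_document`, `ibp_mass` and `concentration` are made up for illustration.

```python
import random


def sample_topic_popularities(num_topics, ibp_mass, rng):
    """Draw one popularity per topic, shared by all documents.

    Assumption: finite beta-Bernoulli approximation to the one-parameter
    IBP, pi_k ~ Beta(ibp_mass / K, 1), not the 3-parameter IBP.
    """
    return [rng.betavariate(ibp_mass / num_topics, 1.0)
            for _ in range(num_topics)]


def sample_document(popularities, concentration, rng):
    """Draw one document's topic mask and its topic proportions."""
    # Row of the document-topic indicator matrix: b_k ~ Bernoulli(pi_k).
    mask = [1 if rng.random() < p else 0 for p in popularities]
    # Symmetric Dirichlet over the selected topics only, built from
    # normalised gamma draws; unselected topics get exactly zero weight.
    weights = [rng.gammavariate(concentration, 1.0) if b else 0.0
               for b in mask]
    total = sum(weights)
    if total == 0.0:
        # No topic selected: return all-zero proportions.  A full model
        # would need to handle this case properly.
        return mask, weights
    return mask, [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    popularities = sample_topic_popularities(20, 3.0, rng)
    for _ in range(3):
        mask, theta = sample_document(popularities, 0.5, rng)
        print(sum(mask), "active topics:", [round(t, 2) for t in theta if t > 0])
```

The point of the mask is that a document's proportions are exactly zero outside its selected topics, which a plain Dirichlet prior never gives you.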

This is a great paper with quality empirical work and the best results I’ve seen for non-parametric LDA (other than ours).  Their implementation isn’t performance tuned so their timing figures are not that indicative, but they ran on non-trivial data sets so it’s good enough.

Note for the curious, I ran our HCA code to duplicate their experimental results on the two larger data sets.  Details in the Helsinki talk.  Basically:

  • They compared against prior HDP-LDA implementations so of course beat them substantially.
  • Our version of HDP-LDA (without burstiness) works as well as LIDA, the better of their algorithms, which puts IBPs on both the topic and the word side, and ours is substantially faster.
  • Our fully non-parametric LDA (DPs for documents, PYPs for words, no burstiness) beat their LIDA substantially.

So while we beat them with our superior collapsed Gibbs sampling, their results are impressive so I’m excited by the possibility of trying their methods.

