Kar Wai Lim has just been told they “confirmed the approval” of his PhD (though it hasn’t been “conferred” yet, so he’s not officially a Dr.), and he has spent the time since submission pumping out journal and conference papers. Ahhh, the unencumbered life of the fresh PhD!

This one:

“Nonparametric Bayesian topic modelling with the hierarchical Pitman–Yor processes”, Kar Wai Lim, Wray Buntine, Changyou Chen, Lan Du, *International Journal of Approximate Reasoning*, **78** (2016) 172–191.

includes what I believe is the *world’s best tweet clusterer*. It certainly blows away the state-of-the-art tweet pooling methods. The main issue is that the current implementation only scales to a million or so tweets, not the 100 million or so expected in some communities. That is easily addressed with a bit of coding work.

We did this to demonstrate the rich, largely unexplored possibilities for semantic hierarchies that one gets using simple Gibbs sampling with Pitman-Yor processes. Lan Du (Monash) started this branch of research. I challenge anyone to do this particular model with variational algorithms 😉 The machine learning community in the last decade unfortunately got lost in the complexities of Chinese restaurant processes and stick-breaking representations, for which complex semantic hierarchies are, well, a bit of a headache!