I saw the NIPS 2013 paper by Miller and Harrison, “A simple example of Dirichlet process mixture inconsistency for the number of components,” and I had some issues with it. A Dirichlet Process is a prior that says there are infinitely many clusters in the mixture. But at any one time, after seeing N data points with concentration parameter θ, it expects to see about λ = θ log(N/θ) clusters, plus or minus 3√λ or so, for N > θ ≫ 0. This approximation is behind the famous “grows with log(N)” formula some tutorials give for DPs. Anyway, I cannot really see why this makes the DP *inconsistent* when the true model has a finite number of components: that true model is not in the prior! It just means the DP is true to itself. So this apparent inconsistency should not trouble a Bayesian.
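To see where λ = θ log(N/θ) comes from, here is a quick sketch of my own (not from the paper): it simulates the Chinese restaurant process, which governs the cluster counts of a DP mixture, and compares the simulated number of clusters to the exact expectation Σ_{i=0}^{N−1} θ/(θ+i) and to the log approximation. The constants (N = 10,000, θ = 5, 200 draws) are arbitrary choices for illustration.

```python
import math
import random

def crp_num_clusters(n, theta, rng):
    """Seat n customers in a Chinese restaurant process with
    concentration theta; return the number of occupied tables."""
    k = 0
    for i in range(n):
        # Customer i opens a new table with probability theta / (theta + i).
        if rng.random() < theta / (theta + i):
            k += 1
    return k

rng = random.Random(0)
n, theta = 10_000, 5.0
draws = [crp_num_clusters(n, theta, rng) for _ in range(200)]
mean_k = sum(draws) / len(draws)

# Exact expectation of the cluster count, and the log approximation.
exact = sum(theta / (theta + i) for i in range(n))
approx = theta * math.log(1 + n / theta)
print(f"simulated mean: {mean_k:.1f}, exact: {exact:.1f}, approx: {approx:.1f}")
```

All three numbers land within a couple of units of each other, and the spread of the draws is on the order of √λ, consistent with the ±3√λ band above.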

This seems to reflect a basic confusion about the Dirichlet Process more generally. Some people think it can be used to estimate the “right number of clusters.” Well, be careful: I can change θ and get it to favour a large or a small number of clusters! We do the same with the number of topics in a non-parametric topic model.
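To make that concrete, here is a small sketch (the N and θ values are my own arbitrary choices) showing how far the prior's expected cluster count moves at a fixed sample size as θ alone is varied:

```python
n = 100_000
expected = {}
for theta in (0.1, 1.0, 10.0, 100.0):
    # Exact expected number of clusters under the CRP prior at this theta:
    # E[K] = sum_{i=0}^{n-1} theta / (theta + i).
    expected[theta] = sum(theta / (theta + i) for i in range(n))
    print(f"theta={theta:>5}: E[K] ~ {expected[theta]:.1f}")
```

At the same N, the expected number of clusters spans from a handful to several hundred, so any “estimated” number of components is heavily coloured by the choice of θ.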