## Lancelot James on the history of the Pitman-Yor Process

November 10, 2014

Those of us in Machine Learning (read Computer Science) departments rely on the work of many statisticians. While Jim Pitman (UCB) and colleagues have been instrumental in developing the fundamental theory, other statisticians such as Lancelot James (HKUST) have been extending the work and interpreting it so the rest of us can understand it. Marcus Hutter (ANU) developed a bunch of formulas, and I remember him being shattered to discover that his toughest “exact formula for the second order Stirling number” had been published by Toscano in 1939. And Lancelot cites Halmos 1944 on stick-breaking!

*So it's important to know our history.* Here is Lancelot’s short history of the Pitman-Yor Process, which he wrote during correspondence with us. My students and I are most grateful for his many insights (some edits by me):

- The process arises from a comprehensive study of excursion lengths on Markov processes, where an excursion length is the time from one zero-crossing to the next:
  - Excursions can be either positive or negative. The (P_k) represent the lengths of the excursions of a process on [0,1]. The sign of an excursion is not reflected in the law of (P_k); in fact, it is independent of the length.
  - A class of Bessel processes indexed by a parameter 0 ≤ α < 1 was studied in a series of works by Jim Pitman, Marc Yor and co-authors during the 1980s and 1990s, generalizing results for Brownian motion (which is the α = 1/2 case).
  - The (P_k) studied there follow a two-parameter Poisson-Dirichlet distribution with parameters (α, θ), denoted PD(α, θ). PD(1/2, 0) corresponds to Brownian motion and PD(1/2, 1/2) corresponds to the Brownian bridge.

- Its stick-breaking representation is formally derived in Perman, Pitman and Yor (1992):
  - Mihael Perman, Jim Pitman, and Marc Yor, “Size-biased sampling of Poisson point processes and excursions,” *Probability Theory and Related Fields*, 92(1), 1992.

- This process, but not the Bayesian aspects of it, is later discussed by Pitman and Yor:
  - Pitman, Jim, and Marc Yor. “The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator.” *The Annals of Probability* (1997): 855-900.

- The corresponding random distribution function is treated in a formal Bayesian setting by Pitman:
  - Pitman, Jim. “Some developments of the Blackwell-MacQueen urn scheme.” *Statistics, Probability and Game Theory.* Institute of Mathematical Statistics, 1996. 245-267.

- See also:
  - Pitman, J. *Combinatorial Stochastic Processes.* Lecture Notes in Mathematics Vol. 1875, x+256, Springer-Verlag, Berlin (2006). Lectures from the 32nd Summer School on Probability Theory held in Saint-Flour, July 7–24, 2002. With a foreword by Jean Picard.
  - Pitman, Jim (2003) “Poisson-Kingman partitions” in *Science and Statistics: A Festschrift for Terry Speed*, D. R. Goldstein editor, Lecture Notes – Monograph Series Vol. 30, 1-34, Institute of Mathematical Statistics, Beachwood, OH (2003).

- Ishwaran and James showed how such processes could be used practically in complex Bayesian mixture models, and were also the first to call these processes Pitman-Yor processes:
  - Ishwaran, Hemant, and Lancelot F. James, “Gibbs sampling methods for stick-breaking priors,” *Journal of the American Statistical Association* 96.453 (2001).
  - Ishwaran, Hemant, and Lancelot F. James, “Generalized weighted Chinese restaurant processes for species sampling mixture models,” *Statistica Sinica* 13.4 (2003): 1211-1236.

- Arguably, it is their 2001 paper that helped popularize the use of stick-breaking representations, and of course the Pitman-Yor process itself, within the BNP and ML communities. Goldwater, Griffiths and Johnson were the first to point out that the Pitman-Yor process is superior to the Dirichlet Process for language applications involving certain types of power-law behavior:
  - Goldwater, Sharon, Tom Griffiths, and Mark Johnson. “Interpolating between types and tokens by estimating power-law generators.” *Advances in Neural Information Processing Systems* 18 (2006): 459.

This makes the PY process an important model in ML/NLP applications.

- In contrast, one would note that the DP is a much better process to use if one were trying to model logarithmic behavior, as illustrated in Arratia, Barbour and Tavaré:
  - Arratia, R., A. D. Barbour, and S. Tavaré, *Logarithmic Combinatorial Structures: A Probabilistic Approach*, 2003, EMS Monographs in Mathematics 1.

- So this points out that not all processes are appropriate in all cases.
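
For readers who want to see the stick-breaking representation and the power-law versus logarithmic contrast from the list above in action, here is a minimal Python sketch. It is not taken from any of the cited papers. The function names (`stick_breaking_weights`, `crp_table_sizes`) and the parameter choices in the demo are my own illustrative assumptions. It uses the standard two-parameter rules: sticks V_k ~ Beta(1 − α, θ + kα) for the weights, and the two-parameter Chinese restaurant seating rule for the table counts.

```python
import random


def _check_params(alpha, theta):
    # Standard parameter range for PD(alpha, theta).
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")


def stick_breaking_weights(alpha, theta, n_sticks, rng):
    """Return the first n_sticks stick-breaking weights (hypothetical helper).

    V_k ~ Beta(1 - alpha, theta + k * alpha) and
    P_k = V_k * prod_{j<k} (1 - V_j), for k = 1, 2, ...
    """
    _check_params(alpha, theta)
    weights, remaining = [], 1.0
    for k in range(1, n_sticks + 1):
        v = rng.betavariate(1.0 - alpha, theta + k * alpha)
        weights.append(remaining * v)   # break off a fraction v of what is left
        remaining *= 1.0 - v
    return weights


def crp_table_sizes(alpha, theta, n_customers, rng):
    """Seat customers with the two-parameter Chinese restaurant rule.

    With n customers seated at K tables, the next customer opens a new
    table with probability (theta + K * alpha) / (n + theta) and joins
    table k with probability (size_k - alpha) / (n + theta).
    """
    _check_params(alpha, theta)
    sizes = []  # sizes[k] = number of customers at table k
    for n in range(n_customers):        # n customers are already seated
        u = rng.random() * (n + theta)
        new_table_mass = theta + len(sizes) * alpha
        if not sizes or u < new_table_mass:
            sizes.append(1)             # first customer always opens a table
            continue
        u -= new_table_mass
        for k, size in enumerate(sizes):
            u -= size - alpha
            if u < 0:
                sizes[k] += 1
                break
        else:
            sizes[-1] += 1              # guard against floating-point round-off
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)
    weights = stick_breaking_weights(0.5, 0.0, 1000, rng)
    print("mass covered by 1000 sticks of PD(1/2, 0):", sum(weights))
    # alpha = 0 is the DP; alpha = 0.5 is a PY process with the same theta.
    for alpha, theta in [(0.0, 1.0), (0.5, 1.0)]:
        for n in (100, 1000, 5000):
            tables = len(crp_table_sizes(alpha, theta, n, rng))
            print(f"alpha={alpha}, theta={theta}, n={n}: {tables} tables")
```

In runs like this, the number of occupied tables under α = 0 (the DP) should grow roughly like θ log n, while under α > 0 it should grow roughly like n^α, which is the power-law behavior that matters for language data. The exact counts depend on the random seed.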
