In this post I’d like to briefly describe one of my favourite algorithmic techniques: inverse transform sampling. Despite its scary-sounding name, it is actually quite a simple and very useful procedure for generating random numbers from an arbitrary, known probability distribution — given random numbers drawn from a uniform distribution. For example, if you had empirically observed from a database that a variable took on some probability distribution, and you wanted to simulate similar conditions, you would need to draw a random variable with that same distribution. How would you go about doing this?
In essence, it is simply a two-step process:
- Generate a random value x from the uniform distribution between 0 and 1.
- Return the value y such that x = CDF(y), where CDF is the cumulative distribution function of the probability distribution you wish to achieve.
Why does this work? Step 1 picks a uniformly random value between 0 and 1, so you can interpret this as a probability. Step 2 inverts the desired cumulative distribution function; you are calculating y = CDF-1(x), and therefore the returned value y is such that a random variable drawn from that distribution is less than or equal to y with probability x.
Thinking in terms of the original probability density function, we are uniformly randomly choosing a proportion of the area under the curve of the PDF and returning the number in the domain such that exactly this proportion of the area occurs to the left of that number. So numbers in the regions of the PDF with greater areas are more likely to occur. The uniform distribution is thereby projected onto this desired PDF.
This is a really neat algorithm. But what do you do if you don’t know the CDF of the distribution you want to sample from? I discuss a solution in Part 2 of this series: How To Generate Any Probability Distribution, Part 2: The Metropolis-Hastings Algorithm