
This is a very small topic in all bunch of Leetcode problems, and I found that even for some problems that are labeled with reservoir sampling, you cannot use this method because it will gives you TLE.
So what problem is reservoir sampling trying to solve? It’s the problem that you want to return a random output from a list.
The basic solution will be using a standard package for each language. Random() from Java for example. You can call:
return list[random.nextInt(list.length)]
to randomly return an element from a array.
This is quite simple, but what if your input is too large to fit in the memory, and you have to read in as a stream? Then you cannot have a full list of elements. That’s where reservoir sampling comes into play:
The probability to choose ith element out of n elements
It’s as simple as 1/n, but how do we refactor so it works with a stream input? Imaging when we have the first i element, then the probability to select i out of ith element is 1/i. When we have i+1th element comes in, we don’t want this element to be chosen, so the probability is i/i+1, to not choose i+1th element.
Basically, for each element coming in, we do a random pick, if it picks the element that is just put in, then we overwrite the answer with the coming element.
Having this process, we can go back to the formula. The probability to pick ith element at the end, is to pick ith element when ith element comes, and don’t pick the element coming afterward:
P(i) = (1/i) * (i/i+1) * (i+1/i+2) … (n-1/n) = 1/n.
Here we go!