Normalised Importance Sampling is a modification of the standard Importance Sampling technique that addresses the issue of the unknown normalising constant in the target distribution. In essence, it estimates expectations by weighting samples from a proposal distribution, but with weights that are scaled to sum to one.
Here's a breakdown:
- **Importance Sampling (IS) Recap:** IS approximates the expectation of a function `f(x)` with respect to a target distribution `p(x)` by drawing samples from a proposal distribution `q(x)` and weighting each sample by the importance weight `w(x) = p(x) / q(x)`. The core idea is to use `q(x)` when sampling directly from `p(x)` is difficult or impossible.
- **The Problem with Unknown Normalising Constants:** Often, we only know the target distribution `p(x)` up to a normalising constant. That is, we have access to an unnormalised version `p*(x)`, where `p(x) = p*(x) / Z`, and the normalising constant `Z` is usually intractable to compute directly.
- **Normalised Importance Sampling Solution:** Normalised IS tackles this by estimating the normalising constant alongside the expectation. Instead of the raw importance weights `w(x) = p(x) / q(x)`, which require knowing `p(x)`, it uses unnormalised weights `w*(x) = p*(x) / q(x)`. The normalising constant `Z` is then estimated as the average of these unnormalised weights.
- **Expectation Estimate:** The expectation of `f(x)` is then approximated as

  `E[f(x)] ≈ Σ w*(x_i) f(x_i) / Σ w*(x_i)`

  where the `x_i` are samples drawn from `q(x)` and both sums run over all `N` samples. Because the unnormalised weights appear in both the numerator and the denominator, any unknown constant factor in `p*(x)` cancels.
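The estimator above can be sketched in a few lines of NumPy. This is an illustrative example, not from the original text: it assumes an unnormalised standard Gaussian target `p*(x) = exp(-x²/2)` (whose true constant `Z = √(2π)` we pretend not to know) and a wider Gaussian proposal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: an unnormalised standard Gaussian, p*(x) = exp(-x^2 / 2).
# Its true normalising constant is Z = sqrt(2*pi), which we pretend not to know.
def p_star(x):
    return np.exp(-0.5 * x ** 2)

# Proposal q(x) = N(0, 2^2): easy to sample from and to evaluate.
def q_pdf(x):
    return np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))

N = 100_000
x = rng.normal(0.0, 2.0, size=N)   # samples x_i ~ q
w_star = p_star(x) / q_pdf(x)      # unnormalised weights w*(x_i)

# Self-normalised estimate of E[f(x)] for f(x) = x^2 (true value 1 under N(0, 1)).
f = x ** 2
estimate = np.sum(w_star * f) / np.sum(w_star)

# As a by-product, the average unnormalised weight estimates Z = sqrt(2*pi) ≈ 2.5066.
Z_hat = np.mean(w_star)
```

With enough samples, `estimate` lands close to the true value 1, and `Z_hat` close to `√(2π)`, without `Z` ever appearing in the computation.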
Key Advantages of Normalised Importance Sampling:
- Handles unnormalised target distributions: It's applicable even when the normalising constant of the target distribution is unknown or intractable.
- Generally lower variance: in many cases, normalised IS exhibits lower variance than standard IS, especially when the proposal distribution `q(x)` is a good approximation of the target distribution `p(x)`. The trade-off is a small bias for finite sample sizes, which vanishes as `N` grows.
Example:
Imagine we want to estimate the average height of people in a city, but we only have access to a biased sample (e.g., people who visit a specific website). We can use Normalised Importance Sampling:
- `p*(x)`: represents the unnormalised distribution of heights in the entire city (known only up to a constant).
- `q(x)`: represents the distribution of heights in our biased sample (the people who visit the website).
- We weight each person's height in the biased sample by `w*(x) = p*(x) / q(x)`, where `p*(x)` gives the relative probability of that height in the city and `q(x)` is the probability of observing that height in our biased sample.
- We normalise these weights so they sum to 1.
- The weighted average height becomes our estimate of the average height in the entire city.
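The height example can be made concrete with made-up numbers (all distributions below are hypothetical, chosen purely for illustration): suppose website visitors skew tall, `q = N(176, 9²)`, while the assumed city-wide shape is an unnormalised Gaussian centred at 170 cm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical numbers: website visitors skew tall, heights ~ N(176, 9^2).
N = 200_000
heights = rng.normal(176.0, 9.0, size=N)   # our biased sample, drawn from q

def q_pdf(h):
    """Density of the biased sample, q(h) = N(176, 9^2)."""
    return np.exp(-0.5 * ((h - 176.0) / 9.0) ** 2) / (9.0 * np.sqrt(2 * np.pi))

def p_star(h):
    """Assumed unnormalised city-wide shape, p*(h) = exp(-(h - 170)^2 / (2 * 10^2)).
    We know the shape up to a constant, not the constant itself."""
    return np.exp(-0.5 * ((h - 170.0) / 10.0) ** 2)

w_star = p_star(heights) / q_pdf(heights)               # unnormalised weights
avg_height = np.sum(w_star * heights) / np.sum(w_star)  # self-normalised estimate

# The raw sample mean sits near 176 (the bias), while the weighted
# average recovers the mean of the assumed city distribution, near 170.
```

The raw mean of the biased sample stays near 176 cm, while the reweighted estimate pulls back to roughly 170 cm, the mean of the assumed city-wide distribution.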
Potential Issues:
- Sensitivity to Tail Behavior: normalised IS can be unreliable if the tails of `q(x)` are much lighter than the tails of `p(x)`. In that case, a few large weights dominate the estimate, leading to high variance.
- Evaluable `p*(x)` needed: while it avoids computing the absolute normalisation constant, normalised IS still requires that the unnormalised density `p*(x)` can be evaluated pointwise at every sample.
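The tail-sensitivity issue is easy to demonstrate numerically via the effective sample size `ESS = (Σw)² / Σw²`, a standard diagnostic for weight degeneracy. In this synthetic sketch (distributions chosen only for illustration), a heavier-tailed proposal keeps the ESS high, while a light-tailed one collapses it:

```python
import numpy as np

rng = np.random.default_rng(2)

# Target: unnormalised standard Gaussian, p*(x) = exp(-x^2 / 2).
def p_star(x):
    return np.exp(-0.5 * x ** 2)

def normal_pdf(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def ess(w):
    """Effective sample size: (sum w)^2 / sum(w^2). Near N means the weights
    are well behaved; near 1 means a handful of weights dominate."""
    return np.sum(w) ** 2 / np.sum(w ** 2)

N = 50_000

# Proposal with heavier tails than the target: q1 = N(0, 2^2).
x1 = rng.normal(0.0, 2.0, size=N)
w1 = p_star(x1) / normal_pdf(x1, 2.0)

# Proposal with much lighter tails than the target: q2 = N(0, 0.5^2).
x2 = rng.normal(0.0, 0.5, size=N)
w2 = p_star(x2) / normal_pdf(x2, 0.5)

# ess(w1) stays a large fraction of N, while ess(w2) collapses:
# the light-tailed proposal produces a few enormous weights.
```

Checking the ESS before trusting a normalised IS estimate is cheap insurance against exactly the failure mode described above.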
In summary, Normalised Importance Sampling is a powerful technique for approximating expectations when the target distribution is only known up to a normalising constant, by using weighted samples from a proposal distribution with scaled weights.