Weighted importance sampling is a Monte Carlo technique that modifies standard importance sampling by introducing a weight function, with the goal of reducing variance and improving the accuracy of estimates, particularly when the proposal distribution differs significantly from the target distribution. The key idea is to use a different set of weights, typically of the form W(x) = h(x) g(x) for carefully chosen functions h(x) and g(x), and to base the estimator on these weights rather than on the standard importance sampling weights.
Understanding Importance Sampling
Before delving into weighted importance sampling, let's briefly review standard importance sampling. In standard importance sampling, we want to estimate the expected value of a function f(x) with respect to a target distribution p(x):
E_p[f(x)] = ∫ f(x) p(x) dx
If directly sampling from p(x) is difficult, we introduce a proposal distribution q(x) and rewrite the integral as:
E_p[f(x)] = ∫ f(x) (p(x) / q(x)) q(x) dx = E_q[f(x) w(x)]
where w(x) = p(x) / q(x) is the importance weight. We then sample from q(x) and estimate the expectation using:
E_p[f(x)] ≈ (1/N) Σ_{i=1}^N f(x_i) w(x_i)
where x_1, …, x_N are samples drawn from q(x).
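The plain estimator above can be sketched in a few lines of NumPy. The distributions below are illustrative choices, not from the text: target p = N(0, 1), proposal q = N(0, 2), and f(x) = x^2, for which E_p[f] = 1 exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (assumptions, not from the text):
# target p = N(0, 1), proposal q = N(0, 2), f(x) = x^2, so E_p[f] = 1.
def p(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def q(x):
    return np.exp(-x**2 / 8) / np.sqrt(8 * np.pi)

N = 100_000
x = rng.normal(0.0, 2.0, size=N)   # samples from q
w = p(x) / q(x)                    # importance weights w(x) = p(x)/q(x)
estimate = np.mean(x**2 * w)       # (1/N) sum f(x_i) w(x_i)
print(estimate)
```

Because q has heavier tails than p in this setup, the weights are bounded (at most 2), and the estimator is well behaved.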
The Essence of Weighted Importance Sampling
Weighted importance sampling builds upon this foundation but introduces an additional weight function. Instead of simply using w(x) = p(x)/q(x), we introduce a new weight function, often denoted W(x) = h(x) g(x). The goal is to choose h(x) and g(x) so that the variance is reduced further. This leads to a self-normalized estimator of the form:
E_p[f(x)] ≈ (Σ_{i=1}^N f(x_i) W(x_i)) / (Σ_{i=1}^N W(x_i))
where x_i are samples drawn from q(x).
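A sketch of this self-normalized estimator, under the same kind of illustrative setup (target N(0, 1), proposal N(0, 2), f(x) = x^2; these choices are assumptions, not from the text). One practical benefit of the ratio form is that W only needs to be known up to a constant, since any constant cancels between numerator and denominator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative choices (assumptions, not from the text): the target is
# known only up to a constant, p_tilde(x) = exp(-x^2 / 2), i.e. N(0, 1)
# without its normalizer; proposal q = N(0, 2); f(x) = x^2, so E_p[f] = 1.
def p_tilde(x):
    return np.exp(-x**2 / 2)

def q(x):
    return np.exp(-x**2 / 8) / np.sqrt(8 * np.pi)

N = 100_000
x = rng.normal(0.0, 2.0, size=N)
W = p_tilde(x) / q(x)                    # weights, correct up to a constant
estimate = np.sum(x**2 * W) / np.sum(W)  # self-normalized estimator
print(estimate)
```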
Why Introduce Additional Weights?
The rationale behind introducing additional weights is to improve the estimator's efficiency and stability. This can be achieved by:
- Variance Reduction: Choosing appropriate h(x) and g(x) can significantly reduce the variance of the estimator compared to standard importance sampling.
- Handling Tail Behavior: In cases where the tails of p(x) and q(x) differ significantly, standard importance sampling can suffer from high variance. Weighted importance sampling can mitigate this issue by down-weighting samples from the tails of q(x) that contribute disproportionately to the variance.
- Robustness: Weighted importance sampling can be more robust to misspecification of the proposal distribution q(x).
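To see these effects empirically, one can replicate both estimators many times and compare their spread around the truth. The setup below reuses the illustrative N(0, 1) target and N(0, 2) proposal (assumed choices, not from the text); which estimator has lower spread depends on the problem:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative comparison (assumed setup): target p = N(0, 1),
# proposal q = N(0, 2), f(x) = x^2, true value 1.
def p(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def q(x):
    return np.exp(-x**2 / 8) / np.sqrt(8 * np.pi)

plain, ratio = [], []
for _ in range(200):                 # 200 independent replications
    x = rng.normal(0.0, 2.0, size=2000)
    w = p(x) / q(x)
    plain.append(np.mean(x**2 * w))              # plain IS estimate
    ratio.append(np.sum(x**2 * w) / np.sum(w))   # self-normalized estimate

print(np.std(plain), np.std(ratio))  # spread of each estimator
```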
Choosing the Weight Functions
The selection of h(x) and g(x) is crucial for the success of weighted importance sampling. There is no one-size-fits-all approach, and the optimal choice depends on the specific problem and the characteristics of p(x) and q(x). Some common strategies include:
- Control Variates: Using control variates can help reduce variance by exploiting known information about the function f(x) or the target distribution p(x).
- Stratified Sampling: Dividing the sample space into strata and sampling proportionally from each stratum can improve the accuracy of the estimator.
- Adaptive Importance Sampling: Adjusting the proposal distribution q(x) iteratively based on previous samples can lead to a more efficient sampling strategy.
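As one concrete instance of the control-variate strategy above (an illustrative sketch, not from the text): the importance weight w(x) = p(x)/q(x) has known mean 1 under q, so w − 1 can serve as a control variate for the plain estimator. The N(0, 1) target and N(0, 2) proposal are assumed choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative setup (assumptions): target p = N(0, 1),
# proposal q = N(0, 2), f(x) = x^2, so E_p[f] = 1.
def p(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def q(x):
    return np.exp(-x**2 / 8) / np.sqrt(8 * np.pi)

N = 50_000
x = rng.normal(0.0, 2.0, size=N)
w = p(x) / q(x)
y = x**2 * w                    # plain importance sampling terms
c = w - 1.0                     # control variate: E_q[w - 1] = 0 exactly
beta = np.cov(y, c)[0, 1] / np.var(c, ddof=1)  # estimated optimal coefficient
estimate = np.mean(y - beta * c)
print(estimate)
```

Subtracting beta * c leaves the expectation unchanged (the control variate has mean zero) while cancelling part of the fluctuation shared by y and w.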
Example
Suppose we want to estimate the integral ∫_0^1 x^2 dx using importance sampling. The true value is 1/3. Let's consider a simple example of how weighted importance sampling might be applied (though it might not be the best choice for this simple case):
- Target Distribution: In this case, we are approximating a definite integral, so we can think of p(x) as a uniform distribution over [0, 1], i.e., p(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise. We're estimating E_p[x^2] under this distribution. So f(x) = x^2.
- Proposal Distribution: Let's choose q(x) to be a Beta distribution with parameters α = 2 and β = 2, which is supported on the interval [0, 1]. This means q(x) = 6x(1 - x) for 0 ≤ x ≤ 1.
- Standard Importance Weights: w(x) = p(x) / q(x) = 1 / (6x(1 - x)) for 0 < x < 1.
- Weighted Importance Weights: Suppose we choose h(x) = x and g(x) = 1 - x, giving h(x)g(x) = x(1 - x). To keep the estimator targeting E_p[f(x)], the combined weight should retain the importance ratio: W(x) = (p(x)/q(x)) h(x) g(x). One caveat: with a nonconstant h(x)g(x), the self-normalized estimator below converges to E_p[f(x) h(x) g(x)] / E_p[h(x) g(x)] rather than to E_p[f(x)], so h and g must be chosen with this trade-off in mind.
- Estimator: We would then draw samples xi from the Beta distribution q(x) and compute the weighted importance sampling estimator:
E_p[x^2] ≈ (Σ_{i=1}^N x_i^2 W(x_i)) / (Σ_{i=1}^N W(x_i))
In this simplified illustration, judicious selection of h(x) and g(x) can potentially refine the approximation by reducing the variance associated with the samples drawn from q(x). The effectiveness hinges on the interplay between f(x), p(x), q(x), h(x) and g(x).
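The worked example can be run end to end. One simplification is made deliberately and flagged here: for the self-normalized estimator to converge to E_p[x^2] = 1/3, the weights must be proportional to p(x)/q(x), so the h(x)g(x) factor is treated as a constant and absorbed into W:

```python
import numpy as np

rng = np.random.default_rng(4)

# Worked example: f(x) = x^2, p(x) uniform on [0, 1], q = Beta(2, 2),
# so q(x) = 6x(1 - x).  Taking h(x)g(x) constant, W(x) ∝ p(x)/q(x).
N = 200_000
x = rng.beta(2.0, 2.0, size=N)           # samples from q
W = 1.0 / (6.0 * x * (1.0 - x))          # W(x) = p(x)/q(x)
estimate = np.sum(x**2 * W) / np.sum(W)  # self-normalized estimate of 1/3
print(estimate)
```

Note that q places little mass near the endpoints while w(x) blows up there, so the weights are heavy-tailed; this is exactly the tail mismatch discussed earlier, and a proposal closer to uniform would be safer in practice.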
Conclusion
Weighted importance sampling is a powerful extension of standard importance sampling that provides greater flexibility and potential for variance reduction. By introducing carefully chosen weight functions, it enables more accurate and robust estimation of expectations, particularly in challenging scenarios where the target and proposal distributions differ significantly. Choosing the right weight functions, h(x) and g(x), is key to the method's success.