A Tweedie Trilogy - Part I: Frequency and Aggregation Invariance


TLDR: In this first part of the Tweedie Trilogy, we will take a look at what happens to a GLM if we aggregate the data by a group-by operation. A frequency model for insurance pricing will serve as an example.

This trilogy celebrates the 40th birthday of Tweedie distributions in 2024 and highlights some of their very special properties.

Intro

Tweedie distributions and Generalised Linear Models (GLMs) have an intertwined relationship. While GLMs are, in my view, one of the best reference models for estimating expectations, Tweedie distributions lie at the heart of expectation estimation. In fact, basically all GLMs applied in practice use Tweedie distributions, with three notable exceptions: the binomial, the multinomial and the negative binomial distribution.

Mean-Variance Relation

“An index which distinguishes between some important exponential families” is the title of the original 1984 publication by Maurice Charles Kenneth Tweedie. The index it refers to is nowadays called the Tweedie power parameter p. Recall that distributions of the exponential dispersion family always fulfil a mean-variance relationship. It is even a way to define them. For the Tweedie distribution, denoted \mathrm{Tw}_p(\mu, \phi), the relation reads

\begin{align*}
\operatorname{E}[Y] &= \mu
\\
\operatorname{Var}[Y] &= \phi \mu^p
\end{align*}

with dispersion parameter \phi. Some very common members are given in the following table.

p     | distribution           | domain Y                | domain \mu
------|------------------------|-------------------------|------------
0     | Normal / Gaussian      | \mathrm{R}              | \mathrm{R}
1     | Poisson                | 0, 1, 2, \ldots         | \mathrm{R}_+
(1,2) | Compound Poisson-Gamma | \mathrm{R}_+ \cup \{0\} | \mathrm{R}_+
2     | Gamma                  | \mathrm{R}_+            | \mathrm{R}_+
3     | inverse Gaussian       | \mathrm{R}_+            | \mathrm{R}_+
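To see the mean-variance relation in action, here is a small simulation sketch (not part of the original post) that checks \operatorname{Var}[Y] \approx \phi \mu^p for two members of the table, Poisson and Gamma:

import numpy as np

rng = np.random.default_rng(42)
mu = 2.5

# Poisson: p = 1 and phi = 1, so Var[Y] should be close to mu.
y_pois = rng.poisson(lam=mu, size=1_000_000)
print(y_pois.mean(), y_pois.var())  # both approx. 2.5

# Gamma: p = 2, so Var[Y] = phi * mu^2. A Gamma with shape k and
# scale mu/k has mean mu and variance mu^2/k, i.e. phi = 1/k.
k = 4.0
y_gamma = rng.gamma(shape=k, scale=mu / k, size=1_000_000)
print(y_gamma.mean(), y_gamma.var())  # approx. 2.5 and 2.5**2 / 4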

Insurance Pricing Models

In non-life insurance pricing, most claims happen somewhat randomly: typically both the occurrence and the size are random. Take the theft of your bike or water damage in your basement due to flooding as examples. Pricing actuaries usually want to predict the expected loss E[Y|X] given some features X of a policy. The set of features could contain the purchasing price of your bike or the proximity of your house to a river.

Instead of directly modelling the expected loss per exposure w, e.g. the duration of the insurance contract, the most common approach is the famous frequency-severity split:

\begin{align}
\operatorname{E}\left[\frac{Y}{w}\right] = \underbrace{\operatorname{E}\left[\frac{N}{w}\right]}_{frequency} \cdot
\underbrace{\operatorname{E}\left[\left. \frac{Y}{n}\right| N=n\right]}_{severity}
\end{align}

For simplicity, the conditioning on X is suppressed; it would occur in every expectation. The first part, \operatorname{E}\left[\frac{N}{w}\right], is the (expected) frequency, i.e. the number of claims per exposure (time). The second term, \operatorname{E}\left[\left.\frac{Y}{N}\right| N\right], is the (expected) severity, i.e. the average claim size (per claim) given a fixed number of claims. Here, we focus on the frequency part.
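As a toy numeric illustration (with invented numbers): an expected frequency of 0.1 claims per year together with an expected severity of 1000 per claim gives an expected loss of

\begin{align*}
\operatorname{E}\left[\frac{Y}{w}\right] = 0.1 \cdot 1000 = 100
\end{align*}

per year of exposure.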

Convolution and Aggregation Invariance

This property might at first seem very theoretical, but it may be one of the most important properties for the estimation of expectations E[Y|X] with GLMs. It is in fact valid for the whole exponential dispersion family: the weighted mean of independent random variables of the same family has (almost) the same distribution!

If

\begin{align*}
Y_i &\overset{indep.}{\sim} \mathrm{Tw}_p(\mu, \phi/w_i) \,,
\\
w_+ &= \sum_i w_i \quad\text{with } w_i >0 \,,
\end{align*}

then

\begin{align*}
Y &= \sum_{i=1}^n \frac{w_i Y_i}{w_+} \sim \mathrm{Tw}_p(\mu, \phi/w_+) \,.
\end{align*}

It is obvious that the mean of Y is again \mu. But it is remarkable that Y has the same distribution with the same power parameter; only the second argument, the dispersion parameter, differs. And the dispersion parameter cancels out in GLM estimation. In fact, we will show that two GLMs, one fit on individual data and one on aggregated data, give identical results.

This is a quite essential property for data aggregation. It means that one can aggregate rows with identical features and still do an analysis without loss of information.
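For the Poisson case (p=1, see below), this invariance is easy to check by simulation. The following sketch (not from the original post) uses the fact that Y_i \sim \mathrm{Tw}_1(\mu, 1/w_i) is the same as w_i Y_i \sim \mathrm{Poisson}(w_i \mu):

import numpy as np

rng = np.random.default_rng(0)
mu = 0.3
w = np.array([0.5, 1.0, 2.5])
w_plus = w.sum()  # 4.0
n_sim = 1_000_000

# Draw w_i * Y_i ~ Poisson(w_i * mu) and form the weighted mean Y.
wy = rng.poisson(lam=w * mu, size=(n_sim, len(w)))
y = wy.sum(axis=1) / w_plus

# Direct draw from the claimed distribution Tw_1(mu, 1/w_plus),
# i.e. w_plus * Y ~ Poisson(w_plus * mu).
y_ref = rng.poisson(lam=w_plus * mu, size=n_sim) / w_plus

print(y.mean(), y_ref.mean())  # both approx. mu = 0.3
print(y.var(), y_ref.var())    # both approx. mu / w_plus = 0.075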

Poisson Distribution

When modelling counts, the Poisson distribution is by far the easiest distribution one can think of. It only has a single parameter, is a member of the Tweedie family, and fulfils the mean-variance relation

\begin{equation*}
\operatorname{E}[N] = \mu = \operatorname{Var}[N] \,.
\end{equation*}

In particular, p=1. While, strictly speaking, the distribution is only for counts, i.e. N takes on non-negative integer values, Poisson regression also works for any non-negative response variable like N/w \in \mathrm{R}_+ \cup \{0\}.
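As a minimal sketch of this point (using scikit-learn's PoissonRegressor rather than the glum model of the next section, with invented data), the following fit goes through without error although the response y is non-integer:

import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
w = rng.uniform(0.5, 2.0, size=1000)            # exposures
n = rng.poisson(lam=w * np.exp(0.1 * X[:, 0]))  # integer claim counts
y = n / w                                       # non-integer frequencies

PoissonRegressor(alpha=0).fit(X, y, sample_weight=w)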

Frequency Example

For demonstration, we fit a Poisson GLM on the French motor third-party liability claims dataset, cf. the corresponding scikit-learn example and case study 1 of the Swiss Association of Actuaries on the same dataset.

from glum import GeneralizedLinearRegressor
import numpy as np
import pandas as pd

# ... quite some code ... here we abbreviate.
# df is the prepared claims DataFrame, x_vars the list of feature columns.
y_freq = df["ClaimNb"] / df["Exposure"]
w_freq = df["Exposure"]
X = df[x_vars]
glm_params = {
    "alpha": 0,
    "drop_first": True,
    "gradient_tol": 1e-8,
}
glm_freq = GeneralizedLinearRegressor(
    family="poisson", **glm_params
).fit(X, y_freq, sample_weight=w_freq)
print(
  f"Total predicted number of claims = "
  f"{(w_freq * glm_freq.predict(X)).sum():_.2f}"
)
# Total predicted number of claims = 26_444.00

# Now aggregated
df_agg = df.groupby(x_vars, observed=True).sum().reset_index()
print(
    f"Aggregation reduced number of rows from {df.shape[0]:_}"
    f"to {df_agg.shape[0]:_}."
)
# Aggregation reduced number of rows from 678_013 to 133_413.
y_agg_freq = df_agg["ClaimNb"] / df_agg["Exposure"]
w_agg_freq = df_agg["Exposure"]
X_agg = df_agg[x_vars]
glm_agg_freq = GeneralizedLinearRegressor(
    family="poisson", **glm_params
).fit(X_agg, y_agg_freq, sample_weight=w_agg_freq)
print(
    f"Total predicted number of claims = "
    f"{(w_agg_freq * glm_agg_freq.predict(X_agg)).sum():_.2f}"
)
# Total predicted number of claims = 26_444.00

In fact, both models have the same intercept term and the same coefficients; they really are identical models (up to numerical precision):

print(
    f"intercept freq{'':<18}= {glm_freq.intercept_}\n"
    f"intercept freq aggregated model = {glm_agg_freq.intercept_}"
)
# intercept freq                  = -3.7564376764216747
# intercept freq aggregated model = -3.7564376764216747

np.max(np.abs(glm_freq.coef_ - glm_agg_freq.coef_)) < 1e-13
# True

Outlook

The full notebook can be found here.

Next week, part II of this trilogy will follow. There, we will meet some more of the Tweedie family's quite remarkable properties.

Further references:

  • Tweedie, M.C.K. (1984). “An index which distinguishes between some important exponential families”. In: Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference, Indian Statistical Institute, Calcutta, pp. 579–604.