Reimagining Equity Solvency Capital Requirement Approximation (one of my Master’s Thesis subjects): From Bilinear Interpolation to Probabilistic Machine Learning
In the world of insurance and financial risk management, calculating the Solvency Capital Requirement (SCR) for equity risk can be a computationally intensive task, one that can make or break real-time decision making. Traditional approaches rely on expensive Monte Carlo simulations that can take hours to complete, which forces practitioners to develop approximation schemes. Developing such a scheme was a project I tackled back in 2007-2009 for my Master's Thesis in Actuarial Science (see references below).
What I did back then
- 96 expensive ALIM simulations were run across four key variables:
  - Minimum guaranteed rate (tmg): 1.75% to 6%
  - Percentage of investments in stocks (pct_actions): 2% to 6.25%
  - Latent capital gains on equities (pvl_actions): 2% to 6.25%
  - Profit sharing provisions (ppe): 3.5 to 10
- Multi-stage interpolation strategy: I decomposed the problem into multiple 2D approximation grids, then combined cross-sections to reconstruct the full 4D surface (see the sketch right after this list).
- Validation through error analysis: rigorous comparison between simulation results and approximations to ensure the method's reliability.
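To fix ideas, here is a minimal, self-contained sketch of the bilinear interpolation building block, in Python. The grid values below are made up for illustration; `grid_tmg` and `grid_ppe` are hypothetical axes for one 2D cross-section of the SCR surface, not the thesis' actual grids.

```python
import numpy as np

def bilinear(x, y, grid_x, grid_y, values):
    """Bilinear interpolation of `values` (len(grid_x) x len(grid_y))
    at the point (x, y); grid axes are assumed sorted."""
    # Locate the grid cell containing (x, y)
    i = np.clip(np.searchsorted(grid_x, x) - 1, 0, len(grid_x) - 2)
    j = np.clip(np.searchsorted(grid_y, y) - 1, 0, len(grid_y) - 2)
    # Normalized coordinates within the cell
    tx = (x - grid_x[i]) / (grid_x[i + 1] - grid_x[i])
    ty = (y - grid_y[j]) / (grid_y[j + 1] - grid_y[j])
    # Weighted average of the four cell corners
    return ((1 - tx) * (1 - ty) * values[i, j]
            + tx * (1 - ty) * values[i + 1, j]
            + (1 - tx) * ty * values[i, j + 1]
            + tx * ty * values[i + 1, j + 1])

# Hypothetical 2D cross-section of the SCR surface over (tmg, ppe),
# with the other two variables held fixed -- values are made up
grid_tmg = np.array([1.75, 5.0, 5.25, 5.5, 5.75, 6.0])
grid_ppe = np.array([3.5, 9.0, 9.5, 9.75])
scr_grid = np.random.default_rng(42).uniform(48, 66, size=(6, 4))

print(bilinear(2.5, 7.0, grid_tmg, grid_ppe, scr_grid))
```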
A Modern Probabilistic Approach
Today, I revisit this same challenge through the lens of probabilistic machine learning, and obtain functional expressions/approximations in R and Python. Fascinating how easy it may look now!
This probabilistic approach offers several advantages:
- Built-in uncertainty quantification: Know not just the prediction, but how confident we should be (a sketch of the split conformal idea follows this list)
- Automatic feature learning: Let the model discover optimal representations
- Speed: As shown below, the models fit in a few milliseconds, versus hours for the full simulations
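Both sections below obtain their intervals from split conformal prediction (pi_method = "kdesplitconformal" in R, method="splitconformal" in Python). Here is a minimal sketch of the plain split conformal idea on synthetic data, with a generic scikit-learn regressor; the packages used later implement refinements of this.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
y = X @ np.array([3.0, -1.0, 2.0, -2.5]) + 0.1 * rng.normal(size=200)

# Split the training data into a proper training set and a calibration set
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = Ridge().fit(X_fit, y_fit)

# Conformity scores: absolute residuals on the calibration set
scores = np.abs(y_cal - model.predict(X_cal))

# 95% interval half-width: finite-sample-corrected empirical quantile
alpha = 0.05
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

# Prediction interval for a new point: point forecast +/- q
x_new = rng.uniform(size=(1, 4))
pred = model.predict(x_new)
print(pred - q, pred + q)
```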
Of course, once we have a functional probabilistic machine learning model, we can think of many ways to stress test these results (i.e., obtain what-if analyses), based on changes in one or more of the explanatory variables, as in the sketch below.
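As an illustration, here is a hypothetical what-if analysis in Python: sweep tmg over a grid while holding the other inputs at their medians, and read the SCR response off a fitted model. Everything here is a synthetic stand-in (made-up data, a plain GradientBoostingRegressor); with the real data, the model would be one of the fitted objects from the sections below.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the ALIM4D dataset (same column names)
rng = np.random.default_rng(123)
df = pd.DataFrame({
    "tmg": rng.uniform(1.75, 6.0, 500),
    "pct_actions": rng.uniform(2.0, 6.25, 500),
    "pvl_actions": rng.uniform(2.0, 6.25, 500),
    "ppe": rng.uniform(3.5, 9.75, 500),
})
df["SRC_Equity"] = 50 + 2 * df["tmg"] - df["ppe"] + rng.normal(0, 0.5, 500)

model = GradientBoostingRegressor().fit(df.drop(columns="SRC_Equity"),
                                        df["SRC_Equity"])

# What-if scenario: vary tmg, hold the other variables at their medians
scenario = pd.DataFrame({"tmg": np.linspace(1.75, 6.0, 10)})
for col in ["pct_actions", "pvl_actions", "ppe"]:
    scenario[col] = df[col].median()

print(scenario.assign(SCR=model.predict(scenario)))
```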
References:
- Moudiki, T. (2012). Modélisation du SCR Equity. Institut des Actuaires. PDF
- ResearchGate version: https://www.researchgate.net/publication/395528539_memoire_moudiki_2012
R version
(scr_equity <- read.csv("ALIM4D.txt"))
| tmg | pct_actions | pvl_actions | ppe | SRC_Equity |
|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| 1.75 | 2.00 | 2.00 | 3.50 | 56471378 |
| 1.75 | 2.00 | 2.00 | 9.00 | 48531931 |
| 1.75 | 2.00 | 2.00 | 9.50 | 48558178 |
| 1.75 | 2.00 | 2.00 | 9.75 | 48570523 |
| 5.00 | 2.00 | 2.00 | 3.50 | 65111083 |
| 5.00 | 2.00 | 2.00 | 9.00 | 54433115 |
| 5.00 | 2.00 | 2.00 | 9.50 | 54436348 |
| 5.00 | 2.00 | 2.00 | 9.75 | 54526734 |
| 5.25 | 2.00 | 2.00 | 3.50 | 65244870 |
| 5.25 | 2.00 | 2.00 | 9.00 | 54325632 |
| 5.25 | 2.00 | 2.00 | 9.50 | 54387565 |
| 5.25 | 2.00 | 2.00 | 9.75 | 54418533 |
| 5.50 | 2.00 | 2.00 | 3.50 | 65396012 |
| 5.50 | 2.00 | 2.00 | 9.00 | 54239282 |
| 5.50 | 2.00 | 2.00 | 9.50 | 54302132 |
| 5.50 | 2.00 | 2.00 | 9.75 | 54333018 |
| 5.75 | 2.00 | 2.00 | 3.50 | 65581289 |
| 5.75 | 2.00 | 2.00 | 9.00 | 54168174 |
| 5.75 | 2.00 | 2.00 | 9.50 | 54209587 |
| 5.75 | 2.00 | 2.00 | 9.75 | 54210481 |
| 6.00 | 2.00 | 2.00 | 3.50 | 65785420 |
| 6.00 | 2.00 | 2.00 | 9.00 | 54042241 |
| 6.00 | 2.00 | 2.00 | 9.50 | 54103639 |
| 6.00 | 2.00 | 2.00 | 9.75 | 54134241 |
| 1.75 | 2.75 | 2.75 | 9.00 | 48435808 |
| 1.75 | 2.75 | 2.75 | 9.25 | 48446558 |
| 1.75 | 2.75 | 2.75 | 9.50 | 48459074 |
| 1.75 | 2.75 | 2.75 | 9.75 | 48473874 |
| 5.00 | 2.75 | 2.75 | 9.00 | 54501129 |
| 5.00 | 2.75 | 2.75 | 9.25 | 54531852 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 5.75 | 6.00 | 6.00 | 9.50 | 53901737 |
| 5.75 | 6.00 | 6.00 | 9.75 | 53968463 |
| 6.00 | 6.00 | 6.00 | 3.50 | 62378886 |
| 6.00 | 6.00 | 6.00 | 9.25 | 53780562 |
| 6.00 | 6.00 | 6.00 | 9.50 | 53730182 |
| 6.00 | 6.00 | 6.00 | 9.75 | 53950814 |
| 1.75 | 6.25 | 6.25 | 3.50 | 51709654 |
| 1.75 | 6.25 | 6.25 | 9.25 | 47537722 |
| 1.75 | 6.25 | 6.25 | 9.50 | 47543381 |
| 1.75 | 6.25 | 6.25 | 9.75 | 47555017 |
| 5.00 | 6.25 | 6.25 | 3.50 | 61268505 |
| 5.00 | 6.25 | 6.25 | 9.25 | 54207189 |
| 5.00 | 6.25 | 6.25 | 9.50 | 54234608 |
| 5.00 | 6.25 | 6.25 | 9.75 | 54268573 |
| 5.25 | 6.25 | 6.25 | 3.50 | 61467297 |
| 5.25 | 6.25 | 6.25 | 9.25 | 54070788 |
| 5.25 | 6.25 | 6.25 | 9.50 | 54096598 |
| 5.25 | 6.25 | 6.25 | 9.75 | 54154003 |
| 5.50 | 6.25 | 6.25 | 3.50 | 61671008 |
| 5.50 | 6.25 | 6.25 | 9.25 | 53964700 |
| 5.50 | 6.25 | 6.25 | 9.50 | 53994107 |
| 5.50 | 6.25 | 6.25 | 9.75 | 54052459 |
| 5.75 | 6.25 | 6.25 | 3.50 | 61868864 |
| 5.75 | 6.25 | 6.25 | 9.25 | 53862132 |
| 5.75 | 6.25 | 6.25 | 9.50 | 53881640 |
| 5.75 | 6.25 | 6.25 | 9.75 | 53941964 |
| 6.00 | 6.25 | 6.25 | 3.50 | 62112237 |
| 6.00 | 6.25 | 6.25 | 9.25 | 53734035 |
| 6.00 | 6.25 | 6.25 | 9.50 | 53764618 |
| 6.00 | 6.25 | 6.25 | 9.75 | 53825660 |
scr_equity$SRC_Equity <- scr_equity$SRC_Equity/1e6
options(repos = c(techtonique = "https://r-packages.techtonique.net",
CRAN = "https://cloud.r-project.org"))
install.packages(c("rvfl", "learningmachine"))
set.seed(13)
train_idx <- sample(nrow(scr_equity), 0.8 * nrow(scr_equity))
X_train <- as.matrix(scr_equity[train_idx, -ncol(scr_equity)])
X_test <- as.matrix(scr_equity[-train_idx, -ncol(scr_equity)])
y_train <- scr_equity$SRC_Equity[train_idx]
y_test <- scr_equity$SRC_Equity[-train_idx]
obj <- learningmachine::Regressor$new(method = "krr", pi_method = "none")
obj$get_type()
‘regression’
t0 <- proc.time()[3]
obj$fit(X_train, y_train, reg_lambda = 0.1)
cat("Elapsed: ", proc.time()[3] - t0, "s \n")
Elapsed: 0.005 s
print(sqrt(mean((obj$predict(X_test) - y_test)^2)))
[1] 0.7250047
That is an out-of-sample RMSE of about 0.73 m€, on SCR values ranging from roughly 48 to 66 m€, i.e. a relative error on the order of 1%.
obj$summary(X_test, y=y_test, show_progress=TRUE)
$R_squared
[1] 0.9306298
$R_squared_adj
[1] 0.9121311
$Residuals
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.097222 -0.590318 -0.051308 -0.006375 0.447859 1.660139
$citests
estimate lower upper p-value signif
tmg 0.8311760 -0.8484270 2.51077903 3.133161e-01
pct_actions -0.4845265 -0.9327082 -0.03634475 3.555821e-02 *
pvl_actions -0.4845265 -0.9327082 -0.03634475 3.555821e-02 *
ppe -2.2492137 -2.4397536 -2.05867385 6.622214e-16 ***
$signif_codes
[1] "Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1"
$effects
(skimr data summary: 20 rows, 4 numeric columns of marginal effects)
obj <- learningmachine::Regressor$new(method = "rvfl",
nb_hidden = 3L,
pi_method = "kdesplitconformal")
t0 <- proc.time()[3]
obj$fit(X_train, y_train, reg_lambda = 0.01)
cat("Elapsed: ", proc.time()[3] - t0, "s \n")
Elapsed: 0.006 s
obj$summary(X_test, y=y_test, show_progress=FALSE)
$R_squared
[1] 0.8556358
$R_squared_adj
[1] 0.8171387
$Residuals
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.1720 -1.2977 -0.8132 -0.8003 -0.3254 0.5877
$Coverage_rate
[1] 100
$citests
estimate lower upper p-value signif
tmg 179.13631 162.48868 195.78394 3.639163e-15 ***
pct_actions -73.14222 -89.12337 -57.16108 1.046939e-08 ***
pvl_actions 62.46782 46.48668 78.44896 1.199526e-07 ***
ppe -125.26721 -144.19952 -106.33490 2.223349e-11 ***
$signif_codes
[1] "Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1"
$effects
(skimr data summary: 20 rows, 4 numeric columns of marginal effects)
obj$set_level(95)
res <- obj$predict(X = X_test)
plot(c(y_train, res$preds), type='l',
main="(Probabilistic) Out-of-sample \n Equity Capital Requirement in m€",
xlab="Observation Index",
ylab="Equity Capital Requirement (m€)",
ylim = c(min(c(res$upper, res$lower, y_test, y_train)),
max(c(res$upper, res$lower, y_test, y_train))))
lines(c(y_train, res$upper), col="gray70")
lines(c(y_train, res$lower), col="gray70")
lines(c(y_train, res$preds), col = "red")
lines(c(y_train, y_test), col = "blue", lwd=2)
abline(v = length(y_train), lty=2, col="black", lwd=2)
(Plot: actual SCR values in blue, point forecasts in red, 95% prediction bands in gray; the dashed vertical line marks the end of the training sample.)
100*mean((y_test >= as.numeric(res$lower)) * (y_test <= as.numeric(res$upper)))
100
The empirical coverage is 100%; on a small test set at a nominal 95% level, observing no exceedances is not unusual.
Python version
!pip install skimpy
!pip install ydata-profiling
!pip install nnetsauce
import pandas as pd
from skimpy import skim
from ydata_profiling import ProfileReport
scr_equity = pd.read_csv("ALIM4D.csv")
scr_equity['SRC_Equity'] = scr_equity['SRC_Equity']/1e6
skim(scr_equity)
ProfileReport(scr_equity)
import nnetsauce as ns
import numpy as np
X, y = scr_equity.drop('SRC_Equity', axis=1), scr_equity['SRC_Equity'].values
from sklearn.utils import all_estimators
from sklearn.model_selection import train_test_split
from tqdm import tqdm
from time import time
# Get all scikit-learn regressors
estimators = all_estimators(type_filter='regressor')
results_regressors = []
seeds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i, (name, RegressorClass) in tqdm(enumerate(estimators)):
    # Skip meta-estimators and the slower ensemble/neural models
    if name in ['MultiOutputRegressor', 'MultiOutputClassifier', 'StackingRegressor', 'StackingClassifier',
                'VotingRegressor', 'VotingClassifier', 'TransformedTargetRegressor', 'RegressorChain',
                'GradientBoostingRegressor', 'HistGradientBoostingRegressor', 'RandomForestRegressor',
                'ExtraTreesRegressor', 'MLPRegressor']:
        continue
    for seed in seeds:
        try:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                                random_state=42 + seed * 1000)
            regr = ns.PredictionInterval(obj=ns.CustomRegressor(RegressorClass()),
                                         method="splitconformal",
                                         level=95,
                                         seed=312)
            start = time()
            regr.fit(X_train, y_train)
            print(f"Elapsed: {time() - start}s")
            preds = regr.predict(X_test, return_pi=True)
            coverage_rate = np.mean((preds.lower <= y_test) * (preds.upper >= y_test))
            # Point forecasts live in preds.mean; lower/upper are the interval bounds
            rmse = np.sqrt(np.mean((preds.mean - y_test) ** 2))
            results_regressors.append([name, seed, coverage_rate, rmse])
        except Exception:
            continue
results_df = pd.DataFrame(results_regressors,
                          columns=['Regressor', 'Seed', 'Coverage Rate', 'RMSE'])
results_df.sort_values(by='Coverage Rate', ascending=False)
results_df.dropna(inplace=True)
results_df['logRMSE'] = np.log(results_df['RMSE'])
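To turn the raw per-seed results into a leaderboard, one natural next step (a sketch, not part of the original run) is to aggregate over the 10 seeds, e.g. by median coverage and RMSE per regressor:

```python
# Median performance per regressor across the 10 seeds
leaderboard = (results_df
               .groupby('Regressor')[['Coverage Rate', 'RMSE']]
               .median()
               .sort_values(by=['Coverage Rate', 'RMSE'],
                            ascending=[False, True]))
print(leaderboard.head(10))
```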