Reimagining Equity Solvency Capital Requirement Approximation (one of my Master’s Thesis subjects): From Bilinear Interpolation to Probabilistic Machine Learning

In insurance and financial risk management, calculating the Solvency Capital Requirement (SCR) for equity risk can be a computationally intensive task that makes or breaks real-time decision making. Traditional approaches rely on expensive Monte Carlo simulations that can take hours to complete, forcing practitioners to develop approximation schemes. Developing such a scheme was a project I tackled back in 2007-2009 for my Master’s Thesis in Actuarial Science (see references below).

What I did back then

  • 96 expensive ALIM simulations were run across four key variables:
    • Minimum guaranteed rate (tmg): 1.75% to 6%
    • Percentage of investments in stocks: 2% to 6.25%
    • Latent capital gains on equities: 2% to 6.25%
    • Profit sharing provisions (ppe): 3.5 to 10
  • Multi-stage interpolation strategy: I decomposed the problem into multiple 2D approximation grids, then combined cross-sections to reconstruct the full 4D surface (a minimal sketch follows this list).

  • Validation through error analysis: Rigorous comparison between simulation results and approximations to ensure the method’s reliability.
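To make the interpolation step concrete, here is a minimal R sketch of bilinear interpolation on a single 2D cross-section (tmg × ppe, with the two equity variables held fixed). The `bilinear` function and the 2 × 2 grid below are illustrative assumptions, not the original ALIM grids; the corner values are taken from the data frame shown in the R section.

# Bilinear interpolation on one 2D cross-section (illustrative sketch)
bilinear <- function(x, y, x_grid, y_grid, z_grid) {
  i <- findInterval(x, x_grid)  # index of the grid cell containing x
  j <- findInterval(y, y_grid)  # index of the grid cell containing y
  wx <- (x - x_grid[i]) / (x_grid[i + 1] - x_grid[i])
  wy <- (y - y_grid[j]) / (y_grid[j + 1] - y_grid[j])
  (1 - wx) * (1 - wy) * z_grid[i, j] +
    wx * (1 - wy) * z_grid[i + 1, j] +
    (1 - wx) * wy * z_grid[i, j + 1] +
    wx * wy * z_grid[i + 1, j + 1]
}
# hypothetical 2 x 2 cross-section: SCRs (in m€) at four (tmg, ppe) nodes
tmg_grid <- c(1.75, 5.00)
ppe_grid <- c(3.50, 9.00)
scr_grid <- matrix(c(56.5, 65.1,    # column-major: rows index tmg,
                     48.5, 54.4),   # columns index ppe
                   nrow = 2)
bilinear(2.5, 6.0, tmg_grid, ppe_grid, scr_grid)  # SCR estimate inside the cell

The original approach chained such 2D interpolations across cross-sections of the remaining two variables to rebuild the full 4D surface.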

A Modern Probabilistic Approach

Today, I revisit this same challenge through the lens of probabilistic machine learning, and obtain functional expressions/approximations in R and Python. It is fascinating how easy it can look now!

This probabilistic approach offers several advantages:

  • Built-in uncertainty quantification: Know not just the prediction, but how confident we should be
  • Automatic feature learning: Let the model discover optimal representations
  • Speed: model fitting and prediction take milliseconds (see the timings below), instead of hours of simulation

Of course, with a functional probabilistic machine learning model in hand, we can think of many ways to stress test these results (i.e., obtain what-if analyses), based on changes in one or more of the explanatory variables.
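For instance, here is a minimal sketch of such a what-if analysis, assuming `obj` is the fitted `rvfl` regressor from the R section below; the input values and the size of the shock are hypothetical, chosen only for illustration.

# What-if analysis: shock the minimum guaranteed rate, everything else fixed
new_point <- matrix(c(2.00, 4.00, 4.00, 9.00), nrow = 1,
                    dimnames = list(NULL, c("tmg", "pct_actions",
                                            "pvl_actions", "ppe")))
shocked <- new_point
shocked[, "tmg"] <- 4.00                   # hypothetical +200 bps shock on tmg
res_base    <- obj$predict(X = new_point)  # point prediction + 95% interval
res_shocked <- obj$predict(X = shocked)
res_shocked$preds - res_base$preds         # estimated SCR impact of the shock (m€)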


References:

R version

(scr_equity <- read.csv("ALIM4D.txt"))
A data.frame: 96 × 5
 tmg   pct_actions  pvl_actions  ppe   SRC_Equity
<dbl>  <dbl>        <dbl>        <dbl> <dbl>
 1.75  2.00         2.00         3.50  56471378
 1.75  2.00         2.00         9.00  48531931
 1.75  2.00         2.00         9.50  48558178
 1.75  2.00         2.00         9.75  48570523
 5.00  2.00         2.00         3.50  65111083
 5.00  2.00         2.00         9.00  54433115
 5.00  2.00         2.00         9.50  54436348
 5.00  2.00         2.00         9.75  54526734
 5.25  2.00         2.00         3.50  65244870
 5.25  2.00         2.00         9.00  54325632
 5.25  2.00         2.00         9.50  54387565
 5.25  2.00         2.00         9.75  54418533
 5.50  2.00         2.00         3.50  65396012
 5.50  2.00         2.00         9.00  54239282
 5.50  2.00         2.00         9.50  54302132
 5.50  2.00         2.00         9.75  54333018
 5.75  2.00         2.00         3.50  65581289
 5.75  2.00         2.00         9.00  54168174
 5.75  2.00         2.00         9.50  54209587
 5.75  2.00         2.00         9.75  54210481
 6.00  2.00         2.00         3.50  65785420
 6.00  2.00         2.00         9.00  54042241
 6.00  2.00         2.00         9.50  54103639
 6.00  2.00         2.00         9.75  54134241
 1.75  2.75         2.75         9.00  48435808
 1.75  2.75         2.75         9.25  48446558
 1.75  2.75         2.75         9.50  48459074
 1.75  2.75         2.75         9.75  48473874
 5.00  2.75         2.75         9.00  54501129
 5.00  2.75         2.75         9.25  54531852
 ⋮     ⋮            ⋮            ⋮     ⋮
 5.75  6.00         6.00         9.50  53901737
 5.75  6.00         6.00         9.75  53968463
 6.00  6.00         6.00         3.50  62378886
 6.00  6.00         6.00         9.25  53780562
 6.00  6.00         6.00         9.50  53730182
 6.00  6.00         6.00         9.75  53950814
 1.75  6.25         6.25         3.50  51709654
 1.75  6.25         6.25         9.25  47537722
 1.75  6.25         6.25         9.50  47543381
 1.75  6.25         6.25         9.75  47555017
 5.00  6.25         6.25         3.50  61268505
 5.00  6.25         6.25         9.25  54207189
 5.00  6.25         6.25         9.50  54234608
 5.00  6.25         6.25         9.75  54268573
 5.25  6.25         6.25         3.50  61467297
 5.25  6.25         6.25         9.25  54070788
 5.25  6.25         6.25         9.50  54096598
 5.25  6.25         6.25         9.75  54154003
 5.50  6.25         6.25         3.50  61671008
 5.50  6.25         6.25         9.25  53964700
 5.50  6.25         6.25         9.50  53994107
 5.50  6.25         6.25         9.75  54052459
 5.75  6.25         6.25         3.50  61868864
 5.75  6.25         6.25         9.25  53862132
 5.75  6.25         6.25         9.50  53881640
 5.75  6.25         6.25         9.75  53941964
 6.00  6.25         6.25         3.50  62112237
 6.00  6.25         6.25         9.25  53734035
 6.00  6.25         6.25         9.50  53764618
 6.00  6.25         6.25         9.75  53825660
# express SCR in millions of euros
scr_equity$SRC_Equity <- scr_equity$SRC_Equity/1e6

options(repos = c(techtonique = "https://r-packages.techtonique.net",
                  CRAN = "https://cloud.r-project.org"))
install.packages(c("rvfl", "learningmachine"))

# 80/20 train/test split
set.seed(13)
train_idx <- sample(nrow(scr_equity), 0.8 * nrow(scr_equity))
X_train <- as.matrix(scr_equity[train_idx, -ncol(scr_equity)])
X_test <- as.matrix(scr_equity[-train_idx, -ncol(scr_equity)])
y_train <- scr_equity$SRC_Equity[train_idx]
y_test <- scr_equity$SRC_Equity[-train_idx]

# Kernel Ridge Regression, no prediction intervals
obj <- learningmachine::Regressor$new(method = "krr", pi_method = "none")
obj$get_type()
t0 <- proc.time()[3]
obj$fit(X_train, y_train, reg_lambda = 0.1)
cat("Elapsed: ", proc.time()[3] - t0, "s \n")

‘regression’

Elapsed:  0.005 s 
# out-of-sample RMSE (in m€)
print(sqrt(mean((obj$predict(X_test) - y_test)^2)))
[1] 0.7250047
obj$summary(X_test, y=y_test, show_progress=TRUE)
  |======================================================================| 100%



$R_squared
[1] 0.9306298

$R_squared_adj
[1] 0.9121311

$Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-1.097222 -0.590318 -0.051308 -0.006375  0.447859  1.660139 

$citests
              estimate      lower       upper      p-value signif
tmg          0.8311760 -0.8484270  2.51077903 3.133161e-01       
pct_actions -0.4845265 -0.9327082 -0.03634475 3.555821e-02      *
pvl_actions -0.4845265 -0.9327082 -0.03634475 3.555821e-02      *
ppe         -2.2492137 -2.4397536 -2.05867385 6.622214e-16    ***

$signif_codes
[1] "Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1"

$effects
── Data Summary ────────────────────────
                           Values 
Name                       effects
Number of rows             20     
Number of columns          4      
_______________________           
Column type frequency:            
  numeric                  4      
________________________          
Group variables            None   

── Variable type: numeric ──────────────────────────────────────────────────────
  skim_variable   mean    sd    p0    p25    p50    p75  p100 hist 
  ⋮

# Random Vector Functional Link (RVFL) network with KDE split-conformal
# prediction intervals
obj <- learningmachine::Regressor$new(method = "rvfl",
                                      nb_hidden = 3L,
                                      pi_method = "kdesplitconformal")
t0 <- proc.time()[3]
obj$fit(X_train, y_train, reg_lambda = 0.01)
cat("Elapsed: ", proc.time()[3] - t0, "s \n")
Elapsed:  0.006 s 
obj$summary(X_test, y=y_test, show_progress=FALSE)
$R_squared
[1] 0.8556358

$R_squared_adj
[1] 0.8171387

$Residuals
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.1720 -1.2977 -0.8132 -0.8003 -0.3254  0.5877 

$Coverage_rate
[1] 100

$citests
              estimate      lower      upper      p-value signif
tmg          179.13631  162.48868  195.78394 3.639163e-15    ***
pct_actions  -73.14222  -89.12337  -57.16108 1.046939e-08    ***
pvl_actions   62.46782   46.48668   78.44896 1.199526e-07    ***
ppe         -125.26721 -144.19952 -106.33490 2.223349e-11    ***

$signif_codes
[1] "Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1"

$effects
── Data Summary ────────────────────────
                           Values 
Name                       effects
Number of rows             20     
Number of columns          4      
_______________________           
Column type frequency:            
  numeric                  4      
________________________          
Group variables            None   

── Variable type: numeric ──────────────────────────────────────────────────────
  skim_variable   mean   sd     p0    p25    p50   p75  p100 hist     
  ⋮

obj$set_level(95)  # 95% prediction intervals
res <- obj$predict(X = X_test)

# observed values (blue), point predictions (red), 95% bounds (gray)
plot(c(y_train, res$preds), type='l',
     main="(Probabilistic) Out-of-sample \n Equity Capital Requirement in m€",
     xlab="Observation Index",
     ylab="Equity Capital Requirement (m€)",
     ylim = c(min(c(res$upper, res$lower, y_test, y_train)),
              max(c(res$upper, res$lower, y_test, y_train))))
lines(c(y_train, res$upper), col="gray70")
lines(c(y_train, res$lower), col="gray70")
lines(c(y_train, res$preds), col = "red")
lines(c(y_train, y_test), col = "blue", lwd=2)
abline(v = length(y_train), lty=2, col="black", lwd=2)  # train/test boundary

[Figure: (probabilistic) out-of-sample equity capital requirement in m€ — observed values (blue), point predictions (red), 95% prediction bounds (gray); the dashed line marks the train/test boundary]

# empirical coverage rate of the 95% prediction intervals, in %
100*mean((y_test >= as.numeric(res$lower)) * (y_test <= as.numeric(res$upper)))

100

Python version

!pip install skimpy
!pip install ydata-profiling
!pip install nnetsauce
import pandas as pd
from skimpy import skim
from ydata_profiling import ProfileReport

scr_equity = pd.read_csv("ALIM4D.csv")
scr_equity['SRC_Equity'] = scr_equity['SRC_Equity']/1e6  # express SCR in millions of euros
skim(scr_equity)            # quick summary statistics
ProfileReport(scr_equity)   # exploratory data analysis report

import nnetsauce as ns
import numpy as np

# explanatory variables and target
X, y = scr_equity.drop('SRC_Equity', axis=1), scr_equity['SRC_Equity'].values

from sklearn.utils import all_estimators
from sklearn.model_selection import train_test_split
from tqdm import tqdm
from time import time

# Get all scikit-learn regressors
estimators = all_estimators(type_filter='regressor')

results_regressors = []

seeds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for i, (name, RegressorClass) in tqdm(enumerate(estimators)):

    # skip meta-estimators (which require base estimators) and slower ensembles
    if name in ['MultiOutputRegressor', 'MultiOutputClassifier', 'StackingRegressor', 'StackingClassifier',
                'VotingRegressor', 'VotingClassifier', 'TransformedTargetRegressor', 'RegressorChain',
                'GradientBoostingRegressor', 'HistGradientBoostingRegressor', 'RandomForestRegressor',
                'ExtraTreesRegressor', 'MLPRegressor']:
        continue

    for seed in seeds:

        try:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                                random_state=42 + seed*1000)
            # wrap each regressor in split conformal prediction intervals (95% level)
            regr = ns.PredictionInterval(obj=ns.CustomRegressor(RegressorClass()),
                                         method="splitconformal",
                                         level=95,
                                         seed=312)
            start = time()
            regr.fit(X_train, y_train)
            print(f"Elapsed: {time() - start}s")
            preds = regr.predict(X_test, return_pi=True)  # (mean, lower, upper)
            coverage_rate = np.mean((preds.lower <= y_test)*(preds.upper >= y_test))
            rmse = np.sqrt(np.mean((preds.mean - y_test)**2))  # RMSE on point predictions
            results_regressors.append([name, seed, coverage_rate, rmse])

        except Exception:
            # some regressors are incompatible with this dataset; skip them
            continue
results_df = pd.DataFrame(results_regressors,
                          columns=['Regressor', 'Seed', 'Coverage Rate', 'RMSE'])
results_df.sort_values(by='Coverage Rate', ascending=False)  # notebook display, best coverage first
results_df.dropna(inplace=True)
results_df['logRMSE'] = np.log(results_df['RMSE'])  # log scale for easier comparison