Spoilt for Choice: Dealing with Categorical Features in Actuarial Machine Learning

actuview
uploaded July 18, 2023

High-cardinality categorical features (i.e. categorical features with many levels) are pervasive in actuarial problems, e.g. occupation in commercial property insurance. Standard processing methods like one-hot encoding become inadequate when cardinality grows.
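To make the scaling issue concrete, here is a minimal sketch of how the one-hot design matrix widens and thins out as the number of levels grows. The helper `one_hot`, the sample size, and the level counts are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def one_hot(codes, n_levels):
    """Return a dense (n_samples, n_levels) indicator matrix for integer codes."""
    codes = np.asarray(codes)
    out = np.zeros((codes.size, n_levels), dtype=np.int8)
    out[np.arange(codes.size), codes] = 1  # one active column per row
    return out

rng = np.random.default_rng(0)
n_policies = 1_000  # hypothetical portfolio size

for n_levels in (5, 50, 500):
    # Simulated category codes, e.g. occupation classes (illustrative only)
    codes = rng.integers(0, n_levels, size=n_policies)
    X = one_hot(codes, n_levels)
    # Each row holds exactly one 1, so the share of non-zero entries is 1 / n_levels
    print(n_levels, X.shape, float(X.mean()))
```

With a fixed number of observations, every added level adds a column while the average number of observations per column falls, so the per-level estimates rest on less and less data.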

In this work, we present a novel Generalised Linear Mixed Model Neural Network (“GLMMNet”) approach to the modelling of high-cardinality categorical features. The GLMMNet integrates a generalised linear mixed model in a deep learning framework, offering both the predictive power of neural networks and the transparency of random effects estimates. Further, its flexibility to deal with any distribution in the exponential dispersion (ED) family makes it widely applicable to many actuarial contexts and beyond.
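One plausible way to write down the structure described above is the following sketch. The notation is an assumption made for illustration and does not come from the abstract: g is a link function, f is a neural network acting on the remaining features x_i, z_i is the indicator vector of the high-cardinality category for observation i, and u is the vector of random effects.

```latex
\[
  Y_i \mid \mathbf{u} \sim \mathrm{ED}(\mu_i, \phi), \qquad
  g(\mu_i) = f(\mathbf{x}_i) + \mathbf{z}_i^{\top} \mathbf{u}, \qquad
  \mathbf{u} \sim \mathcal{N}\!\left(\mathbf{0}, \sigma_u^{2} \mathbf{I}\right)
\]
```

Under this reading, the neural network f supplies the predictive power, while the estimated entries of u give one interpretable effect per category level, which is the transparency the abstract refers to.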

We illustrate the GLMMNet and compare it against existing approaches in a range of experiments, including a real-life insurance case study. Notably, we find that the GLMMNet often outperforms, or at least performs comparably with, a neural network with entity embeddings, while providing the additional benefit of transparency.

Importantly, while our model was motivated by actuarial applications, it applies more widely. The GLMMNet suits any application that involves high-cardinality categorical variables and where the response cannot be adequately modelled by a Gaussian distribution.

Find the Q&A here: Q&A on 'Insurance Pricing and Machine Learning'