French Motor Third-Part Liability datasets
freMTPL.RdIn the two datasets freMTPLfreq, freMTPLsev, risk features are collected
for 413,169 motor third-part liability policies (observed mostly on one year).
In addition, we have claim numbers by policy as well as the corresponding
claim amounts.
freMTPLfreq contains the risk features and the claim number
while freMTPLsev contains the claim amount and the
corresponding policy ID.
Some claim amounts of freMTPLsev are fixed claim amounts
based on the French IRSA-IDA claim convention, see e.g.~https://www.index-assurance.fr/pratique/sinistre/convention-irsa.
In the two datasets freMTPL2freq, freMTPL2sev, risk features are collected
for 677,991 motor third-part liability policies (observed mostly on one year).
In addition, we have claim numbers by policy as well as the corresponding
claim amounts.
freMTPL2freq contains the risk features and the claim number
while freMTPL2sev contains the claim amount and the
corresponding policy ID.
Some claim amounts of freMTPL2sev are fixed claim amounts
based on the French IRSA-IDA claim convention, see e.g.~https://www.index-assurance.fr/pratique/sinistre/convention-irsa.
Format
freMTPLfreq contains 10 columns:
PolicyIDThe policy ID (used to link with the claims dataset).
ClaimNbNumber of claims during the exposure period.
ExposureThe period of exposure for a policy, in years.
PowerThe power of the car (ordered categorical).
CarAgeThe vehicle age, in years.
DriverAgeThe driver age, in years (in France, people can drive a car at 18).
BrandThe car brand divided in the following groups:
A- Renaut Nissan and Citroen,B- Volkswagen, Audi, Skoda and Seat,C- Opel, General Motors and Ford,D- Fiat,E- Mercedes Chrysler and BMW,F- Japanese (except Nissan) and Korean,G- other.GasThe car gas, Diesel or regular.
RegionThe policy region in France (based on the 1970-2015 classification).
DensityThe density of inhabitants (number of inhabitants per km2) in the city the driver of the car lives in.
freMTPLsev contains 2 columns:
PolicyIDThe occurence date (used to link with the contract dataset).
ClaimAmountThe cost of the claim, seen as at a recent date.
freMTPL2freq contains 11 columns:
IDpolThe policy ID (used to link with the claims dataset).
ClaimNbNumber of claims during the exposure period.
ExposureThe period of exposure for a policy, in years.
VehPowerThe power of the car (ordered values).
VehAgeThe vehicle age, in years.
DrivAgeThe driver age, in years (in France, people can drive a car at 18).
BonusMalusBonus/malus, between 50 and 350: <100 means bonus, >100 means malus in France.
VehBrandThe car brand (unknown categories).
VehGasThe car gas, Diesel or regular.
AreaThe density value of the city community where the car driver lives in: from
"A"for rural area to"F"for urban centre.DensityThe density of inhabitants (number of inhabitants per square-kilometer) of the city where the car driver lives in.
RegionThe policy region in France (based on the 1970-2015 classification).
freMTPL2sev contains 2 columns:
IDpolThe occurence date (used to link with the contract dataset).
ClaimAmountThe cost of the claim, seen as at a recent date.
References
M. Denuit, A. Charpentier, and J. Trufin (2021), Autocalibration and tweedie-dominance for insurance pricing with machine learning, Insurance: Mathematics and Economics, 101, 485–497, doi:10.1016/j.insmatheco.2021.09.001 .
T. Miljkovic and D. Fernández (2018). On Two Mixture-Based Clustering Approaches Used in Modeling an Insurance Portfolio, Risks, 6(2), 1–18, doi:10.3390/risks6020057 .
A. Noll, R. Salzmann, and M. V. Wuthrich (2020), Case study: French motor third-party liability claims, Innovation Practice eJournal, doi:10.2139/ssrn.3164764 .
N. Pocuca, P. Jevtic, P. D. McNicholas, and T. Miljkovic (2020), Modeling frequency and severity of claims with the zero-inflated generalized cluster-weighted models, Insurance: Mathematics and Economics, 94, 79–93, doi:10.1016/j.insmatheco.2020.06.004 .
Examples
# (1) load of data
#
data(freMTPLfreq)
dim(freMTPLfreq)
#> [1] 413169 10
data(freMTPLsev)
dim(freMTPLsev)
#> [1] 16181 2
# (2) check
#should be equal
sum(freMTPLsev$PolicyID %in% freMTPLfreq$PolicyID)
#> [1] 16181
sum(freMTPLfreq$ClaimNb)
#> [1] 16181
# (1) load of data
#
data(freMTPL2freq)
dim(freMTPL2freq)
#> [1] 677991 12
data(freMTPL2sev)
dim(freMTPL2sev)
#> [1] 26444 2