Statistical Hazard Modelling

A very brief introduction

Zak Varty

Why do we care about hazard modelling?


Climate

  • Heat-waves and cold-snaps,
  • Drought and floods,
  • Earthquakes and wildfires.

Finance

  • Market crashes,
  • Portfolio optimisation,
  • Insurance and reinsurance.

Industry

  • Quality assurance,
  • Reliability modelling,
  • Asset protection.

Risk vs Hazard


Hazard \(\approx\) probability: the chance of an event at least as severe as some value happening within a given space-time window.


\[\Pr(X > x) = 1 - F_X(x).\]

Risk \(\approx\) cost: the potential economic, social and environmental consequences of perilous events that may occur in a specified period of time or space.

\[ \text{VaR}_\alpha (X) = F_X^{-1}(\alpha) \quad \text{or} \quad \text{ES}_\alpha(X) = \mathbb{E}[X \mid X > \text{VaR}_\alpha(X)].\]

  • Subjectivity in the cost function
  • Convolution of hazard with geographic / demographic information
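
A minimal empirical sketch of these quantities in R (the log-normal losses and the level \(\alpha = 0.99\) are assumptions made purely for illustration):

```r
# Empirical VaR and ES for simulated positive "losses".
set.seed(101)
losses <- rlnorm(1e4, meanlog = 0, sdlog = 1)   # illustrative loss data

alpha <- 0.99
var_alpha <- quantile(losses, probs = alpha)    # VaR: the alpha-quantile
es_alpha  <- mean(losses[losses > var_alpha])   # ES: mean loss beyond VaR
c(VaR = unname(var_alpha), ES = es_alpha)
```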

We can focus on modelling large values

Depending on the peril we are considering, the definition of a “bad” outcome differs:

  • the largest negative returns in finance,
  • the largest positive amounts of rain,
  • the smallest failure stress/time in engineering.


Without loss of generality, we can focus on modelling large positive values, by transforming our data and results as appropriate.

\[g(X_i), \quad \text{e.g.} \quad g(x) = -x \quad \text{or} \quad g(x) = x^{-1}.\]

What is wrong with OLS?



An issue with risk / hazard modelling is that we are by definition interested in the rare events, which make up only a very small proportion of our data.

  • Standard modelling techniques describe the expected outcome.
  • Each point is weighted equally; typical values are the most common, so they dominate measures of model fit.



Robust Regression and Quantile Regression

Robust regression:

  • Models the conditional median \(Y_{0.5} \ |\ X\).
  • Reduces sensitivity to “outliers”
  • The opposite of what we want to do!

Generalises to quantile regression:

  • Model for conditional quantile \(Y_p \ | \ X\)
  • Okay within the range of the sample, but samples are not always large enough to estimate extreme quantiles (see the sketch below).
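
A minimal sketch contrasting these fits, using the quantreg package (listed in the build information at the end); the simulated, right-skewed data are purely illustrative:

```r
# OLS targets the conditional mean; rq() with tau = 0.5 gives median (robust)
# regression; tau near 1 targets the upper tail that hazard modelling cares about.
library(quantreg)

set.seed(1234)
n <- 500
x <- runif(n)
y <- 1 + 2 * x + rexp(n, rate = 1 / (0.5 + x))  # right-skewed, heteroskedastic errors

fit_ols <- lm(y ~ x)              # conditional mean
fit_med <- rq(y ~ x, tau = 0.50)  # conditional median
fit_hi  <- rq(y ~ x, tau = 0.95)  # conditional 0.95 quantile

plot(x, y, pch = 16, col = "grey70")
abline(fit_ols, lwd = 2)
abline(fit_med, col = "blue", lwd = 2)
abline(fit_hi, col = "red", lwd = 2)
```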



Extreme Value Theory

What if we need to go beyond the historical record?

  • e.g. estimate a 1 in 1000-year flood from 50 years of data.

  • Extreme Value Theory allows principled extrapolation beyond the range of the observed data.

    • Return levels and return periods.
  • Focuses on the most extreme observations.

    • bulk vs tail of distribution.


If we care about


\[M_n = \max \{X_1, \ldots, X_n\}\]


How can we model


\[\begin{align*} F_{M_n}(x) &= \Pr(X_1 \leq x,\ \ldots, \ X_n \leq x) \\ &= \Pr(X \leq x) ^n \\ &= F_X(x)^n? \end{align*}\]
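
Directly, not very well: for any fixed \(x\) with \(F_X(x) < 1\) we have \(F_X(x)^n \rightarrow 0\), so the distribution of \(M_n\) degenerates as \(n\) grows. A quick numerical sketch with Exp(1) data, where shifting by \(b_n = \log n\) turns out to stabilise the limit:

```r
# For Exp(1) data, Pr(M_n <= x) = (1 - exp(-x))^n -> 0 for any fixed x ...
x <- 3
n <- c(10, 100, 1000, 10000)
pexp(x)^n

# ... but Pr(M_n <= x + log(n)) converges to the Gumbel limit exp(-exp(-x)).
pexp(x + log(n))^n
exp(-exp(-x))
```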

Extremal Types Theorem

Analogue of CLT for Sample Maxima. Let’s revisit the CLT:


Suppose \(X_1, X_2, X_3, \ldots,\) is a sequence of i.i.d. random variables with \(\mathbb{E}[X_i] = \mu\) and \(\text{Var}[X_i] = \sigma^2 < \infty\).

As \(n \rightarrow \infty\), the random variables \(\frac{\sqrt{n} (\bar{X}_n - \mu)}{\sigma}\) converge in distribution to a standard normal distribution.


\[ \frac{\sqrt{n} (\bar{X}_n - \mu)}{\sigma} \overset{d}{\longrightarrow} \mathcal{N}(0,1).\]


Rephrasing this in terms of the partial sums \(S_n = X_1 + \ldots + X_n\) rather than the sample mean:

\[\frac{S_n}{\sigma\sqrt{n}} - \frac{\mu}{\sigma / \sqrt{n}} \overset{d}{\longrightarrow} \mathcal{N}(0,1).\]

Extremal Types Theorem

Analogue of CLT for Sample Maxima. Let’s revisit the CLT:


Under weak conditions on \(F_X\), appropriate sequences of constants \(\{a_n\}\) and \(\{b_n\}\) exist such that:

\[a_n S_n - b_n \overset{d}{\longrightarrow} \mathcal{N}(0,1).\]


  • The limiting distribution does not depend on \(F_X(x)\), subject to weak conditions (the mean exists and the variance is finite).
  • Scaling and shifting to avoid degeneracy.
  • Motivates Gaussian errors as sum of many non-Gaussian errors.
  • Generalises to non-iid sequences.

Extremal Types Theorem


If suitable sequences of normalising constants exist, then as \(n \rightarrow \infty\):

\[\begin{equation} \label{eqn:lit_extremes_normalising} \Pr\left(\frac{M_n - b_n}{a_n} \leq x \right) \rightarrow G(x), \end{equation}\]

where \(G\) is the distribution function of a Fréchet, Gumbel or negative Weibull random variable.


This links to the concept of Maximal Domain of Attraction: if we know \(F_X(x)\) then we can identify \(G(x)\).


But we don’t know \(F\)!

Unified Extremal Types Theorem

These distributional forms are united in a single parameterisation by the Unified Extremal Types Theorem.

The resulting generalised extreme value (GEV) family of distribution functions has the form

\[\begin{equation} \label{eqn:lit_extremes_GEV} G(x) = \exp\left\{ -\left[ 1 + \xi \frac{x - \mu}{\sigma} \right]_{+}^{-1/\xi}\right\}, \end{equation}\]

where \(x_+ = \max(x,0)\), \(\sigma \in \mathbb{R}^+\) and \(\mu , \xi \in \mathbb{R}\). The parameters \(\mu, \sigma\) and \(\xi\) have respective interpretations as location, scale and shape parameters.

  • \(\xi > 0\) corresponds to a Fréchet distribution and a heavy upper tail.
  • \(\xi = 0\): the GEV is equivalent to a Gumbel distribution and has an exponential upper tail.
  • \(\xi < 0\) corresponds to a negative Weibull distribution, which is light-tailed and has a finite upper end point.
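
A minimal sketch of this distribution function in base R, written directly from the formula above to make the three regimes concrete (pgev here is a local helper, not a package API):

```r
# GEV distribution function, treating xi = 0 as the Gumbel limit.
pgev <- function(x, mu = 0, sigma = 1, xi = 0) {
  if (abs(xi) < 1e-8) return(exp(-exp(-(x - mu) / sigma)))  # Gumbel case
  t <- pmax(1 + xi * (x - mu) / sigma, 0)                   # the [.]_+ truncation
  exp(-t^(-1 / xi))
}

pgev(2, xi =  0.5)  # heavy upper tail (Fréchet type)
pgev(2, xi =  0.0)  # exponential upper tail (Gumbel)
pgev(2, xi = -0.5)  # finite upper end point at mu - sigma / xi = 2
```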

GEV in Action

  • CLT and UETT are asymptotic results; we use them as approximations for finite \(n\).

  • Split the data into \(m\) blocks of length \(k\) and model the block maxima \(M_k\).

  • How to pick the block size? Trade-off between bias and variance.

  • Annual blocks are often used as a pragmatic choice to handle seasonal trends (see the fitting sketch below).
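
A sketch of this workflow by direct maximum likelihood in base R; the simulated "daily" data, block size and starting values are all illustrative assumptions:

```r
# Negative log-likelihood of the GEV (xi != 0 branch) for block maxima z.
gev_nll <- function(par, z) {
  mu <- par[1]; sigma <- par[2]; xi <- par[3]
  if (sigma <= 0) return(Inf)
  t <- 1 + xi * (z - mu) / sigma
  if (any(t <= 0)) return(Inf)                  # outside the support
  sum(log(sigma) + (1 + 1 / xi) * log(t) + t^(-1 / xi))
}

set.seed(42)
daily   <- matrix(rexp(50 * 365), nrow = 50)    # 50 "years" of daily observations
ann_max <- apply(daily, 1, max)                 # m = 50 annual maxima

fit <- optim(c(mean(ann_max), sd(ann_max), 0.1), gev_nll, z = ann_max)
fit$par                                         # (mu, sigma, xi) estimates

# 1-in-100-year return level: the 0.99 quantile of the fitted GEV.
mu <- fit$par[1]; sigma <- fit$par[2]; xi <- fit$par[3]
mu + sigma / xi * ((-log(0.99))^(-xi) - 1)
```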



Peaks Over Threshold Modelling


  • Alternative definition of “extreme”: any \(X_i > u\).
  • Generalised Pareto limit distribution for \(X_i - u \ | \ X_i > u\).
  • In applications we need to pick the threshold \(u\).
  • To undo the conditioning, we also model \(\lambda_u = \Pr(X_i > u)\) (see the sketch below).
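
A corresponding maximum-likelihood sketch for the peaks-over-threshold model; the data and the 90% threshold are illustrative assumptions:

```r
# Negative log-likelihood of the GPD (xi != 0 branch) for excesses y = x - u.
gpd_nll <- function(par, y) {
  sigma <- par[1]; xi <- par[2]
  if (sigma <= 0) return(Inf)
  t <- 1 + xi * y / sigma
  if (any(t <= 0)) return(Inf)
  sum(log(sigma) + (1 + 1 / xi) * log(t))
}

set.seed(8)
x <- rexp(5000)                # illustrative raw data
u <- quantile(x, 0.90)         # threshold: a pragmatic choice
y <- x[x > u] - u              # threshold excesses

fit <- optim(c(sd(y), 0.1), gpd_nll, y = y)
fit$par                        # (sigma, xi) estimates

lambda_u <- mean(x > u)        # exceedance rate, to undo the conditioning
lambda_u
```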



Adding in Non-stationarity

  • The IID assumption is overly restrictive.
  • The asymptotic theory holds for stationary sequences, so long as dependence decays “fast enough”.
  • We can consider GLM-style modelling using covariates.


\[X_i - u | X_i > u, z_i \sim \text{GPD}(\sigma(z_i),\ \xi(z_i)).\]

\[ \lambda(z_i) = \Pr(X_i > u | z_i) = \frac{\exp\{\beta_0 + \beta_1 z_i\}}{1 + \exp\{\beta_0 + \beta_1 z_i\}}.\]
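
The exceedance-rate part of this model is an ordinary logistic regression; a minimal sketch with simulated data (the covariate effect and threshold are assumptions):

```r
# lambda(z) = Pr(X > u | z) with a logit link, matching the formula above;
# glm() estimates beta_0 and beta_1.
set.seed(7)
n <- 2000
z <- runif(n, -1, 1)
x <- rexp(n, rate = exp(-0.5 * z))   # exceedances more likely when z is large
u <- quantile(x, 0.9)

fit <- glm(I(x > u) ~ z, family = binomial)
coef(fit)                            # (beta_0, beta_1)
predict(fit, newdata = data.frame(z = c(-1, 0, 1)), type = "response")
```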

Point Processes

EVT gives us a model for the size of rare events; to produce risk maps or forecasts we also need to describe how events occur over time and space.


A point process is a stochastic process \(\{X_1, \ldots, X_N\}\) in which the random variables represent the locations of events in time or space, and the number of events \(N\) is itself random.


The Poisson process is the simplest version: a Poisson(\(\lambda\)) number of events, located independently and uniformly at random.
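
Simulating one is correspondingly simple; a sketch on the unit square (the rate \(\lambda = 50\) is an arbitrary choice):

```r
# Homogeneous Poisson process on the unit square: Poisson count, uniform locations.
set.seed(3)
lambda <- 50
n  <- rpois(1, lambda)           # random number of events
xy <- cbind(runif(n), runif(n))  # independent, uniform locations
plot(xy, asp = 1, pch = 16, xlab = "x", ylab = "y")
```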



Homogeneous Poisson Processes

  • Not a useful model in itself; too simple to be realistic.

  • Central case from which to define clustered or regular occurrences.

Figure 1: Locations of 42 cell centres in a unit square.
Figure 2: Locations of 65 Japanese black pine saplings in a square of side 5.7 m.
Figure 3: Locations of 62 redwood seedlings in a square of side 23 m.

Testing for CSR


Some simulated point patterns: which is random, which clustered and which repulsive?


Figures 4–6: three simulated point patterns.

Humans are rubbish at this. How can we formally test instead?
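
One simple formal option is a Monte Carlo test: compare a summary of the observed pattern with its distribution under simulated CSR. A base-R sketch using the mean nearest-neighbour distance (the "observed" pattern here is itself simulated, as a stand-in for real data):

```r
# Mean nearest-neighbour distance of an n x 2 matrix of coordinates.
mean_nn <- function(xy) {
  d <- as.matrix(dist(xy))
  diag(d) <- Inf
  mean(apply(d, 1, min))
}

set.seed(11)
n <- 42
observed <- cbind(runif(n), runif(n))  # stand-in for an observed pattern
t_obs <- mean_nn(observed)

t_sim <- replicate(999, mean_nn(cbind(runif(n), runif(n))))  # CSR simulations

# One-sided Monte Carlo p-value for clustering (small t_obs => clustered).
(1 + sum(t_sim <= t_obs)) / (1 + length(t_sim))
```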

Relaxing the assumptions

Inhomogeneous Poisson Process: event count still Poisson, event locations still independent, but the rate of events is allowed to vary over time/space:

\[ \lambda(t) = \lim_{\delta \rightarrow 0 } \frac{\mathbb{E}[N(t, t + \delta)]}{\delta}.\]

The expected number of events in a region \(A\) is given by the integral of this intensity function.

\[ \Lambda(A) = \int_A \lambda(t) \mathrm{d}t \quad \text{e.g.} \quad \int_{t_{0}}^{t_{1}} \exp(a + bt) \mathrm{d}t.\]
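
For the exponential example, this integral has the closed form (assuming \(b \neq 0\))

\[ \Lambda([t_0, t_1]) = \int_{t_{0}}^{t_{1}} \exp(a + bt) \,\mathrm{d}t = \frac{e^{a}}{b}\left(e^{b t_1} - e^{b t_0}\right).\]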

We are interested in describing the number, location and any additional information about events – potentially using covariates to do so.
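
Simulation is only slightly harder than in the homogeneous case; a sketch of Lewis–Shedler thinning for the exponential intensity above (the values of \(a\), \(b\) and the window are illustrative assumptions):

```r
# Inhomogeneous Poisson process on [0, t1] by thinning: simulate at the maximal
# rate, then keep each point with probability lambda(t) / lambda_max.
a <- 0; b <- 0.5; t1 <- 5
lambda <- function(t) exp(a + b * t)
lambda_max <- lambda(t1)            # valid upper bound since b > 0

set.seed(1)
n_prop    <- rpois(1, lambda_max * t1)
proposals <- sort(runif(n_prop, 0, t1))
keep      <- runif(n_prop) < lambda(proposals) / lambda_max
events    <- proposals[keep]

length(events)                      # compare with the expected count
(exp(a + b * t1) - exp(a)) / b      # Lambda([0, t1]) from the closed form
```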

Further relaxations: renewal processes, Poisson cluster processes, self-exciting processes.

All together now

Peaks Over Threshold

  • a conditional model for the size of perilous events,
  • makes efficient use of the available data.

Point Processes

  • a joint description of how many perils occur and where / when we expect to see them.

Combining these we can undo the conditioning to assess hazard and risk.

Hazard: number, location and magnitude of peril.

Risk: convolution of hazard with context and opinion.

Challenge: what if we do not have a good guess at the GLM form? Mix with flexible regression techniques.

Starting Points

Extreme Value Modelling

Point Process Modelling

Build Information

R version 4.3.3 (2024-02-29)

Platform: x86_64-apple-darwin20 (64-bit)

locale: en_US.UTF-8||en_US.UTF-8||en_US.UTF-8||C||en_US.UTF-8||en_US.UTF-8

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

loaded via a namespace (and not attached): Matrix(v.1.6-1.1), bit(v.4.0.5), jsonlite(v.1.8.8), crayon(v.1.5.2), compiler(v.4.3.3), Rcpp(v.1.0.12), tidyselect(v.1.2.1), MatrixModels(v.0.5-3), parallel(v.4.3.3), showtext(v.0.9-7), zvplot(v.0.0.0.9000), splines(v.4.3.3), png(v.0.1-8), yaml(v.2.3.8), fastmap(v.1.1.1), lattice(v.0.22-5), readr(v.2.1.5), R6(v.2.5.1), showtextdb(v.3.0), knitr(v.1.45), MASS(v.7.3-60.0.1), tibble(v.3.2.1), pander(v.0.6.5), pillar(v.1.9.0), tzdb(v.0.4.0), rlang(v.1.1.3), utf8(v.1.2.4), xfun(v.0.43), bit64(v.4.0.5), cli(v.3.6.2), magrittr(v.2.0.3), digest(v.0.6.35), grid(v.4.3.3), vroom(v.1.6.5), rstudioapi(v.0.16.0), quantreg(v.5.97), hms(v.1.1.3), lifecycle(v.1.0.4), sysfonts(v.0.8.9), vctrs(v.0.6.5), SparseM(v.1.81), evaluate(v.0.23), glue(v.1.7.0), survival(v.3.5-8), fansi(v.1.0.6), rmarkdown(v.2.26), tools(v.4.3.3), pkgconfig(v.2.0.3) and htmltools(v.0.5.8.1)