A lot of computations needed to support the life insurance industry rely on mortality tables. These include computations embedded in accounting rules and in various statutes regulating life insurance, such as non-forfeiture laws. These mortality tables predict, for various groups of people (such as female smokers who have obtained insurance), certain probabilities relating to death. These probabilities can be expressed in various ways, such as “survival,” the probability that someone will still be alive at a specified age, or the “force of mortality,” roughly the rate at which people alive at a specified age die (in a discrete table, approximated by the probability of dying within the following year). Those wanting to explore these tables can look here.
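To make the two quantities concrete, here is a minimal sketch of how a survival probability falls out of a discrete table. The one-year death probabilities below are made up for illustration; they are not drawn from any real table.

```python
# Hypothetical one-year death probabilities q_x for ages 60-64
# (illustrative values only, not from a real mortality table).
q = {60: 0.011, 61: 0.012, 62: 0.013, 63: 0.015, 64: 0.016}

def survival(age_from: int, age_to: int) -> float:
    """Probability that someone alive at age_from is still alive at age_to:
    the product of (1 - q_x) over each intervening year."""
    p = 1.0
    for age in range(age_from, age_to):
        p *= 1.0 - q[age]
    return p

print(survival(60, 65))  # chance a 60-year-old reaches 65 under this table
```

Note how the computation only works because the ages line up with the table's integer entries; that limitation is the subject of the next paragraph.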

But the tables are discrete-valued. They usually give answers only for integer ages. That creates three issues. First, a table does not tell you the probability that someone alive at age 50.5 will still be alive at age 57.3. This isn’t too bad a problem, since one can interpolate between tabulated values and make a pretty useful guess. Second, however, discrete-valued functions are often harder to work with mathematically than their continuous counterparts. Regular Isaac Newton/Leibniz-style calculus is often easier than the discrete calculus. Various computations that are useful in life insurance thus become difficult when all one has is a table. And third, the tables don’t tell us much about the physical process of aging and why people die at various rates.
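The interpolation fix for the first issue is straightforward enough to show in a few lines. The survivor counts below are hypothetical; real tables tabulate a column like this (often written l_x) at each integer age.

```python
# Hypothetical survivor counts l_x at integer ages (illustrative only).
l = {50: 94000.0, 51: 93500.0}

def l_at(age: float) -> float:
    """Linearly interpolate the survivor count between integer ages."""
    lo = int(age)
    frac = age - lo
    if frac == 0.0:
        return l[lo]
    return l[lo] + frac * (l[lo + 1] - l[lo])

print(l_at(50.5))  # halfway between l_50 and l_51 -> 93750.0
```

A useful guess, as the text says, but still just a guess: linear interpolation imposes a shape on mortality between ages that the table itself says nothing about.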

And thus the quest for continuous mortality functions. These are functions like B c^x that predict the force of mortality for any positive value of x (age). There are a variety of these functions that have been developed over a few centuries of actuarial science, including the Gompertz model, the Gompertz-Makeham model, the Penna model, and the Weibull model. Generally, these functions have been derived from a feedback loop in which various theories about how aging might occur are guided by experience from mortality tables. And, usually, they are the solution to a differential equation based on a biological or physical model of aging. All well and good, and quite successful.
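The Gompertz model mentioned above is simple enough to sketch directly. With force of mortality B c^x, the survival function from birth to age x has the closed form exp(−B(c^x − 1)/ln c). The parameter values below are illustrative guesses, not fitted to any real table.

```python
import math

# Gompertz force of mortality mu(x) = B * c**x.
# B and c here are illustrative values, not fitted parameters.
B, c = 0.00005, 1.09

def mu(x: float) -> float:
    """Instantaneous death rate at age x under the Gompertz model."""
    return B * c ** x

def gompertz_survival(x: float) -> float:
    """exp of minus the integral of mu from 0 to x, in closed form."""
    return math.exp(-B / math.log(c) * (c ** x - 1.0))

print(gompertz_survival(65.0))  # probability of surviving from birth to 65
```

This is exactly the payoff of a continuous model: the same two parameters answer questions at age 50.5 as easily as at age 50, and the function can be differentiated and integrated with ordinary calculus.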

Until recently, this might have been the only methodology with which to derive useful mortality functions. It occurred to me, however, that it may no longer be the only way, or even the best way, to proceed. The key idea is to use “genetic programming,” a method developed around 1992 by John Koza in his book Genetic Programming: On the Programming of Computers by Means of Natural Selection.

What’s genetic programming? It’s brute force guided in the same way that nature guides evolution. The idea is to represent mathematical formulas as objects called “trees.” Here’s an example of a formula you might recall, represented as a tree.
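In code, such a tree can be as simple as nested tuples: an operator at each internal node, constants and variables at the leaves. Here is a sketch using the Gompertz hazard B·c^x from earlier as the example formula (the encoding and the tiny evaluator are my own illustration, not any particular library's representation).

```python
# The formula B * c**x encoded as a tree of (operator, children...) tuples.
# Leaves are ("const", value) or ("var", name).
tree = ("*", ("const", 0.00005), ("**", ("const", 1.09), ("var", "x")))

def evaluate(node, env):
    """Recursively evaluate a formula tree given variable bindings in env."""
    op = node[0]
    if op == "const":
        return node[1]
    if op == "var":
        return env[node[1]]
    args = [evaluate(child, env) for child in node[1:]]
    if op == "*":
        return args[0] * args[1]
    if op == "**":
        return args[0] ** args[1]
    raise ValueError(f"unknown operator: {op}")

print(evaluate(tree, {"x": 50}))  # the Gompertz hazard at age 50
```

The point of the tree form is that the structure of the formula becomes data: subtrees can be inspected, swapped, and mutated, which is exactly what the next step exploits.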

Formulas that are “successful” in some way — such as those that are short and/or correctly predict mortality levels — get to mutate and mate (what fun!) with other successful formulas. The picture below attempts to depict the mating ritual of the formulae.
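The "mating" in question is usually subtree crossover: pick a random subtree in each parent and swap them. The sketch below is a bare-bones illustration of that idea (trees here are nested lists; the helper names are mine, not from any GP library).

```python
import random

# Trees are nested lists [op, child, child] with string leaves.

def subtrees(tree, path=()):
    """Yield (path, node) for every node, so we can pick one at random."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    copy = list(tree)
    copy[path[0]] = replace(copy[path[0]], path[1:], new)
    return copy

def crossover(a, b, rng):
    """Swap one randomly chosen subtree between parents a and b."""
    pa, sa = rng.choice(list(subtrees(a)))
    pb, sb = rng.choice(list(subtrees(b)))
    return replace(a, pa, sb), replace(b, pb, sa)

rng = random.Random(0)
parent1 = ["*", "B", ["**", "c", "x"]]   # B * c**x
parent2 = ["+", "A", "x"]                # A + x
child1, child2 = crossover(parent1, parent2, rng)
print(child1, child2)
```

Because the children are built purely by exchanging subtrees, every node in the parents survives somewhere in the offspring; mutation (randomly altering a node) is the usual second operator alongside this one.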

The process of natural selection continues until one has a family of formulas that actually do a good job. The process does exercise the CPU of your computer, but it also works remarkably well on a diverse range of problems. My idea is that if one found good formulas that had heretofore been missed, one might then be able to “reverse engineer” information about the process of aging or at least perform more accurate and effective actuarial computations. And, if the best formulas turned out to be the existing ones, that would provide some confirmation that the physical and biological processes that inform them indeed have some validity.
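The selection step itself is just scoring and sorting. Here is a toy version: candidate hazard functions are scored against (made-up) table values by squared error, and the better-scoring half survives to the next generation. Plain Python callables stand in for evolved formula trees, and the target values are illustrative only.

```python
# Hypothetical target one-year death probabilities at ages 60-64.
ages = [60, 61, 62, 63, 64]
table_q = [0.011, 0.012, 0.013, 0.015, 0.016]

def fitness(candidate):
    """Sum of squared errors against the table; lower is better."""
    return sum((candidate(a) - q) ** 2 for a, q in zip(ages, table_q))

# A tiny "population" of candidate formulas.
population = [
    lambda x: 0.00005 * 1.09 ** x,   # a Gompertz-like guess
    lambda x: 0.012,                 # a constant hazard
    lambda x: 0.0002 * x,            # a linear hazard
]

# Selection: keep the better-scoring candidates for the next generation.
survivors = sorted(population, key=fitness)[: len(population) // 2 + 1]
print([fitness(f) for f in survivors])
```

In a real run this loop of scoring, selection, crossover, and mutation repeats for many generations over thousands of candidates, with fitness typically balancing accuracy against formula length, as the text notes.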

Until very recently, however, doing genetic programming correctly and effectively was extraordinarily difficult. It turns out that while the idea is remarkably powerful, the details of implementation are considerable. We are talking pages and pages of complex code that then has to be integrated with the rest of one’s computing environment. Not a job for a law professor, even a geeky one. In my view, that obstacle is now overcome. There is a product soon to be on the market, which I have had the privilege of beta testing, called DataModeler that does an unbelievable job of genetic programming and of one of its subcategories used in statistical analysis: “symbolic regression.” The product is an add-on to Mathematica, my favorite programming language. Mathematica is a natural environment for genetic programming because it already represents mathematical expressions (and everything else) as just the sort of trees genetic programming demands. DataModeler is designed right. Its architecture is stunningly clean, leading to just the sort of flexibility and adaptability users will demand; it is beautifully integrated with the rest of Mathematica; it is well documented; and, OMG, it actually works! You don’t need to know a heck of a lot about the details of genetic programming, or even of Mathematica, to get the package to produce remarkable results, often in just a few minutes. I’ve used it on several earlier projects (here and here), and I am once again finding the ever-growing versatility of DataModeler, as it heads for release, to be nothing short of astonishing.

In a future post, I’ll discuss some results of this project, but I think I can disclose that I’ve made some interesting discoveries.