A Closer Look at Predictive Analytics

Predictive analytics is a set of mathematical techniques applied to a data set to estimate the probability that some scenario will happen or is true. These techniques are applied in many research areas, including meteorology, genetics and marketing -- areas in which there's an abundance of data and a need to forecast the future.

In business, predictive analytics is often used to answer questions about customer behavior. For example, companies often want to know whether a particular customer is likely to be interested in a direct-mail offer.

Or a business might want to know whether, given a certain set of premiums and benefits, a new customer will become a long-term customer. Ultimately, businesses want predictive analytics to suggest how to best target resources for maximum return.

Cross-selling, upselling, determining customer profitability and promoting customer loyalty are the best-known uses of this technology, according to a report by Forrester Research Inc. analyst Lou Agosta. But there are many other applications, he notes, including credit scoring, predicting machine failures and making the supply chain more efficient.

Plenty of high-level mathematics is involved, but stated simply, predictive analytics is used to ask which characteristics, called predictors, in a data set are clustered together. The technique is also used to determine whether, given a set of predictors, the value of some other characteristic is likely to fall within a desired range.

Though these two questions sound very similar, in practice, they're quite different.

The first one, the search for clustered characteristics, is like saying, "Look through my database of information and find something about my business that I overlooked or might not already know." You might look through the history of people who have declared bankruptcy to find which characteristics are most tightly linked together: late payments, number of addresses within the past two years, recent divorce or health problems, for example.
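As a rough illustration of that first question, the sketch below counts how often pairs of characteristics appear together in a handful of records. Everything in it is an assumption made for illustration: the records are invented toy data, the characteristic names are hypothetical labels based on the examples above, and a simple co-occurrence count stands in for the far more sophisticated clustering a real predictive analytics system would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set holds the characteristics observed
# for one past bankruptcy filer. These are invented, not real records.
records = [
    {"late_payments", "recent_divorce"},
    {"late_payments", "many_addresses"},
    {"late_payments", "many_addresses", "health_problems"},
    {"recent_divorce"},
    {"late_payments", "many_addresses"},
]

def pair_counts(records):
    """Count how many records contain each pair of characteristics."""
    counts = Counter()
    for record in records:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(record), 2):
            counts[pair] += 1
    return counts

# The pair that shows up together most often in the toy data.
most_linked_pair, times_seen = pair_counts(records).most_common(1)[0]
print(most_linked_pair, times_seen)
```

In this toy data, late payments and frequent address changes turn up together most often, which is the kind of link an analyst might then investigate further.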

The second question, determining whether a particular characteristic falls within a desired range, is like saying, "Given what I know about a customer, find out how likely it is that something else is true."

For example, you might want to analyze the characteristics of a person filing an insurance claim to determine the likelihood that the claim is false. The predictors could be how recently he filed his last claim, the dollar amount of that claim or how long the customer has had the policy.

The two approaches work together. Once linked characteristics have been identified, then the second question can be asked. After an insurance company has found which characteristics are most tightly linked to fraud, for example, it can create an equation that produces a number indicating how likely it is that a particular claim is fraudulent.

Suppose an automobile insurance company, AA Acme Insurance, already knows that multiple claims within a three-year period are closely associated with fraud. It could use predictive analytics to quantify the linkage. The result might be the equation:

Possibility of Fraud = 1/Square of (Time Since Last Claim)

In other words, the closer together the claims come, the more likely fraud is occurring. The equation is called a predictive model.
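The equation can be written as a small function to see how it behaves. This is only a sketch of the article's illustrative formula: the function name is hypothetical, and measuring time in years is an assumption, since the article doesn't give units.

```python
def fraud_score(years_since_last_claim: float) -> float:
    """Illustrative score: 1 / (time since last claim) squared.

    Assumes time is measured in years; the article does not say.
    """
    if years_since_last_claim <= 0:
        raise ValueError("time since last claim must be positive")
    return 1.0 / years_since_last_claim ** 2

# A claim filed six months after the previous one scores far higher
# than one filed three years later.
print(fraud_score(0.5))
print(fraud_score(3.0))
```

Note that the result is a score rather than a true probability: for intervals shorter than one time unit it rises above 1. A production model would need some way to map scores onto probabilities.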

Then suppose that AA Acme fed all the data it collected on its customers into a predictive analytics system that found tightly linked characteristics and learned that there was another predictor of fraud that it had never imagined. Perhaps the recent purchase of a new home was also linked to automobile insurance fraud. Using this result, it might come up with a new equation:

Possibility of Fraud = 1/Square of (Time Since Last Claim) + 1/(Time Since Last Claim) + 3/(Time Since Last Home Purchase)

This model might catch fraud that would have gone undetected previously.
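A sketch of the extended equation shows how the extra predictor changes the result. As before, the function name is hypothetical and years are an assumed unit; the coefficients come straight from the article's example equation, not from any real data.

```python
def fraud_score_with_home(years_since_last_claim: float,
                          years_since_home_purchase: float) -> float:
    """Illustrative score from the article's second equation.

    score = 1/t^2 + 1/t + 3/h, where t is the time since the last claim
    and h is the time since the last home purchase (years assumed).
    """
    t = years_since_last_claim
    h = years_since_home_purchase
    if t <= 0 or h <= 0:
        raise ValueError("both time intervals must be positive")
    return 1.0 / t ** 2 + 1.0 / t + 3.0 / h

# Same claim history, but a more recent home purchase raises the score.
print(fraud_score_with_home(2.0, 3.0))
print(fraud_score_with_home(2.0, 0.5))
```

With the claim history held constant, the customer who bought a home more recently gets the higher score, which is how the new predictor could flag cases the first equation would miss.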

One key to making models like this work is having plenty of clean data to work with. "Without examples of fraud, the neural network cannot be trained on what to look for," says Agosta. And without historical examples of sales of a new product, a model can't predict a new market opportunity, he says. This is a case where more data is better, to smooth out idiosyncrasies and reveal information that might otherwise be lost in the noise.

Agosta cites the story of a model that showed senior citizens were buying rap music -- not for themselves, but for their grandchildren. If the modeler hadn't "known that they had grandchildren, then the predictive inference would have been impossible or inaccurate," he says.

Predictive analytics is full of other challenges, too -- from organizational politics to model design, validation and data preparation. "Data preparation can be up to 80% of the effort of predictive analytics, and many firms are not getting beyond it," Agosta says.

A major goal of predictive analytics is to move away from seat-of-the-pants decision-making. But that can run afoul of organizational politics if, say, the brand manager doesn't agree with the conclusions. Models should be revised as needed, but not just because the results don't support the brand manager's theory.

"Do not second-guess validated test results without cause and consideration," Agosta cautions. "Be true to your predictive model."

Predictive Analytics article



Comments

You make some interesting points. I would add two more.

First, so-called "sanity checking" by supposed business "experts" turns up false notions as often as true ones. In my work, about half the time my analysis has been challenged, further investigation shows that popular ideas about our business (banking) are flat-out wrong. My favorite response to self-assured managers is, "I have 78,000 customers on my side. Why don't you tell them they should behave differently?"

Second, managers can be difficult, but auditors and regulators are even worse in the sense that they carry a heavy veto power.

-Will Dwinnell, Data Miner
http://will.dwinnell.com
Data Mining and Predictive Analytics
Data Mining in MATLAB
