Wednesday, January 27, 2010

Components of Predictive Analytics

Data mining can be defined as an analytical tool set that automatically searches large datasets, often spread across disparate organizational systems, and identifies specific patterns within them. Data mining, text mining, and Web mining are all types of pattern identification. Organizations can use these forms of pattern recognition to identify customers' buying patterns or the relationship between a person's financial records and their credit risk. Predictive analytics goes one step further and applies these patterns to make forward-looking predictions. Instead of just identifying a potential credit risk, an organization can estimate the lifetime value of a customer by developing predictive decision models and applying these models to the identified patterns. These types of pattern identification and forward-looking model structures can equally be applied to BI and performance management solutions within an organization.

Predictive analytics is used to determine the probable future outcome of an event, or the likelihood of a situation occurring. It is the branch of data mining concerned with the prediction of future probabilities and trends. Predictive analytics automatically analyzes large amounts of data with many different variables, using techniques that include clustering, decision trees, market basket analysis, regression modeling, neural nets, genetic algorithms, text mining, hypothesis testing, decision analytics, and so on.

The core element of predictive analytics is the predictor, a variable that can be measured for an individual or entity to predict future behavior. Predictors are the inputs to a predictive model, the factors that determine the outcome the model produces. For example, a financial institution may want to identify the factors that make a valuable lifetime customer. Other model types work alongside predictive models. Descriptive models classify relationships by identifying customers or prospective customers and placing them in groups based on identified criteria. Decision models take into account business and economic drivers and constraints that go beyond what a predictive model alone covers. In a sense, statistical analysis helps to drive this process as well.

Multiple predictors can be combined into a predictive model, which, when subjected to analysis, can be used to forecast future probabilities with an acceptable level of reliability. In predictive modeling, data is collected, a statistical model is formulated, predictions are made, and the model is validated (or revised) as additional data becomes available. One of the main differences between data mining and predictive analytics is that data mining can be a fully automated process, whereas predictive analytics requires an analyst to identify the predictors and apply them to the defined models.
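
As a rough illustration of combining predictors and then validating the result, here is a minimal Python sketch. The predictor names, the weights, and the sample records are all made up for the example; a real model would estimate its weights from historical data, and the logistic function is only one possible way to turn a combined score into a probability.

```python
import math

# Hypothetical weights for three predictors. These are assumptions for
# illustration; a real model would estimate them from historical data.
WEIGHTS = {"late_payments": 0.8, "debt_to_income": 2.5, "years_as_customer": -0.3}
INTERCEPT = -2.0


def default_probability(customer):
    """Combine several predictors into one probability with a logistic function."""
    score = INTERCEPT + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))


def accuracy(customers, outcomes, threshold=0.5):
    """Validate the model: share of records where the predicted class
    (probability at or above the threshold) matches the observed outcome."""
    hits = sum(
        (default_probability(c) >= threshold) == bool(o)
        for c, o in zip(customers, outcomes)
    )
    return hits / len(customers)


# Made-up records that arrived after the model was built, with the
# outcome that was later observed (1 = defaulted, 0 = did not).
new_records = [
    {"late_payments": 4, "debt_to_income": 0.6, "years_as_customer": 1},
    {"late_payments": 0, "debt_to_income": 0.2, "years_as_customer": 8},
    {"late_payments": 2, "debt_to_income": 0.5, "years_as_customer": 2},
    {"late_payments": 0, "debt_to_income": 0.1, "years_as_customer": 12},
]
observed = [1, 0, 1, 0]

print(accuracy(new_records, observed))
```

If the accuracy on newly collected data drops, that is the signal to revise the weights or the choice of predictors, which is the analyst's part of the process.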

A decision tree is a technique within predictive analytics that allows the user to visualize how observations about an item map to conclusions about the item's target value. Basically, decision trees are built by creating a hierarchy of predictor attributes. The highest level tests one predictor, each sub-level tests another factor, and the leaves at the bottom of the tree hold the predicted outcome. This can be compared to if-else statements, which identify a result based on whether certain factors meet specified criteria. For example, in order to assess potential bad debt based on credit history, salary, demographics, and so on, a financial institution may wish to identify multiple scenarios, each of which is likely to meet bad debt customer criteria, and use combinations of those scenarios to identify which customers are most likely to become bad debt accounts.
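
The if-else comparison can be made concrete with a small hand-built tree for the bad debt example. This is only a sketch: the function name, the three predictors, and every threshold below are hypothetical, and in practice the splits would be learned from historical account data rather than written by hand.

```python
def bad_debt_risk(credit_score, annual_salary, missed_payments):
    """Walk a small hand-built decision tree and return a risk label.

    All thresholds are illustrative assumptions, not values from any
    real credit policy.
    """
    if credit_score < 600:            # first split: credit history
        if missed_payments >= 2:      # second split under poor credit
            return "high"
        return "medium"
    if annual_salary < 30000:         # second split under good credit
        if missed_payments >= 1:
            return "medium"
        return "low"
    return "low"


# Each call follows one path from the top of the tree to a leaf.
print(bad_debt_risk(credit_score=550, annual_salary=40000, missed_payments=3))
print(bad_debt_risk(credit_score=700, annual_salary=25000, missed_payments=1))
```

Each path from the first test down to a returned label corresponds to one of the scenarios a financial institution might define.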

Regression analysis is another component of predictive analytics that allows users to model relationships between two or more variables in order to predict the value of one variable from the values of the others. It can be used to identify buying patterns based on multiple demographic qualifiers such as age and gender, which can help identify where to sell specific products. Within BI, this is beneficial when used with scorecards that focus on geography and sales.
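
To show what predicting one variable from several others looks like, here is a minimal ordinary least squares sketch in plain Python. The data is invented so that monthly spend is exactly 10 + 2 * age + 15 * gender_flag, and encoding gender as a 0/1 flag is an assumption made only to keep the example numeric. The helper names fit_linear and predict are hypothetical, not taken from any library.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    Assumes the predictors are not perfectly collinear; otherwise the
    elimination below would divide by zero.
    """
    X = [[1.0] + list(r) for r in rows]          # prepend intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coeffs, row):
    """Apply the fitted coefficients to one new (age, gender_flag) row."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Invented sample: (age, gender_flag) -> monthly spend
customers = [(25, 0), (30, 1), (40, 0), (45, 1), (35, 1)]
spend = [60.0, 85.0, 90.0, 115.0, 95.0]

coeffs = fit_linear(customers, spend)
print([round(c, 3) for c in coeffs])
print(round(predict(coeffs, (50, 0)), 3))
```

Real purchase data will not fall on an exact line, so the fitted coefficients would be estimates with some error rather than exact values.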
