CRISP-DM: process for data mining

Posted on August 8, 2016 in Data crunching, Uncategorized

To Decisive Facts, data mining is not a goal in itself. It is a way to transform mountains of data into valuable insights and to help you, our client, make fact-based decisions for your organization. We use CRISP-DM, the Cross-Industry Standard Process for Data Mining, as our standard methodology for data mining projects.

What does CRISP-DM mean?

[Figure 1: The CRISP-DM cycle]

Decisive Facts evaluates data using a standard process based on CRISP-DM. Data mining is a cycle that consists of six phases:

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment

The sequence of these phases is not rigid; there is always interaction between them. Figure 1 indicates the most important dependencies between phases. We can execute the complete process for you, or step in at any phase to assist or advise you. Examples include help with data cleansing strategies, a second opinion on a model you already have in operation, or improving the efficiency or quality of your process.

To give you a better understanding of how we operate, we describe each phase below:

1. Business Understanding

In the first phase we focus on understanding your specific case from a business perspective. We have experience in investment and private banking, transportation, real estate and health care. On a functional level, we have experience in costing models, cost and revenue recognition, benchmarking, questionnaires and client data analysis.

In the business understanding phase we start with a project plan. We explore the motivation(s) behind the project (why). In constant dialogue with you, we propose multiple approaches to tackle the issue at hand. Based on constraints, resources and cultural fit, you decide which approach is the best fit (how).

2. Data Understanding

Once we receive the data from you, we start the process of data understanding. We assess the quality and quantity of the collected data and track omissions in the datasets, so that we keep the complete picture in view. We then run a basic analysis to confirm that the provided data fit your objective; this may yield new insights and reveal additional objectives. Throughout, we verify that the data are correct, complete and consistent, and we pay close attention to missing values. Our data cleaning is a structured process, and all inconsistencies are reported to you in detail.
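
As a minimal illustration of the kind of checks described above, the sketch below counts missing values per column and flags two simple inconsistencies (duplicate identifiers and values that differ only by letter case). The sample records, column names and function names are hypothetical and chosen only for this example; they are not a description of our actual tooling.

```python
from collections import Counter

# Hypothetical sample records; None marks a missing value.
records = [
    {"client_id": 1, "country": "NL", "revenue": 1200.0},
    {"client_id": 2, "country": "nl", "revenue": None},
    {"client_id": 3, "country": None, "revenue": 950.0},
    {"client_id": 3, "country": "BE", "revenue": 400.0},
]


def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def duplicate_keys(rows, key):
    """Return key values that occur more than once."""
    seen = Counter(row[key] for row in rows)
    return sorted(k for k, n in seen.items() if n > 1)


def inconsistent_casing(rows, column):
    """Return values that differ only by letter case, e.g. 'NL' vs 'nl'."""
    groups = {}
    for row in rows:
        value = row[column]
        if value is not None:
            groups.setdefault(value.upper(), set()).add(value)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# Collect the findings in one report that can be shared with the client.
report = {
    "missing": missing_counts(records),
    "duplicate_ids": duplicate_keys(records, "client_id"),
    "casing": inconsistent_casing(records, "country"),
}
print(report)
```

A report like this is only a starting point: which findings count as real errors depends on the business context established in the first phase.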

3. Data Preparation

This phase covers all the operations necessary to construct the final dataset, the data that will be used for modeling. First we determine which data will be used. Criteria include relevance to your goals, data quality and technical constraints. Once we have decided on the optimal dataset, we improve and enrich the data through extensive data cleaning, for example by selecting clean subsets or by estimating missing values with a model.
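
The two cleaning strategies mentioned above can be sketched as follows. This is a simplified example under stated assumptions: the data and function names are hypothetical, and filling gaps with the column mean stands in for "estimating by modeling" as the simplest possible estimator. A real project would often use a richer model.

```python
from statistics import mean

# Hypothetical input; None marks a missing revenue figure.
rows = [
    {"client_id": 1, "revenue": 1200.0},
    {"client_id": 2, "revenue": None},
    {"client_id": 3, "revenue": 800.0},
]


def clean_subset(rows, column):
    """Strategy 1: keep only the rows where `column` is present."""
    return [row for row in rows if row[column] is not None]


def impute_mean(rows, column):
    """Strategy 2: fill missing values with the mean of the observed ones.

    Returns new dictionaries, so the original rows stay unchanged.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


subset = clean_subset(rows, "revenue")
imputed = impute_mean(rows, "revenue")
print(len(subset), imputed[1]["revenue"])
```

Selecting a clean subset keeps only observed values but shrinks the dataset; imputation keeps every row but introduces estimated values, which should be reported alongside the results.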

4. Modeling

Once we have the final dataset, we can start the modeling process. We use different types of models for different goals. The first step is selecting an appropriate technique, e.g. time-series analysis or cross-sectional analysis. Before we build the actual model, we construct a mechanism to test its quality and validity. Clients can choose a one-off model at a specific moment, or an option to generate a model periodically. We then run the model on the final dataset and interpret the results. Finally, we write a business analysis and link the results to your objectives in a report.
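
One common way to set up such a test mechanism for a time series is to hold out the most recent observations and compare any model against a naive baseline. The sketch below assumes this approach; the series, the function names and the choice of mean absolute error as the metric are illustrative assumptions, not a fixed part of our method.

```python
def chronological_split(series, test_size):
    """Hold out the last `test_size` points; a time series is not shuffled."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def naive_forecast(train, horizon):
    """Baseline: repeat the last observed value for every future step."""
    return [train[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly figures, oldest first.
monthly_sales = [100, 104, 108, 112, 116, 120]

train, test = chronological_split(monthly_sales, test_size=2)
baseline_mae = mean_absolute_error(test, naive_forecast(train, len(test)))
print(baseline_mae)
```

A candidate model is only worth deploying if it clearly beats this baseline on the held-out period. For a model that is regenerated periodically, the same test can be rerun on each new batch of data.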

5. Evaluation

At this point we have built one or more models that are of high quality from a data analysis perspective. In this step we assess the extent to which the model meets the business objectives, and we check whether it is effective and efficient. Depending on the results, we decide whether the modeling stages have been covered in sufficient depth and we are ready to move to deployment, or whether there are considerations that require further attention.

6. Deployment

Decisive Facts’ aim is to support decision making, so data solutions should be presented in a way that is useful and understandable. Pilots are often needed to find the best way of integrating the output of the analysis into your daily work. Depending on your needs as a client, this phase can vary from a single report to management to a periodic analysis distributed throughout the whole organization.
 
