Tutorial - Step 4

Database Familiarization Part 2


Now that we have summary statistics about the database, it would be useful to know what proportion of the Responses were YES and what proportion were NO.

Using the distribution node

You can read the following information from this distribution graph:

We are particularly interested in the factors that influenced the 120 customers who responded to the newsletter. We could use this information to increase the response rate for future events.

Close the dialog when you have examined it. Closing the dialog returns you to the main workspace.
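For readers who want to check the same YES/NO proportions outside the tool, the calculation behind a distribution graph can be sketched in a few lines of Python. This is only an illustration: the function name response_proportions and the sample list are hypothetical, and the sample values are made up rather than taken from the tutorial database.

```python
from collections import Counter


def response_proportions(responses):
    """Return the share of each distinct value in `responses` as a dict."""
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        # An empty column has no meaningful proportions.
        return {}
    return {value: count / total for value, count in counts.items()}


# Made-up sample data for illustration only; not the tutorial database.
sample = ["YES", "NO", "NO", "YES", "NO"]
proportions = response_proportions(sample)

# Print one line per response value, formatted as a percentage.
for value, share in sorted(proportions.items()):
    print(f"{value}: {share:.0%}")
```

In practice you would pass in the Response column read from your own export of the database instead of the sample list.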

 

Data Cleansing

It is important that the data we have collected is clean and does not contain spurious entries. We can use the information presented in the statistics node to identify possible outliers, such as:

Another part of the cleansing stage is to split the database into training and testing sets. For the purposes of this tutorial, we will use the database as a whole; in a normal project, you would use the sample records node to split it.
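To make the idea of a training/testing split concrete, here is a minimal Python sketch of a random split. It is not a description of how the sample records node works, which this tutorial does not cover; the function name split_records, the 30% test fraction, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def split_records(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of `records` and return (training, testing) lists."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for database rows: ten record identifiers.
records = list(range(10))
training, testing = split_records(records, test_fraction=0.3, seed=42)
print(len(training), "training records,", len(testing), "testing records")
```

Every record ends up in exactly one of the two sets, so a model built on the training set can be checked against records it has never seen.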
