The pre-processing stage modifies the database by performing discretisation on the fields, or by selecting a subset of important fields using feature selection. With only 11 fields it is possible to apply the data mining algorithms directly, but we will use the feature selection option to rank the fields.
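To make the discretisation option concrete, here is a minimal sketch of equal-width binning, one common way to discretise a numeric field. It is an illustration only: the tutorial does not say which discretisation method the tool uses, and the function name `equal_width_bins` and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of creating an extra one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical consultancy-day counts, not taken from the tutorial database.
days = [0, 2, 5, 9, 14, 20]
bins = equal_width_bins(days, 4)
print(bins)
```

Each value is replaced by the index of the interval it falls in, so a continuous field becomes a small set of categories that the later algorithms can treat as discrete.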
Create a new feature selection node and link it to the data source node. The stream should now look like this:

By default, the feature selection (FS) options select the information-based measure method and display the results. If this is not the case, change the settings so that they match the dialog below:

Run the KDD stream from the feature selection node to view the results.
The results from the feature selection node are shown in the following table. The feature score is a relative score between 0 and 1. The range of values is determined by the number of records in the database and the type of the fields. In our example, no single field is dramatically more powerful than the others, but the top four fields (scores between about 0.085 and 0.128) are much stronger than the remaining six, which all score below 0.01.
| Fieldname | Feature Score |
|---|---|
| Satisfaction | 0.128071 |
| DaysConsultancyLastYear | 0.10767 |
| DaysConsultancyThisYear | 0.0906835 |
| Training_Score | 0.0852104 |
| Modules | 0.00865992 |
| VisitSales | 0.00363713 |
| VisitSupport | 0.00355532 |
| Software | 0.00287392 |
| Date | 0.00206705 |
| UserGroup | 0.000249031 |

Close the feature selection results dialog when you have examined it.
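
As a rough illustration of how an information-based ranking like the one in the table can arise, the sketch below scores each field by its information gain with respect to a target field. This is an assumption made for illustration: the tutorial does not specify the exact measure the tool computes, and the target field `Renewed`, the function names, and the toy records are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(field_values, target):
    """Reduction in target entropy obtained by splitting on a discrete field."""
    total = len(target)
    groups = {}
    for value, label in zip(field_values, target):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the target within each field value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(target) - remainder


def rank_fields(table, target_name):
    """Return (field, score) pairs sorted from highest to lowest score."""
    target = table[target_name]
    scores = {
        name: information_gain(column, target)
        for name, column in table.items()
        if name != target_name
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy records (hypothetical); "Renewed" is an assumed target field.
table = {
    "Satisfaction": ["high", "high", "low", "low", "high", "low", "high", "low"],
    "UserGroup": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
    "Renewed": ["yes", "yes", "no", "no", "yes", "no", "yes", "no"],
}

ranked = rank_fields(table, "Renewed")
for name, score in ranked:
    print(f"{name}: {score:.4f}")
```

Numeric fields would need to be discretised first for this particular measure to apply. The scores in this sketch are in bits and are not normalised the way the tool's 0 to 1 feature score may be, so only the ordering is comparable.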