Clustering Node 


The clustering node performs the k-means algorithm. K-means is an unsupervised learning method, unlike the supervised learning methods used by the discovery and decision tree nodes.

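The k-means algorithm attempts to cluster the database records into a predefined number of groups. Over a series of iterations, it assigns each record to the nearest cluster and then recalculates the centre of each cluster, so that the distance between each record and the centre of its cluster is kept as small as possible.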
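The iteration described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the node's actual implementation: it assumes numeric fields, squared Euclidean distance, and the first k records as starting centres, none of which are stated on this page. The function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two numeric records (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(records, k, iterations=20):
    """Cluster records (tuples of numbers) into k groups.

    Returns (assignments, centroids), where assignments[i] is the
    cluster index chosen for records[i].
    """
    if not 1 <= k <= len(records):
        raise ValueError("k must be between 1 and the number of records")

    # Assumption: start from the first k records. Real tools often use
    # random or smarter initialisation.
    centroids = [tuple(r) for r in records[:k]]
    assignments = None

    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(r, centroids[c]))
            for r in records
        ]
        # Stop early once no record changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, a in zip(records, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )

    return assignments, centroids
```

Calling `k_means(records, 2)` on a list of numeric tuples returns one cluster index per record together with the final centroids.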

As with the feature selection node, the clustering node can either create a new field containing the cluster identifier or display the results in an HTML report. The report displays summary statistics for each field within each cluster and identifies the most typical and least typical record in each cluster.

A centroid represents what the average record in a cluster would look like. The distance of each record from its centroid is calculated from the values in each field of that record. The importance of each field in this calculation can be set individually by altering the weighting for that field. To remove a field from the calculation entirely, set its weighting to zero. The algorithm can also be set to perform clustering based solely on the values in the output field.
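As a rough illustration of how field weightings could enter the distance calculation, the sketch below computes a centroid and a weighted Euclidean distance. The exact formula the node uses is not given on this page, so the weighted Euclidean form and the function names here are assumptions.

```python
def centroid(records):
    """Mean of each field across the records of one cluster."""
    return tuple(sum(col) / len(records) for col in zip(*records))


def weighted_distance(record, centre, weights):
    """Weighted Euclidean distance (assumed form) from a record to a centroid.

    A weight of 0 removes that field from the calculation entirely.
    """
    total = sum(
        w * (x - c) ** 2 for x, c, w in zip(record, centre, weights)
    )
    return total ** 0.5
```

With equal weights of 1 this reduces to ordinary Euclidean distance. Setting a field's weight to 0 makes records that differ only in that field appear identical to the algorithm.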

When the node creates a new field in the database, that field indicates which cluster the algorithm placed each record in. A rule node placed after the clustering node can be used to select records based upon this field, allowing specific clusters to be analysed independently.
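The effect of selecting on the cluster field can be pictured with a small Python sketch. It does not use the rule node's own syntax; the field name `cluster` and the function name are hypothetical, and each record is assumed to be a dictionary of field values.

```python
def select_cluster(records, cluster_id, field="cluster"):
    """Return only the records whose cluster field equals cluster_id."""
    return [r for r in records if r[field] == cluster_id]


# Hypothetical records after the clustering node has added a "cluster" field.
customers = [
    {"age": 23, "spend": 120.0, "cluster": 0},
    {"age": 57, "spend": 890.0, "cluster": 1},
    {"age": 25, "spend": 140.0, "cluster": 0},
]

# Analyse cluster 0 on its own.
cluster_zero = select_cluster(customers, 0)
```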

 

Options

Full details of the options available for the clustering node can be found on the clustering options page.