SampleRecs Node 

Record Sampling Node

Most KDD (knowledge discovery in databases) projects search for patterns in a training set of records and then evaluate those patterns on a separate testing set. Evaluating on previously unseen records tests the stability of the discovered rules and measures their predictive power. If you have only a single database available (or wish to work with a subset of a large database), you can use the record sampling node to divide the database into training and testing sets.

Three methods are available:

For each method, you can choose to output either the training set (the sampled records) or the testing set (the records remaining after sampling) from the node.

The random selection method uses a built-in random number generator (RNG). You can initialize the RNG with a seed value; changing the seed produces a different subset of records. Given the same seed, the RNG always selects the same records, so you can repeat experiments and compare them fairly.
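
The seeded behaviour described above can be illustrated with a minimal Python sketch. It is not the node's actual implementation: the function name sample_records, its fraction and output parameters, and the use of Python's standard random module in place of the node's built-in RNG are all assumptions made for illustration.

```python
import random


def sample_records(records, fraction, seed, output="training"):
    """Hypothetical helper: split records by seeded random selection.

    Returns the sampled records (output="training") or the records
    remaining after sampling (output="testing"). Record order is kept.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if output not in ("training", "testing"):
        raise ValueError("output must be 'training' or 'testing'")

    # A private generator: the same seed always yields the same selection,
    # and the global random state is left untouched.
    rng = random.Random(seed)
    n_train = round(len(records) * fraction)
    chosen = set(rng.sample(range(len(records)), n_train))

    want_training = output == "training"
    return [rec for i, rec in enumerate(records) if (i in chosen) == want_training]


# Example: 100 records, 70% sampled for training.
records = list(range(100))
train = sample_records(records, 0.7, seed=42)
test = sample_records(records, 0.7, seed=42, output="testing")
print(len(train), len(test))
```

Because both calls use the same seed, the training and testing outputs are complementary: every record appears in exactly one of them. Calling the function again with a different seed would draw a different subset.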

Options

Full details of the options available for the record sampling node can be found on the record sampling options page.