Toolbar buttons

 

Standard toolbar

New Project
Creates a new, blank project. Any existing streams in the project workspace and the project tree will be cleared; the log window will also be cleared.

Open Project
Opens an existing project from disk. The four most recently opened projects are listed on the file menu for easy access. If a project is already open, you are prompted to save it before the new project is opened.

Save Project
Saves the current project to disk.

 

Control toolbar

Normal Mode
This mode allows you to perform the following actions in the main workspace:

  • left click on a node and drag: Moves the node around the screen along with all attached links.

  • left double click on node: Opens the edit options for that node. No action is performed if edit options are not available for the current node.

  • right click on node: Displays the context menu, which enables you to perform certain actions on that node (for example, editing options, renaming, deleting and running). Some options might not be available for the node.

As well as selecting normal mode from the toolbar, you can also select normal mode from the mode menu, or by pressing the F2 key.

Link Mode
This mode allows you to link two nodes together.

WITNESS Miner displays a symbol next to the cursor, indicating that you must click on the node that is the first stage in the link. Once you have clicked on the first node, WITNESS Miner displays a symbol next to the cursor, indicating that you must click on the node that is the second stage in the link.

When you have clicked on the second node, a line appears between the two nodes (indicating the flow of data) and WITNESS Miner returns to normal mode.

As well as selecting the link mode from the toolbar, you can also select link mode from the mode menu or by pressing the F3 key.

Unlink Mode
This mode allows two linked nodes to be unlinked.

WITNESS Miner displays a symbol next to the cursor, indicating that you must click on the first node to be unlinked. Once you have clicked on the first node, WITNESS Miner displays a symbol next to the cursor, indicating that you must click on the second node to be unlinked.

When you have clicked on the second node, the line linking the nodes disappears and WITNESS Miner returns to normal mode.

As well as selecting the unlink mode from the toolbar, you can also select unlink mode from the mode menu or by pressing the ALT + F3 keys.

Delete Mode
Delete mode removes a node from the project.

WITNESS Miner displays a symbol next to the cursor, indicating that you must select a node for deletion. If the node is the last node in the stream, WITNESS Miner simply deletes the node. If the node is not the last node in the stream, WITNESS Miner displays the message "do you wish to delete the stream from this point forward?". Click on OK to confirm.

When you have deleted the node (or node and subsequent stream) WITNESS Miner returns to normal mode.

As well as selecting the delete mode from the toolbar, you can also select delete mode from the mode menu or by pressing the F4 key.

 

Problem Specification toolbar

Data Source Node
The data source node is used to load databases from text files on disk into the working stream of the project. The node is a source node; that is, it does not accept any input links but allows multiple output links.

See the data source node page for more details on this node.

 

 

Data Cleansing toolbar

Reorder Node
Allows the user to change the order of the fields within the database.

See the reorder node page for more details on this node.

Select Node
The select node can be used to manually filter fields out of the current database. If fields are not required for a particular activity, or a subset of the database is to be exported, the select node can be used to accomplish this.

See the select node page for more details on this node.

Sort Node
The sort node is used to sort the entire database based on the entries in one of the fields. The values in the selected field can be arranged in ascending or descending order; data in all other fields is re-ordered accordingly to maintain the integrity of each record.

See the sort node page for more details on this node.
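
As an illustration of what sorting on one field means for the other fields, the following Python sketch orders a small set of records by a single field. It is not WITNESS Miner code: the records, the field names and the function sort_records are hypothetical and exist only for this example.

```python
# Hypothetical records: each dict is one record, each key is one field.
records = [
    {"id": 1, "age": 42, "city": "Leeds"},
    {"id": 2, "age": 27, "city": "York"},
    {"id": 3, "age": 35, "city": "Bath"},
]

def sort_records(rows, field, descending=False):
    """Return the rows ordered by one field; each record moves as a whole."""
    return sorted(rows, key=lambda row: row[field], reverse=descending)

ascending = sort_records(records, "age")
print([row["id"] for row in ascending])
```

Because whole records are moved rather than single values, the id and city of each record stay attached to its age.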

Replace Node
Replaces all occurrences of the search token with the new 'replace' token. The search can be applied to all fields or a selected subset of fields.

See the replace node page for more details on this node.
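
The effect of restricting the search to a subset of fields can be sketched in Python as follows. This is only an illustration: the function replace_token and the sample data are hypothetical, and the sketch assumes that a token must match a whole field value, which this page does not confirm.

```python
def replace_token(rows, search, replacement, fields=None):
    """Replace values equal to `search`; fields=None means every field.

    Assumption: the token is compared against the whole field value.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input records unchanged
        names = fields if fields is not None else list(row.keys())
        for name in names:
            if new_row.get(name) == search:
                new_row[name] = replacement
        result.append(new_row)
    return result

rows = [{"a": "?", "b": "?"}, {"a": "x", "b": "?"}]
print(replace_token(rows, "?", "unknown", fields=["a"]))
```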

Balancing Node
This node allows the size of the database to be modified in terms of the total number of records in the database and the percentage of records in each output class. These changes are achieved by duplicating or deleting existing records as appropriate to reach the specified values.

See the balancing node page for more details on this node.
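
The idea of reaching a target size and class mix by duplicating or deleting records can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the node's implementation: the function balance and its parameters are hypothetical, and the records to duplicate or delete are chosen at random here because this page does not say how the node chooses them.

```python
import random
from collections import defaultdict

def balance(rows, class_field, total, shares, seed=0):
    """Return about `total` records with the given share per class.

    shares maps each class label to a fraction, e.g. {"yes": 0.5, "no": 0.5}.
    Assumption: surplus records are dropped at random and shortfalls are
    made up by duplicating randomly chosen records of the same class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_field]].append(row)

    balanced = []
    for label, share in shares.items():
        pool = by_class.get(label, [])
        if not pool:
            raise ValueError(f"no records of class {label!r} to work from")
        target = round(total * share)
        if target <= len(pool):
            balanced.extend(rng.sample(pool, target))  # delete the surplus
        else:
            balanced.extend(pool)
            balanced.extend(rng.choices(pool, k=target - len(pool)))  # duplicate
    return balanced
```

For example, a database of 8 "no" records and 2 "yes" records balanced to 10 records at 50% per class loses three "no" records and gains three duplicated "yes" records.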

Record Sampling Node
Provides three methods of sampling records from the database: selecting the first n records; selecting 1 in n records; and selecting n records at random. Sampling is commonly used to build the training and testing sets, or to select a subset of the records from a large database.
For each method, either the training set (the sampled records) or the testing set (the records remaining after sampling) can be chosen as the output for the node. The random selection method makes use of a built-in random number generator (RNG). The RNG can be initialised with a seed value, which the user can change to build different subsets of records; given the same starting seed, the RNG will always select the same records, so that experiments can be fairly compared and repeated.

See the record sampling node page for more details on this node.
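
The three sampling methods and the role of the seed can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, Python's own random module stands in for the built-in RNG, and the 1 in n method is assumed to start at the first record.

```python
import random

def first_n(rows, n):
    """Positions of the first n records."""
    return set(range(min(n, len(rows))))

def one_in_n(rows, n):
    """Positions of every n-th record (assumed to start at the first)."""
    return set(range(0, len(rows), n))

def random_n(rows, n, seed):
    """Positions of n records chosen at random; same seed, same choice."""
    rng = random.Random(seed)
    return set(rng.sample(range(len(rows)), n))

def split(rows, picked, output="training"):
    """Return the sampled records (training) or the remainder (testing)."""
    keep = output == "training"
    return [row for i, row in enumerate(rows) if (i in picked) == keep]

rows = list(range(10))           # stand-in for ten records
picked = random_n(rows, 4, seed=7)
training = split(rows, picked, "training")
testing = split(rows, picked, "testing")
print(len(training), len(testing))
```

Running random_n twice with the same seed returns the same positions, which is what allows an experiment to be repeated on identical training and testing sets.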

 

Pre-Processing toolbar

Discretise Node
The discretise node provides three methods of grouping together similar values within continuous numeric fields.

See the discretise node page for more details on this node.

Feature Selection Node
Calculates a feature selection score for each input field in the database, based on its predictive power against the target field. An information-based measure is used for databases with a categorical or discrete target field, and correlation coefficients are used when the target field is numeric.
The node can be used as a through node (input-output) for selecting the best fields and passing the results to the remainder of the stream. Alternatively, the node can be used as an output-only node, where the feature scores are calculated and displayed for information purposes only.

See the feature selection node page for more details on this node.
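
To give a feel for the two kinds of score, the Python sketch below computes information gain for a categorical target and the Pearson correlation coefficient for a numeric target. These are common choices used here as assumptions; this page does not state which information-based measure or which correlation coefficient the node actually uses, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())

def information_gain(feature, target):
    """Reduction in target entropy obtained by splitting on the feature."""
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    remainder = sum(len(g) / len(target) * entropy(g) for g in groups.values())
    return entropy(target) - remainder

def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric fields."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

target = ["y", "y", "n", "n"]
print(information_gain(["a", "a", "b", "b"], target))  # field predicts target
print(information_gain(["a", "b", "a", "b"], target))  # field tells nothing
```

A field that separates the classes perfectly scores highest, while a field whose values are spread evenly across the classes scores zero.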

 

Data Mining toolbar

Discovery Node
The discovery node represents the core data mining algorithm available in this package. The node uses a simulated annealing (SA) algorithm to search for rules describing a particular class of records in the database.

See the discovery node page for more details on this node.

Clustering Node
The clustering node uses an intelligent algorithm to divide the database into a specified number of clusters (groups) of similar records. Once the algorithm has run, the centroid of each cluster can be displayed in an HTML report. A centroid represents what the average record in a cluster would look like. The average distance of each record from its centroid is also displayed.

See the clustering node page for more details on this node.
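
The two figures shown in the report, the centroid and the average distance from it, can be illustrated with the short Python sketch below. It does not reproduce the clustering algorithm itself: it assumes the records of one cluster are already known, that all fields are numeric, and that distance is Euclidean, none of which is confirmed by this page.

```python
import math

def centroid(points):
    """Field-by-field mean of the records in one cluster."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))

def mean_distance(points, centre):
    """Average Euclidean distance of the records from the centroid."""
    return sum(math.dist(p, centre) for p in points) / len(points)

cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]   # hypothetical records
centre = centroid(cluster)
print(centre, mean_distance(cluster, centre))
```

A small average distance indicates a tight cluster whose records closely resemble the centroid.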

Decision Tree Node
The decision tree node is used to train, test and view decision trees that can be used for complete classification tasks. Unlike the discovery node, which targets a particular class of records, a decision tree will try to build rules to classify all records.

See the decision tree node page for more details on this node.

 

Evaluation toolbar

View Node
Displays the data in a window similar to that found in a spreadsheet package.

See the view node page for more details on this node.

Statistics Node
A collection of summary statistics is calculated and displayed for each field in the database. The results are presented in a spreadsheet and can be copied to the clipboard or exported in text or HTML format.

See the statistics node page for more details on this node.

Rule Node
The rule node is used to build, import or export a set of one or more rules in conjunctive form. The rules can be evaluated against the database and the results displayed on screen, or used to filter records meeting a set of conditions. An example use of the latter option could be to filter out all records which are classified by a particular rule so that the simulated annealing algorithm can be reapplied to the remaining records.

See the rule node page for more details on this node.
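
A rule in conjunctive form is a set of conditions that must all hold for a record to match. The Python sketch below shows one way to represent such a rule and to use it for filtering; the representation, the operator table and the sample records are hypothetical and are not the node's own rule format.

```python
import operator

# Comparison operators allowed in a condition (an assumption for this sketch).
OPS = {"==": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}

def matches(record, rule):
    """True if the record satisfies every condition of the conjunctive rule."""
    return all(OPS[op](record[field], value) for field, op, value in rule)

records = [
    {"age": 42, "city": "Leeds"},
    {"age": 27, "city": "Leeds"},
    {"age": 35, "city": "York"},
]

rule = [("age", ">", 30), ("city", "==", "Leeds")]   # age > 30 AND city = Leeds

covered = [r for r in records if matches(r, rule)]
remaining = [r for r in records if not matches(r, rule)]
print(len(covered), len(remaining))
```

The remaining list corresponds to the filtering use described above: the records not classified by the rule, which could then be passed on for a further search.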

Distribution Node
Displays data from a selected field in a vertical bar chart (distribution graph). The distribution node is most useful for investigating the frequency and ratio of values within a field. If the field is categorical, one bar is used to represent each unique value.

See the distribution node page for more details on this node.

Graph Node
The graph node can be used to plot data from selected fields on either 2D or 3D scatter plots. For a 2D plot, two series (fields) must be selected (X and Y); 3D plots require an additional series (Z).

See the graph node page for more details on this node.

Link Analysis Node

See the link analysis node page for more details on this node.

 

Exploitation toolbar

Export Node
The export node saves the current state of the database to disk in comma delimited (CSV) format.

See the export node page for more details on this node.
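
For readers who want to process an exported file elsewhere, the Python sketch below shows the general shape of comma delimited output using the standard csv module. It is an illustration only: the function export_csv is hypothetical, and writing a header row of field names is an assumption, since this page does not say whether the node writes one.

```python
import csv
import io

def export_csv(rows, handle):
    """Write records (dicts sharing the same fields) as comma delimited text."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()          # assumption: field names on the first line
    writer.writerows(rows)

rows = [{"id": 1, "city": "Leeds"}, {"id": 2, "city": "York"}]
buffer = io.StringIO()            # an open file could be used instead
export_csv(rows, buffer)
print(buffer.getvalue())
```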

Persist Node
The persist node allows the current state of the database to be stored in memory (provided sufficient memory is available) so that subsequent runs of the stream will be more efficient.

See the persist node page for more details on this node.