Key features of WITNESS Miner include:
Discretisation
There are three automated discretisation methods in WITNESS Miner. You can also discretise data manually, which allows you to create your own partitions and apply the same discretisation to both training and testing databases.
For example, a financial company might store the age of their clients in a database. When using this data in a mining exercise, the company might wish to band the ages so that they can use rules such as:
If AGE=BAND_1 where BAND_1 is between 20 and 30.
This is useful because a single condition on a numeric value cannot express both bounds at the same time; without banding you would need two separate conditions, such as age<30 and age>20.
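The banding idea can be sketched in a few lines of Python. This is only an illustration, not WITNESS Miner's own method: the band edges, the function names band and discretise, and the sample records are all assumptions made for the example. It shows one set of manually chosen partitions being applied to both a training and a testing database.

```python
from bisect import bisect_right

# Hypothetical band edges (an assumption for this sketch):
# BAND_0 is under 20, BAND_1 is 20 up to but not including 30, and so on.
AGE_EDGES = [20, 30, 40, 50]

def band(value, edges=AGE_EDGES):
    """Return the band label for a numeric value; a missing value stays None."""
    if value is None:
        return None
    return f"BAND_{bisect_right(edges, value)}"

def discretise(records, field, edges=AGE_EDGES):
    """Return copies of the records with `field` replaced by its band label."""
    return [{**r, field: band(r[field], edges)} for r in records]

training = [{"AGE": 25}, {"AGE": 41}]
testing = [{"AGE": 29}, {"AGE": 19}]

# The same edges are applied to both databases so the band labels stay comparable.
train_banded = discretise(training, "AGE")
test_banded = discretise(testing, "AGE")
```

A rule can then test a single condition such as AGE=BAND_1 instead of a pair of numeric comparisons.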
Feature Subset Selection algorithms
These algorithms remove all the fields that are not highly predictive. This process can be useful for reducing the size of the database and discarding data that is not relevant to your investigation. Conversely, it can provide subsets of data that could be used in separate investigations.
There may be many hundreds of fields in a database that relate to customer profile information. Although you could use all of those fields in the mining algorithm, it is often better to use a reduced number of fields to improve the discovery algorithms' performance.
For example, you could use Feature Subset Selection to find the best 100 fields from a database of 500, then allow the discovery algorithm to identify which of those 100 fields would be useful in a rule.
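As a rough illustration of the idea, the sketch below ranks fields by how well each one predicts a target field and keeps the best k. The scoring measure (information gain), the function names and the sample records are assumptions chosen for the example; the measure that WITNESS Miner actually uses is not described here.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(records, field, target):
    """Reduction in target entropy obtained by splitting the records on `field`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[field]].append(r[target])
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in records]) - remainder

def select_top_fields(records, target, k):
    """Return the k fields with the highest information gain for `target`."""
    fields = [f for f in records[0] if f != target]
    ranked = sorted(fields, key=lambda f: information_gain(records, f, target),
                    reverse=True)
    return ranked[:k]

# Hypothetical, already discretised customer records.
records = [
    {"REGION": "N", "AGE": "BAND_1", "OWNER": "Y", "RESPONDED": "Y"},
    {"REGION": "S", "AGE": "BAND_1", "OWNER": "N", "RESPONDED": "Y"},
    {"REGION": "N", "AGE": "BAND_2", "OWNER": "Y", "RESPONDED": "N"},
    {"REGION": "S", "AGE": "BAND_2", "OWNER": "Y", "RESPONDED": "N"},
]

best = select_top_fields(records, "RESPONDED", k=2)
```

In this toy data AGE separates responders from non-responders perfectly and REGION carries no information, so REGION is the field that gets dropped.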
Rules
Rule induction (using modern heuristic techniques) arrives at a solution through the evaluation of past experience. This is useful in several ways:
It allows you to target a class of interest ("nugget discovery") as opposed to building rules for all classes. This approach is particularly useful for describing rare events such as responses to mailing lists. The algorithm can focus on describing the group of people who responded (typically fewer than 5%) rather than on the 95% who did not respond.
It controls the complexity of the rules generated by limiting the number of conditions.
It introduces a bias towards accurate rules (for critical decisions) or rules that favour coverage.
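The three points above can be sketched as a scoring function for a candidate rule. This is a minimal sketch under stated assumptions, not the product's scoring formula: the weighted sum, the bias and max_conditions parameters, and the sample records are hypothetical. Accuracy here means the share of matched records that belong to the class of interest, and coverage means the share of that class which the rule matches.

```python
def matches(rule, record):
    """A rule is a dict of field -> required value; every condition must hold."""
    return all(record.get(f) == v for f, v in rule.items())

def evaluate(rule, records, target, target_class):
    """Return (accuracy, coverage) of a rule for one class of interest."""
    matched = [r for r in records if matches(rule, r)]
    in_class = [r for r in records if r[target] == target_class]
    hits = sum(1 for r in matched if r[target] == target_class)
    accuracy = hits / len(matched) if matched else 0.0
    coverage = hits / len(in_class) if in_class else 0.0
    return accuracy, coverage

def score(rule, records, target, target_class, bias=0.5, max_conditions=3):
    """Weighted score: bias near 1 favours accuracy, bias near 0 favours coverage.

    Rules with more than `max_conditions` conditions score zero, which caps
    rule complexity.
    """
    if len(rule) > max_conditions:
        return 0.0
    accuracy, coverage = evaluate(rule, records, target, target_class)
    return bias * accuracy + (1 - bias) * coverage

# Hypothetical records; "Y" in RESPONDED is the class of interest.
records = [
    {"AGE": "BAND_1", "OWNER": "Y", "RESPONDED": "Y"},
    {"AGE": "BAND_1", "OWNER": "N", "RESPONDED": "Y"},
    {"AGE": "BAND_1", "OWNER": "N", "RESPONDED": "N"},
    {"AGE": "BAND_2", "OWNER": "Y", "RESPONDED": "N"},
]

broad = {"AGE": "BAND_1"}                  # covers every responder, less accurate
narrow = {"AGE": "BAND_1", "OWNER": "Y"}   # fully accurate, covers half of them
```

With bias=0.9 the narrow rule scores higher, while with bias=0.1 the broad rule wins, which is the trade-off between accurate rules and rules that favour coverage.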
Handling missing values
All algorithms have been written so that they deal with missing values. This means that you don't have to worry about discarding data or estimating missing values. In some cases, missing values might exist in many fields; discarding affected fields and records would leave very little information in the database. Equally, estimating many values could add potentially dangerous and incorrect information to the database.
A wide variety of standard data processing tools
WITNESS Miner includes basic, frequently used data processing tools, which reduce the need to keep returning to the database.
Unique algorithms
The main unique (or novel) algorithm is the simulated annealing approach to rule discovery. Although Feature Subset Selection is a widely used technique, WITNESS Miner applies it in a unique way.
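To give a feel for how simulated annealing can search for a rule, here is a generic textbook-style sketch. It is not WITNESS Miner's algorithm: the score (accuracy multiplied by coverage), the neighbour move, the cooling schedule, every parameter name and the sample records are assumptions made for illustration.

```python
import math
import random

def rule_score(rule, records, target, target_class):
    """Assumed score: accuracy times coverage for the class of interest."""
    matched = [r for r in records if all(r[f] == v for f, v in rule.items())]
    in_class = sum(1 for r in records if r[target] == target_class)
    hits = sum(1 for r in matched if r[target] == target_class)
    if not matched or not in_class:
        return 0.0
    return (hits / len(matched)) * (hits / in_class)

def neighbour(rule, values, rng):
    """Return a copy of the rule with one condition removed, added, or re-drawn."""
    new_rule = dict(rule)
    field = rng.choice(sorted(values))
    if field in new_rule and rng.random() < 0.5:
        del new_rule[field]
    else:
        new_rule[field] = rng.choice(values[field])
    return new_rule

def anneal(records, target, target_class, steps=500, start_temp=1.0,
           cooling=0.99, seed=0):
    """Search for a high-scoring rule, starting from the empty rule."""
    rng = random.Random(seed)
    values = {f: sorted({r[f] for r in records})
              for f in records[0] if f != target}
    current = {}
    current_score = rule_score(current, records, target, target_class)
    best, best_score = current, current_score
    temp = start_temp
    for _ in range(steps):
        candidate = neighbour(current, values, rng)
        cand_score = rule_score(candidate, records, target, target_class)
        delta = cand_score - current_score
        # Always accept a rule that is no worse; accept a worse rule with a
        # probability that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
        if current_score > best_score:
            best, best_score = current, current_score
        temp *= cooling
    return best, best_score

# Hypothetical records; "Y" in RESPONDED is the class of interest.
records = [
    {"AGE": "BAND_1", "OWNER": "Y", "RESPONDED": "Y"},
    {"AGE": "BAND_1", "OWNER": "Y", "RESPONDED": "Y"},
    {"AGE": "BAND_1", "OWNER": "N", "RESPONDED": "N"},
    {"AGE": "BAND_2", "OWNER": "Y", "RESPONDED": "N"},
    {"AGE": "BAND_2", "OWNER": "N", "RESPONDED": "N"},
    {"AGE": "BAND_2", "OWNER": "N", "RESPONDED": "N"},
]

best_rule, best_score = anneal(records, "RESPONDED", "Y")
```

Occasionally accepting a worse rule early on lets the search escape rules that look good locally; as the temperature falls the search settles on the best rule it has found.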
Customisable HTML reports
You can save WITNESS Miner reports in HTML format and customise them, ready for printing or adding to your intranet.
Exporting rules and decision tree models to XML
You can exploit the knowledge that you have gained from the data by exporting rules and decision tree models to a standard XML format. You can then use the exported models in external applications.
User-friendly interface
WITNESS Miner is easy to use, and you can view processes very clearly because streams are represented graphically as well as in the project tree. All the commonly used commands are available from toolbar buttons.