KDD Methodology
KDD stands for Knowledge Discovery in Databases. It is the concept of
extracting previously unknown and potentially useful information from
large sets of data.
The general capabilities of the system can be summarised by the
stages of the KDD Roadmap [1]. The outline of this process is
illustrated below.

- PROBLEM SPECIFICATION
- Much of the work required during this stage is related
to setting up the data mining project and knowing what information is
available and what must be obtained. The stage introduces the concept
of a log which records the decisions made at each stage of the KDD
process; this log is implemented in the toolkit to describe the
options and results of applying each node in the stream. Tools to
generate summary statistics and visualise the database are available
to aid familiarisation.
- RESOURCING
- Resourcing is the stage that gathers all
information, materials, equipment and personnel required to carry out
the KDD project.
- DATA CLEANSING
- The package provides a number of tools to help
cleanse the format of the database. These are processes that will
normally be applied only once to the database, unlike the pre-processing
stages where parameter setting becomes important and the process is
iterative. Cleansing techniques include record sampling (random or 1
in n) and the handling of missing data.
- PRE-PROCESSING
- Pre-processing techniques are very powerful
functions that can dramatically change the shape and format of the
database. They fall into two main areas: feature selection, which
removes data by keeping only the most powerfully predictive fields
(field sampling); and discretisation, which converts continuous
numeric data to discrete numeric data (one form of value sampling).
- DATA MINING
- The techniques that WITNESS Miner uses are suitable
for solving most types of data mining problems (for
example, classification and clustering). However, it can be
particularly powerful to have the ability to target problems where
rules are required to predict records for a particular class. WITNESS
Miner uses a simulated annealing data mining engine to generate rules
describing these classes of interest.
- EVALUATION
- Processes to build and evaluate rules are included
as part of a collection of evaluation tools. The rules can be
evaluated against new databases for testing purposes whilst the
components of the rule can be 'tweaked' to determine the sensitivity
and interest of the rule. Visualisation methods such as scatter plots
and histograms are also available.
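
The cleansing, pre-processing and evaluation stages above can be
illustrated with a small sketch. This is not WITNESS Miner's
implementation or API: the function names, the record layout (a list
of dictionaries) and the field names "age" and "cls" are all
hypothetical, and equal-width binning is only one possible
discretisation scheme, chosen here for brevity.

```python
import random


def drop_missing(records):
    """Cleansing: discard records that contain a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def sample_one_in_n(records, n):
    """Cleansing: keep every n-th record, starting with the first."""
    return records[::n]


def sample_random(records, fraction, seed=0):
    """Cleansing: keep a random subset holding about `fraction` of the records."""
    rng = random.Random(seed)
    return rng.sample(records, round(len(records) * fraction))


def discretise_equal_width(values, bins):
    """Pre-processing: map continuous values to bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant field
    return [min(int((v - lo) / width), bins - 1) for v in values]


def evaluate_rule(records, condition, target_class):
    """Evaluation: return (coverage, accuracy) of 'if condition then target_class'."""
    matched = [r for r in records if condition(r)]
    if not matched:
        return 0.0, 0.0
    correct = sum(1 for r in matched if r["cls"] == target_class)
    return len(matched) / len(records), correct / len(matched)


if __name__ == "__main__":
    # Hypothetical database: one numeric field and one class field.
    records = [{"age": a, "cls": "yes" if a >= 40 else "no"} for a in range(20, 60)]
    sample = sample_one_in_n(drop_missing(records), 4)
    print(discretise_equal_width([r["age"] for r in sample], 3))
    print(evaluate_rule(records, lambda r: r["age"] >= 45, "yes"))
```

Coverage here is the fraction of all records that the rule's condition
matches, and accuracy is the fraction of matched records that belong to
the target class. Changing the threshold in the condition and
re-running evaluate_rule is a simple way to 'tweak' a rule and observe
its sensitivity.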
[1] J.C.W. Debuse, B. de la Iglesia, C.M. Howard and V.J. Rayward-Smith.
"A methodology for Knowledge Discovery: The KDD Roadmap".
University of East Anglia, School of Information Systems,
Technical Report SYS-C99-01.