Statistics Node

You can use the statistics node to become familiar with the data quickly and to identify outliers and invalid data. When you run the statistics node, it produces an HTML report that summarises each field by its type and its numbers of unique and missing values. For numeric fields, the report also gives the minimum, maximum, mean and standard deviation.

The statistics displayed are:

Column Name

Description

Name

The name of the field.

Minimum

The minimum valid value in the field. Missing values are excluded: although a missing value sorts below every other value, it is never reported as the minimum. Not applicable for categorical fields.

Maximum

The maximum value in the field. Not applicable for categorical fields.

Mean

The arithmetic mean of all values in the field. Not applicable for categorical fields.

Std Dev

The standard deviation of values in the field. Not applicable for categorical fields.

Unique

The number of unique entries in the field. If the field has missing values, the missing value is counted as one unique entry and the number is followed by an asterisk.

Missing

The number of values missing from the field. If this number equals the number of records (i.e. the field contains no data), the field should be removed immediately and not passed to later stages of the process.

Type

The type of data in the field. Possible values are:

  • Continuous numeric

  • Discrete numeric

  • Categorical

See the data source node page for more details on field types.
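The columns described above can be approximated outside the tool with a short Python sketch. This is an illustration only, not the node's implementation: the function names (is_missing, summarise_field), the rule that None, empty strings and NaN count as missing, the rule that a numeric field is discrete when every value is a whole number, the use of the sample standard deviation, and the extra "Empty" flag are all assumptions made for this example.

```python
import math
import statistics


def is_missing(value):
    """Assumption: None, blank strings and NaN are treated as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return isinstance(value, float) and math.isnan(value)


def summarise_field(name, values):
    """Build one report row (a dict) for a single field."""
    valid = [v for v in values if not is_missing(v)]
    missing = len(values) - len(valid)
    numeric = bool(valid) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in valid
    )

    # A missing value counts as one extra unique entry, marked with "*".
    unique = len(set(valid)) + (1 if missing else 0)

    row = {
        "Name": name,
        "Minimum": None,
        "Maximum": None,
        "Mean": None,
        "Std Dev": None,
        "Unique": f"{unique}*" if missing else str(unique),
        "Missing": missing,
        "Type": "Categorical",
        # Hypothetical flag: the field holds no data and should be dropped.
        "Empty": len(values) > 0 and missing == len(values),
    }

    if numeric:
        # Assumption: whole-number fields are discrete, others continuous.
        whole = all(float(v).is_integer() for v in valid)
        row["Type"] = "Discrete numeric" if whole else "Continuous numeric"
        # Only valid values contribute to the numeric statistics.
        row["Minimum"] = min(valid)
        row["Maximum"] = max(valid)
        row["Mean"] = statistics.mean(valid)
        # Assumption: sample standard deviation; 0.0 for a single value.
        row["Std Dev"] = statistics.stdev(valid) if len(valid) > 1 else 0.0

    return row


if __name__ == "__main__":
    table = {
        "age": [21, 35, None, 35, 48],
        "colour": ["red", "blue", "", "red"],
        "unused": [None, None, None, None],
    }
    for field, column in table.items():
        print(summarise_field(field, column))
```

Fields whose "Empty" flag is set correspond to the case described under Missing, where the count of missing values equals the number of records.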

Sample output from the statistics node:

Stats View Dialog