Statistics Node
The statistics node is a quick way to become familiar with the data and a useful tool for identifying outliers and invalid data. When you run the node, an HTML report summarises each field by its type, its number of unique values and its number of missing values. For numeric fields, the minimum, maximum, mean and standard deviation are also calculated.
The statistics displayed are:
| Column Name | Description |
| Name | The name of the field. |
| Minimum | The minimum valid value in the field. Missing values are excluded, even though a missing value sorts as less than any other value. Not applicable for categorical fields. |
| Maximum | The maximum value in the field. Not applicable for categorical fields. |
| Mean | The arithmetic mean of all values in the field. Not applicable for categorical fields. |
| Std Dev | The standard deviation of values in the field. Not applicable for categorical fields. |
| Unique | The number of unique entries in the field. If the field has missing values, the missing value is counted as one unique entry and the number is followed by an asterisk. |
| Missing | The number of values missing from the field. If this number equals the number of records (i.e. the field contains no data), the field should be removed immediately and not passed to later stages of the process. |
| Type | The type of data in the field. Possible values are:
See the data source node page for more details on field types. |
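To make the column definitions concrete, the following Python sketch computes the same properties for a small in-memory data set. It is an illustration only, not the node's implementation: the function names `summarise_field` and `empty_fields`, the use of `None` for a missing value, the sample standard deviation, and the type labels `numeric` and `categorical` are all assumptions made for this example.

```python
import statistics


def summarise_field(name, values):
    """Summarise one field. None stands in for a missing value (an assumption)."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    # Treat a field as numeric only if every valid value is an int or float.
    numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    # A missing value counts as one extra unique entry, flagged with an asterisk.
    unique = len(set(present)) + (1 if missing else 0)
    return {
        "Name": name,
        "Minimum": min(present) if numeric else None,
        "Maximum": max(present) if numeric else None,
        "Mean": statistics.fmean(present) if numeric else None,
        # Sample standard deviation is an assumption; it needs two or more values.
        "Std Dev": statistics.stdev(present) if numeric and len(present) > 1 else None,
        "Unique": f"{unique}*" if missing else str(unique),
        "Missing": missing,
        # Hypothetical type labels; the product's own type names may differ.
        "Type": "numeric" if numeric else "categorical",
    }


def empty_fields(summaries, record_count):
    """Return the names of fields whose Missing count equals the record count."""
    return [s["Name"] for s in summaries if s["Missing"] == record_count]


# Made-up records for illustration.
records = [
    {"age": 34, "city": "Leeds", "notes": None},
    {"age": None, "city": "York", "notes": None},
    {"age": 51, "city": "Leeds", "notes": None},
]
fields = ["age", "city", "notes"]
summaries = [summarise_field(f, [r[f] for r in records]) for f in fields]

for summary in summaries:
    print(summary)
print("Fields to remove:", empty_fields(summaries, len(records)))
```

In this sketch the `age` field has two valid values and one missing value, so its Unique entry is reported as `3*`, while `notes` contains no data at all and is therefore listed as a field to remove before later stages.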
Sample output from the statistics node: