About: This article introduces the Reduce node, a Special node within Construct.
Location: Node panel
Feature Overview
The Reduce node prepares a dataset to be used for predictive modeling in Predict. This node identifies columns that statistically relate to the defined Y Variable.
Y Variable Field
The Y Variable is the outcome that the predictive model is intended to predict. It can be any linear (real or integer) or binary (1 or 0) variable in the dataset. To designate a variable as the Y Variable, use the right-facing arrow next to the Y Variable field.
Test Options
The Related p-value can be adjusted. This value sets the minimum confidence level required in a variable's relatedness to the outcome. A larger p-value corresponds to a lower confidence level, so raising it admits variables with weaker evidence of relatedness. The default p-value is 0.01.
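As an illustration of how a p-value threshold separates related from unrelated variables, here is a minimal Python sketch. It is not Construct's implementation: the function name, the example counts, and the "at or below the threshold" comparison are assumptions made for the example. It runs a Pearson Chi-Square test (without continuity correction) on a 2x2 table of a binary variable against a binary Y Variable.

```python
import math

def chi_square_2x2_p_value(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Table layout (counts):      Y=1   Y=0
        variable = 1             a     b
        variable = 0             c     d
    Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of a relationship
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

RELATED_P_VALUE = 0.01  # the default threshold described above

# Hypothetical counts: one strongly associated variable, one weakly associated.
for label, table in [("strong", (30, 10, 10, 30)), ("weak", (12, 8, 10, 10))]:
    chi2, p_value = chi_square_2x2_p_value(*table)
    is_related = p_value <= RELATED_P_VALUE  # assumed comparison rule
    print(f"{label}: chi2={chi2:.3f} p={p_value:.6f} related={is_related}")
```

Raising `RELATED_P_VALUE` in this sketch lets variables with weaker evidence pass, which mirrors the lower confidence level described above.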
Identifying Related Fields
Selecting the Find Related Now button performs a series of statistical tests (Chi-Square, F-Test, etc.) on the data within each variable. In addition to each variable itself, several mathematical transformations of the variable (such as square, square root, cube, cube root, log10, and natural log) are also evaluated for relatedness.
Variables found to be sufficiently related to the Y Variable are colored green. Variables found to be related but not meeting the p-value threshold are colored red. Variables found not to be related are colored black. The statistical test result is displayed next to each variable, along with the transformation that was used (if any).
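The idea of evaluating each variable alongside its transformations can be sketched in Python as follows. This is an assumption-laden illustration, not Construct's actual procedure: the names `TRANSFORMS`, `f_statistic`, and `best_transform` are hypothetical, the data is made up, and the score is the F statistic of a simple linear regression of Y on the (transformed) variable. Transformations that are undefined for the data, such as the log of a non-positive value, are skipped.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def f_statistic(xs, ys):
    """F statistic for the slope in a simple linear regression of ys on xs."""
    r2 = pearson_r(xs, ys) ** 2
    if r2 >= 1.0:
        return math.inf  # perfect linear fit
    return r2 * (len(xs) - 2) / (1.0 - r2)

# Candidate transformations named in the article ("none" is the raw variable).
TRANSFORMS = {
    "none": lambda x: x,
    "square": lambda x: x * x,
    "square root": math.sqrt,
    "cube": lambda x: x ** 3,
    "cube root": lambda x: math.copysign(abs(x) ** (1.0 / 3.0), x),
    "log10": math.log10,
    "natural log": math.log,
}

def best_transform(xs, ys):
    """Score every applicable transformation and return (best name, all scores)."""
    scores = {}
    for name, fn in TRANSFORMS.items():
        try:
            transformed = [fn(x) for x in xs]
        except ValueError:
            continue  # transformation undefined for this data (e.g. log of 0)
        scores[name] = f_statistic(transformed, ys)
    return max(scores, key=scores.get), scores

# Hypothetical data in which Y depends on the square of the variable.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [x ** 2 + 0.5 * (-1) ** i for i, x in enumerate(xs)]
best, scores = best_transform(xs, ys)
print(best, {name: round(score, 1) for name, score in scores.items()})
```

A real implementation would convert each statistic to a p-value and compare it with the Related p-value; the sketch stops at ranking the transformations to stay short.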
Output Options
Keep Column - This column allows users to keep or remove variables from the output. Columns marked with a square will be kept if found to be related and left behind if not. This setting may be manually overridden to always keep (checked) or always discard (unchecked).
Create Related Best Fit Transforms - Selecting this option directs Construct to create new variables containing the mathematical transformations that were related.
Create Related Category Transforms - Selecting this option directs Construct to create new variables containing the binary (1 or 0) versions of the categorical variables that were related.
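The output options above can be sketched in Python as follows. This is only an illustration under stated assumptions: the state names `"auto"`, `"keep"`, and `"discard"` (standing in for the square, checked, and unchecked settings), the function names, and the `name_level` naming of the new columns are all hypothetical and are not taken from Construct.

```python
def should_keep(keep_state, is_related):
    """Resolve the Keep Column setting for one variable.

    keep_state: "keep" (checked), "discard" (unchecked), or "auto" (square).
    is_related: whether the variable was found to be related to the Y Variable.
    """
    if keep_state == "keep":
        return True
    if keep_state == "discard":
        return False
    return is_related  # "auto": keep only if related

def binary_category_columns(name, values):
    """Build one 1/0 column per category level of a categorical variable."""
    levels = sorted(set(values))
    return {
        f"{name}_{level}": [1 if value == level else 0 for value in values]
        for level in levels
    }

# Hypothetical categorical variable that was found to be related.
region = ["east", "west", "east", "north"]
if should_keep("auto", is_related=True):
    for column, flags in binary_category_columns("region", region).items():
        print(column, flags)
```

Whether Construct creates a column for every category level or only for the levels that were related is not stated here; the sketch creates one per level for simplicity.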