About: This article introduces the Merge node, a Compile node within Construct.
Location: Node panel
Table of Contents
- Feature Overview
- Configuring a Merge Node
- Disconnecting a Merge Node Without Losing Settings
- Video Tutorial
- Related Articles
Feature Overview
The Merge node is used to combine records from two or more related data sources together, matching tables horizontally by values that match in all sources. When configuring a Merge node, each data source entering the Merge is represented as a separate box in the working area, with the name of the source at the top and the list of its column names below. It is recommended to join no more than three data sources in a single Merge, as it can become complicated quickly.
Configuring a Merge Node
Joining Sources
The first step in configuring a Merge node is to join the data sources together. The user must consider how the data sets are related and which field(s) exist in both sources that could be used to match the records on.
To create a join between two columns, click and drag from the column in one data source and drop it onto the column of another data source. A line is drawn between the columns to represent the join. In the example shown, there is a single join between two data sources, matching on the Student ID (PIDM1) which is a number that uniquely identifies each record or student in the data, and exists in both sources. So records that contain a Student ID that exists in table 1 and in table 2 will be matched.
Note: The column names do not need to match exactly in both sources. The important piece is that the values within both columns will match.
If two columns will not make a connection to each other in the merge, it may be because the data types of the two variables are not the same. For example, if one variable is a type "real" and the other is type "integer", although the values could be compatible to be joined, the data types need to match as well. Data types can be changed using the Convert node prior to the Merge node to resolve this.
Join Types
The type of join that is created between tables can be changed. The default join is always a straight join or "Only matching records from both sources."
To change the join type, right-click on the line joining the two data sources. The line turns blue and the join type dialog box appears. Select the type from the list.
There are four Merge join types:
- Only matching records from both sources (straight join). An output record is created only if records are found in both sources with matching data in the joining column. By default, the initial join is always this of this type. It is depicted by a line with no arrowheads.
- All records from [data source 1] and matching records from [data source 2] (left join). One output record is created for each [data source 1] record paired with each matching record from [data source 2]. It is depicted by a line with an arrowhead pointing to [data source 2].
- All records from [data source 2] and matching records from [data source 1] (right join). One output record is created for each [data source 2] record paired with each matching record from [data source 1]. It is depicted by a line with an arrowhead pointing to [data source 1].
- All records from both sources (full join). Every unique combination of records from both sources will appear in the output.
Selecting Output Columns
Once all the joins are defined, the user must specify which columns from each source will appear in the output of the Merge. To select a column and add it to the Output Columns list, simply double click on the column name. There are two options for selecting columns:
- Select the asterisk at the top of the data source table. This adds all the columns in that data source to the Output Columns list.
- Individually select each column to be added, or hold down the control key and select multiple column names, then press the enter key.
When a column is selected, it appears in bold type within the data source table. It also appears in the Output Columns list on the right side of the screen, along with its source. Deleting a column from the Output Columns list removes the column from the Merge output. This is done by right-clicking on the entry in the Output Columns list and selecting Delete or selecting the entry and pushing the Delete Key.
Tip: Columns may be renamed after being added to the Output Columns list. Click on the name under Output Column and enter the new name. Columns can also be reordered using the black up and down arrows to rearrange the sequence.
Creating a Source ID Column
Users have the option to create additional column(s) for each data source being merged. The new columns will contain a flag (a binary 0 or 1 value) indicating if the record existed in the data source or not. Placing a check in the Created box beside the source creates the new column for that source.
Disconnecting a Merge Node Without Losing Settings
It is important to note that when a Merge node that has already been configured is disconnected from a process path, the configuration will be lost.
However, if a configured Merge needs to be disconnected, a dummy node can be used to prevent the loss of settings. Adding a Cache node directly before the Merge node and then deleting the connection between the Cache and the prior node will maintain the Merge configuration.
If you open the Merge node while it is disconnected, the configuration will appear in all red. This is signifying that the node can't find the required variables, however the configuration remains.
Important: Make sure to close out using the X button (NOT the Save and Exit check mark) and select Don't Save when prompted to avoid losing configuration settings.
Once the process path is reconnected and the Merge is passed the required fields, the original configuration will be present, and any dummy nodes can be deleted.
Video Tutorial
Comments
0 comments
Article is closed for comments.