Workspace Tab in Predict

About: This article describes the Workspace tab in Predict.
Purpose: Use the Workspace tab to organize and manage data projects and data source connections.

Introduction
Open a File
Load Saved Analysis
Score a File
Recent Analyses
Recent Connections
- Load Data From Source
Connections
Related Articles

Introduction

The Workspace tab automatically opens when the Predict program first starts. It is where users manage data source connections and prior data analyses, and where they start new data analyses.

Open a File

To start a new analysis, use the Open a File button to browse to the data that the analysis will be built on. This button launches a connection window, allowing users to connect to the location where their data lives. As soon as a data file is located and saved, the Analysis tab will open.

Load Saved Analysis

To re-open an analysis that has previously been worked on, use the Load Saved Analysis button. This allows users to browse on their computer for any .vpa (Predict Analysis) files. This could be a previously saved project, or an analysis shared by a colleague. Once a .vpa file is located and selected, the Analysis tab will open.

Score a File

The Score a File button offers users an internal way to apply models they have built in Predict to a dataset for scoring. Selecting this button launches a connection window, allowing users to select a data file that they wish to apply a predictive model to. Once a file has been located and saved, the Score tab will open.

Recent Analyses

The last ten analyses saved by the user appear in the Recent Analyses frame. To open an analysis listed in the Recent Analyses frame, double-click the name of the analysis. The data file used to create the original analysis must be in its original location, must have the same name, and must contain the same variables. The Analysis tab then opens, containing the dataset and any previous work.

Recent Connections

The Recent Connections frame lists the last ten unique data sources accessed by the program. When an entry in the list is selected, an icon appears on each side of the listing:

The pushpin on the left fixes that entry in place. It will not be displaced by other entries until "unpinned."
The triangle on the right is a Load button which will begin the process of loading the data into the program and starting a new analysis.
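
The pinning behavior described above can be pictured with a short sketch. This is not Predict's implementation; the class name RecentConnections and its methods are hypothetical, and the sketch only models the rule that a pinned entry is never pushed off the list of ten. It does not model the exact on-screen position of pinned entries.

```python
class RecentConnections:
    """Hypothetical model of a recent-connections list with pinning."""

    def __init__(self, capacity=10):
        self.capacity = capacity   # the frame lists the last ten unique sources
        self.entries = []          # most recently used first
        self.pinned = set()

    def access(self, source):
        """Record that a data source was used."""
        if source in self.entries:
            self.entries.remove(source)   # keep each source listed only once
        self.entries.insert(0, source)
        if len(self.entries) > self.capacity:
            # Assumption: the oldest entry that is not pinned is displaced.
            for candidate in reversed(self.entries):
                if candidate not in self.pinned:
                    self.entries.remove(candidate)
                    break

    def pin(self, source):
        """Protect an entry from being displaced."""
        if source in self.entries:
            self.pinned.add(source)

    def unpin(self, source):
        """Allow an entry to be displaced again."""
        self.pinned.discard(source)
```

With a capacity of three, for example, pinning the oldest entry and then accessing a fourth source displaces the second-oldest entry instead of the pinned one.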

Load Data From Source

When a recent connection is "loaded" using the Load button to the right of the connection name, the Load Data From Source window launches, offering different configuration settings before data is loaded into an analysis.

Load data from Source window

Percent (%) of data to load - This control lets users analyze just a subset (1 to 100%) of the records in the dataset. If the Randomly order rows option is selected, the records to analyze are selected randomly; otherwise, they are selected in the order encountered in the file.
Randomly order rows - If this option is selected and the % of data to load is set to something other than 100%, the records to analyze are selected randomly. Otherwise, they are selected in the order encountered in the file.
Max category count - A user-defined value capping the number of unique categories a variable may contain and still be considered a categorical variable. If this number is exceeded, the variable is reclassified as a text variable (if one or more of the values contain letters) or a numeric variable (if all the values consist exclusively of numbers).
Category Min Occurrence - For a category to be considered for individual analysis, it must appear in the dataset a minimum number of times. This minimum frequency is set in the Category Min Occurrence field.
Excluded Columns - Variables that the user has determined should not be part of the analysis can be selected from the list on the left and “moved” to this section using the Exclude? check box next to each variable. Variables may be returned to the list of loadable columns by un-selecting the Exclude? check box.
Identifier Columns - Variables that serve as unique record identifiers in the dataset should not be part of the analysis. To keep them from loading, use the Is Identifier? check box to "move" the variable to the Identifier Columns section. Variables may be returned to the list of loadable columns by un-selecting the Is Identifier? check box.
Load Data - Once the data configuration is established, selecting the Load Data button opens the Analysis tab. All configuration options are applied to the records before the analysis begins.
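
The settings above amount to a few simple rules, which the following sketch restates in Python. It illustrates the described behavior only and is not Predict's own code. The function names are hypothetical, and the rounding of the row count, the use of float parsing to decide whether a value "consists exclusively of numbers," and the inclusive minimum-occurrence threshold are all assumptions.

```python
import random
from collections import Counter


def select_rows(rows, percent=100, randomize=False, seed=None):
    """Return the subset of rows to analyze (percent is 1 to 100)."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    count = len(rows) * percent // 100   # assumption: round down
    if randomize:
        # Rows are chosen at random rather than in file order.
        return random.Random(seed).sample(rows, count)
    return rows[:count]                  # first rows, in file order


def _is_number(value):
    """Assumption: a value counts as numeric if float() can parse it."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def reclassify(values, max_category_count):
    """Apply the Max category count rule to a categorical variable."""
    unique = set(values)
    if len(unique) <= max_category_count:
        return "categorical"
    if all(_is_number(v) for v in unique):
        return "numeric"
    return "text"


def categories_for_analysis(values, min_occurrence):
    """Return categories that appear at least min_occurrence times."""
    counts = Counter(values)
    return {c for c, n in counts.items() if n >= min_occurrence}
```

For instance, select_rows(list(range(10)), percent=30) returns the first three rows, while passing randomize=True returns three rows chosen at random.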

Connections

Connections section of workspace tab

Connections are not required to load a data file into Predict for an analysis. However, creating a connection to a data source that will be used more than once eliminates future steps. A connection points to a location where data is stored on the local computer, a network, or a cloud-based resource. It can reference either a file folder or a database on a server. Once created, a connection may be used over and over again by any number of different jobs. The Connections section of the Workspace tab is where connections used by the product appear and can be managed (created, modified or deleted).

Data Connection Source Types

Connections are automatically grouped by data source type. Construct supports most data sources that are accompanied by an ODBC or API connection. The data source types available in the program often depend on whether the ODBC drivers for that type are installed on the computer. ODBC drivers are generally available as a free download from the manufacturer. Common connection types include:

Access
DSN (ODBC Connections)
Excel
MySQL
Oracle
PostgreSQL
Salesforce
SAS
SPSS
SQL Server
Text/CSV
XML
