Quick Start: Progress DataDirect® for ODBC for

Apache Spark SQLTM Wire Protocol Driver on

UNIX and Linux (32-bit and 64-bit)

The following basic information enables you to connect with and test your driver immediately after installation. For installation instructions, see the Progress DataDirect for ODBC Drivers Installation Guide. This Quick Start covers the following topics:

Environment Setup

Test Loading the Driver

Connecting to a Database

Testing the Connection

Tuning the Driver for Optimal Performance

ODBCHOME in the following sections refers to your installation directory path determined at installation.

Environment Setup

1       Check your permissions: Log in as a user with full r/w/x permissions recursively on the entire Progress DataDirect for ODBC Drivers installation directory.

2       Determine which shell you are running: From the login shell, execute the echo $SHELL command.

3       Run the DataDirect setup script to set variables: Two scripts, odbc.csh and odbc.sh, are installed in the installation directory. For Korn, Bourne, and equivalent shells, execute odbc.sh. For a C shell, execute odbc.csh. After running the setup script, execute the env command to verify that the ODBCHOME/lib directory has been added to your shared library path.

4       Set ODBCINI variable: Progress DataDirect for ODBC Drivers install a default odbc.ini file, where your data sources reside, in the installation directory. You must set the ODBCINI environment variable to point to the path of the odbc.ini file. For example:

$ ODBCINI= ODBCHOME/odbc.ini; export ODBCINI

Test Loading the Driver

The ivtestlib (32-bit drivers) and ddtestlib (64-bit drivers) test loading tools are provided to verify that the driver can be loaded into memory; they are located in the ODBCHOME/bin directory. For example, to load the 32-bit driver, you would enter:

$ ivtestlib ODBCHOME/lib/ivsparknn.zz

where nn represents the driver level number and zz represents the extension.

If the load is successful, the test loading tool returns a success message along with the version string of the driver. If the driver cannot be loaded, test loading tool returns an error message explaining why.

Connecting to a Database

The following procedure is applicable to all UNIX/Linux systems.

Defining a Data Source in the odbc.ini

The default odbc.ini file installed in the installation directory is a template in which you create data source definitions. You enter your site-specific database connection information using a text editor. Each data source definition must include the keyword Driver=, which is the full path to the driver.

The following examples show the minimum connection string options that must be set to complete a test connection, where xx represents iv for 32-bit or dd for 64-bit drivers, nn represents the driver level number, and zz represents the extension. The values for the options are samples and are not necessarily the ones you would use.

To configure a connection, you might enter:

[Apache Spark SQL Wire Protocol]



Connection Options

Database: The name of the database to which you want to connect by default.

HostName: Either the URL name or the IP address of the server to which you want to connect.

PortNumber: The port number of the server listener. The default is 10000.

MaxStringSize: (This option is required only for Microsoft Access and Tableau users.) The maximum size of columns of the String data type that the driver describes through result set descriptions and catalog functions. Specify the following value that corresponds to your environment:

        For Microsoft Access users, specify a value of 255.

        For Tableau users, specify a value from 255 to 4000 that suits your environment.

Testing the Connection

The Progress DataDirect for ODBC for Apache Spark SQL Wire Protocol driver installation includes a program named example that can be used to connect to a data source and execute SQL. The application is located in the ODBCHOME/example directory. To run the program, type example and follow the prompts to enter your data source name, user name, and password. If successful, a SQL> prompt appears and you can type in SQL statements such as

SELECT * FROM accounts

If example is unable to connect, the appropriate error message appears.

Tuning the Driver for Optimal Performance

The drivers have connection options that directly affect performance. To tune these drivers for performance, set the following connection options:

If you know the typical fetch size for your application

Controlled by ArraySize. This option specifies the number of cells the driver retrieves. By increasing the value of Array Size, you increase the number of rows the driver will retrieve from the server for a fetch. Smaller values can improve the initial response time of the query. Larger values improve overall fetch times at the cost of additional memory and slower response time.

Note: If the fetch size exceeds the available buffer memory of the server, an out of memory is returned when attempting to execute the fetch. If you receive this error, decrease the value specified until fetches are successfully executed.

If you know the mazimum size of data when fetching images, pictures, long text, or binary data

Controlled by DefaultLongDataBuffLen. This option specifies the maximum length of data (in KB) the driver can fetch from long columns in a single round trip and the maximum length of data that the driver can send using the SQL_DATA_AT_EXEC parameter. The buffer size should only be large enough to accommodate the maximum amount of data retrieved; otherwise, performance is reduced by transferring large amounts of data into an oversized buffer.

If your application requires encryption of data

Controlled by EncryptionMethod. Data encryption may adversely affect performance because of the additional overhead (mainly CPU usage) required to encrypt and decrypt data.

If your application needs to access a database objects owned only by the current user

Set UseCurrentSchema=1 to improve performance. When this option is enabled, the driver returns only database objects owned by the current user when executing catalog functions. Calls to catalog functions are optimized by grouping queries.

© 2016. Progress Software Corporation. All rights reserved.

2/16, 8.0.2