skip to main content
Quick Start: Progress DataDirect for JDBC for Apache Hive Driver : Tuning for Performance
  

Try DataDirect Drivers Now

Tuning for Performance

The connection properties described in this section directly affect the performance of your driver. To tune for performance, configure your driver according to the recommended settings and your environment.

ArrayFetchSize

Purpose:
Determines the number of fields the driver retrieves from a server for a fetch. When executing a fetch, the driver divides the value specified by the number columns in a particular table to determine the number of rows to retrieve.
Performance Impact:
To improve throughput, increase the value of ArrayFetchSize. By increasing the value specified, you increase the number of rows the driver will retrieve from the server for a fetch. In turn, increasing the number of rows that the driver can retrieve reduces the number, and expense, of network round trips. Note that improved throughput does come at the expense of increased demands on memory and slower response time. Furthermore, if the fetch size exceeds the available buffer memory of the server, an out of memory error is returned when attempting to execute a fetch. If you receive this error, decrease the value specified until fetches are successfully executed.
Recommended Settings:
Tune this setting to reflect the typical fetch size of your application. Smaller fetch sizes can improve the initial response time of the query. Larger fetch sizes improve overall fetch times at the cost of additional memory.

BatchMechanism

Purpose:
Determines the mechanism that is used to execute batch operations.
Performance Impact:
Unlike the native batch mechanism, the multi-row insert mechanism only returns the total number of update counts for batch inserts. Therefore, setting BatchMechanism to MultiRowInsert offers substantial performance gains when performing batch inserts.
Recommended Settings:
If your application does not require individual update counts for each statement or parameter set in the batch, set to multiRowInsert for improved performance when executing batch inserts.

BinaryDescribeType

Purpose:
Specifies how columns of the Binary type are described.
Performance Impact:
When BinaryDescribeType is set to longvarbinary, the driver not only maps Binary to Longvarbinary, but also allocates more space to cache the long data. Because more space is allocated for the long data, your application will incur a performance penalty.
Recommended Settings:
If your application does not use the getClob() method, set to varbinary for improved performance.

CatalogMode

Purpose:
Determines whether the driver uses native catalog functions to retrieve information returned by DatabaseMetaData functions.
Performance Impact:
Apache Hive’s native catalog functions return incorrect information in certain scenarios. To address this issue, by default, the driver uses a combination of driver-discovered information and native functions to retrieve more accurate catalog information than native functions alone. While using driver-discovered information improves accuracy, it does so at an expense to performance. If accurate catalog information is not required, you can improve performance by setting CatalogMode connection property to native.
Recommended Settings:
If accurate catalog information is not required, set to native for the best performance.
If your application requires accurate catalog information, set to mixed for the optimal balance of performance and accuracy.

EnableCookieAuthentication

Purpose:
Determines whether the driver attempts to use cookie based authentication for requests to an HTTP endpoint after the initial authentication to the server.
Performance Impact:
Cookie based authentication improves response time by eliminating the need to re-authenticate with the server for each request.
Recommended Settings:
If your environment is configured to use cookies, set to true.

EncryptionMethod

Purpose:
Determines the method the driver uses to encrypt data sent between the driver and the database server.
Performance Impact:
Data encryption may adversely affect performance because of the additional overhead (mainly CPU usage) that is required to encrypt and decrypt data.
Recommended Settings:
If data encryption is not required, set to noEncryption for improved performance.

InsensitiveResultSetBufferSize

Purpose:
Determines the amount of memory that is used by the driver to cache insensitive result set data.
Performance Impact:
To improve performance when using scroll-insensitive result sets, the driver can cache the result set data in memory instead of writing it to disk. By default, the driver caches 2 MB of insensitive result set data in memory and writes any remaining result set data to disk. Performance can be improved by increasing the amount of memory used by the driver before writing data to disk or by forcing the driver to never write insensitive result set data to disk. The maximum cache size setting is 2 GB.
Recommended Settings:
Specify a value in KB that is a power of 2 for improved performance. This value should not exceed the amount available. The maximum value for this property is 2 GB.

MaxPooledStatements

Purpose:
Specifies the maximum number of prepared statements to be pooled for each connection and enables the driver’s internal prepared statement pooling when set to an integer greater than zero (0).
Performance Impact:
The driver’s internal prepared statement pooling provides performance benefits when the driver is not running from within an application server or another application that provides its own statement pooling.
Recommended Settings:
For better performance, specify a value that is greater than the number of prepared statements used by the application. Note that this performance benefit comes at the expense of greater memory consumption.

StringDescribeType

Purpose:
Specifies whether String columns are described as VARCHAR columns. This property affects ResultSetMetaData calls; it does not affect getTypeInfo() calls.
Performance Impact:
To obtain data from String columns with the getClob() method, the StringDescribeType connection property must be set to longvarchar. (Otherwise, calling getClob() results in an "unsupported data conversion" exception.) When StringDescribeType is set to longvarchar, the driver not only maps String to Longvarchar but also allocates more space to cache the long data. Because more space is allocated for the long data, your application will incur a performance penalty.
Recommended Settings:
If your application does not use the getClob() method, set to varchar for improved performance.

UseCurrentSchema

Purpose:
Specifies whether results are restricted to the tables and views in the current schema if a call is made without specifying a schema or if the schema is specified as the wildcard character %.
Performance Impact:
Restricting results to the tables and views in the current schema improves performance of calls that do not specify a schema.
Recommended Settings:
If your application needs to access tables and views owned only by the current user, performance of your application can be improved by setting this property to true.