skip to main content
Using DataDirect Hybrid Data Pipeline : Creating a Data Source definition : Apache Hadoop Hive On-Premise Parameters
  

Try Now

Apache Hadoop Hive On-Premise Parameters

ServerThe following tables describe parameters available on the tabs of an Apache Hadoop Hive On-Premise Data Source dialog:
*General tab
*Security tab
*OData tab
*Advanced tab

General tab

Click the thumbnail to view the screen. Required fields are marked with an asterisk.
General tab of the Apache Hadoop Hive data source setup dialog (on-premise)General tab of the Apache Hadoop Hive data source setup dialog (on-premise)
Table 14. General tab connection parameters for Apache Hadoop Hive
Field
Description
Connector ID
The unique identifier of the Hybrid Data Pipeline On-Premise Connector that is to be used to access the on-premise data source. Click the arrow Down arrow for a field) and select the Connector that you want to use. The identifier can be a descriptive name, the name of the machine where the Connector is installed, or the Connector ID for the Connector.
If you have not installed an On-Premise Connector, and no Connectors have been shared with you, this field and drop-down list are empty.
If you own multiple Connectors that have the same name, for example, Production, an identifier is appended to each Connector, for example, Production_dup0 and Production_dup1. If the Connectors in the drop-down list were shared with you, the owner's name is appended, for example, Production(owner1) and Production(owner2).
Data Source Name
A unique name for this Data Source definition.
Note: Names can contain only alphanumeric characters and underscores.
Database
The name of the database that is running on the database server.
Description
A description of this set of connection parameters.
Encryption Method
Determines whether data is encrypted and decrypted when transmitted over the network between the Hybrid Data Pipeline connectivity service and the on-premise database server.
Valid Values:
noEncryption | SSL
If set to noEncryption, data is not encrypted or decrypted.
If set to SSL, data is encrypted using SSL. If the database server does not support SSL, the connection fails and the Hybrid Data Pipeline connectivity service throws an exception.
*Connection hangs can occur when the Hybrid Data Pipeline connectivity service is configured for SSL and the database server does not support SSL. You might want to set a login timeout using the Login Timeout property to avoid problems when connecting to a server that does not support SSL.
The default value is noEncryption.
Port Number
The port number of the Apache Hive server to connect to.
Server Name
Specifies either the server name(if your network supports named servers) or the IP address of the primary Apache Hive server machine, for example, MyHiveServer or 122.23.15.12.
Password
A password for the Apache Hive account that is used to establish the connection to your Apache Hive server.
Note: By default, the password is encrypted.
By default, the characters in the Password field you type are not shown. If you want the password to be displayed in clear text, click the eye Watchful eye password button button. Click the button again to conceal the password.
User Id
The User Id for the Apache Hive account used to establish the connection to the Apache Hive server.

Security tab

Click the thumbnail to view the screen.
Security tab of the Apache Hadoop Hive data source setup dialogSecurity tab of the Apache Hadoop Hive data source setup dialog
Table 15. Security tab connection parameters for Apache Hadoop Hive
Field
Description
Encryption Method
Determines whether data is encrypted and decrypted when transmitted over the network between the Hybrid Data Pipeline connectivity service and the database server.
Valid Values:
noEncryption | SSL
If set to noEncryption, data is not encrypted or decrypted.
If set to SSL, data is encrypted using SSL. If the database server does not support SSL, the connection fails and the Hybrid Data Pipeline connectivity service throws an exception.
*Connection hangs can occur when the Hybrid Data Pipeline connectivity service is configured for SSL and the database server does not support SSL. You may want to set a login timeout using the Login Timeout parameter to avoid problems when connecting to a server that does not support SSL.
*When SSL is enabled, the following properties also apply:
HostNameInCertificate
ValidateServerCertificate
Crypto Protocol Version
Default: noEncryption
Crypto Protocol Version
Specifies a comma-separated list of the protocol versions that can be used in creating an SSL connection to the Data Source. If the specified protocol is not supported by the database server, the connection fails and the connectivity service returns an error.
Valid Values:
cryptographic_protocol [[, cryptographic_protocol ]...]
where:
cryptographic_protocol
is one of the following cryptographic protocols:
TLSv1 | TLSv1.1 | TLSv1.2
The client must send the highest version that it supports in the client hello.
Note: Good security practices recommend using TLSv1.2 if your data source supports that protocol version, due to known vulnerabilities in the earlier protocols.
Example
Your security environment specifies that you can use TLSv1.1 and TLSv1.2. When you enter the following values, the connectivity service sends TLSv1.2 to the server first:
TLSv1.1,TLSv1.2
Default: TLSv1, TLSv1.1, TLSv1.2
Host Name In Certificate
Specifies a host name for certificate validation when SSL encryption is enabled (Encryption Method=SSL) and validation is enabled (Validate Server Certificate=true). This optional parameter provides additional security against man-in-the-middle (MITM) attacks by ensuring that the server that the Hybrid Data Pipeline connectivity service is connecting to is the server that was requested.
Valid Values:
host_name | #SERVERNAME#
where host_name is a valid host name.
If host_name is specified, the Hybrid Data Pipeline connectivity service compares the specified host name to the DNSName value of the SubjectAlternativeName in the certificate. If a DNSName value does not exist in the SubjectAlternativeName or if the certificate does not have a SubjectAlternativeName, the connectivity service compares the host name with the Common Name (CN) part of the certificate’s Subject name. If the values do not match, the connection fails and the connectivity service throws an exception.
If #SERVERNAME# is specified, the Hybrid Data Pipeline connectivity service compares the server name that is specified in the connection URL or data source of the connection to the DNSName value of the SubjectAlternativeName in the certificate. If a DNSName value does not exist in the SubjectAlternativeName or if the certificate does not have a SubjectAlternativeName, the connectivity service compares the host name to the CN part of the certificate’s Subject name. If the values do not match, the connection fails and the connectivity service throws an exception. If multiple CN parts are present, the connectivity service validates the host name against each CN part. If any one validation succeeds, a connection is established.
Default: Empty string
Validate Server Certificate
Determines whether the Hybrid Data Pipeline connectivity service validates the certificate that is sent by the database server when SSL encryption is enabled (Encryption Method=SSL). When using SSL server authentication, any certificate that is sent by the server must be issued by a trusted Certificate Authority (CA). Allowing the connectivity service to trust any certificate that is returned from the server even if the issuer is not a trusted CA is useful in test environments because it eliminates the need to specify truststore information on each client in the test environment.
Valid Values:
true | false
If the check box is selected (true), the Hybrid Data Pipeline connectivity service validates the certificate that is sent by the database server. Any certificate from the server must be issued by a trusted CA in the truststore file.
If the check box is not selected (false), the Hybrid Data Pipeline connectivity service does not validate the certificate that is sent by the database server. The connectivity service ignores any truststore information that is specified by the Java system properties.
Default: true

OData tab

The following table describes the controls on the OData tab. For information on using the Configure Schema editor, see Enabling OData and working with Data Source groups. For information on formulating OData requests, see Formulating queries with OData version 2.
Click the thumbnail to view the screen. Required fields are marked with an asterisk.
OData tabOData tab
Table 16. OData tab connection parameters for Apache Hadoop Hive
Field
Description
OData Version
Enables you to choose from the supported OData versions. OData configuration made with one OData version will not work if you switch to a different OData version. If you want to maintain the data source with different OData versions, you must create different data sources for each of them.
OData Access URI
Specifies the base URI for the OData feed to access your data source, for example, https://hybridpipe.operations.com/api/odata/<DataSourceName>. You can copy the URI and paste it into your application's OData configuration.
The URI contains the case-insensitive name of the data source to connect to, and the query that you want to execute. This URI is the OData Service Root URI for the OData feed. The Service Document for the data source is returned by issuing a GET request to the data source's service root.
The OData Service Document returns the names of the entities exposed by the Data Source OData service. To get details such as the properties of the entities exposed, the data types for those properties and the relationships between entities, the Service Metadata Document can be fetched by adding /$metadata to the service root URI.
Schema Map
Enables OData support. If a schema map is not defined, the OData API cannot be used to access the data store using this Data Source definition. Use the Configure Schema editor to select the tables to expose through OData.
See Using the Configure Schema editor for more information.
Data Source Caching
Specifies whether the connection to the backend data source is cached in a session associated with the data source. Caching the back end connection improves performance when multiple OData queries are submitted to the same data source because the connection does not need to be created on every query.
Caching of the back end connection can get in the way when trying to configure a data source for OData. If a change is made to any of the Hybrid Data Pipeline data source connection parameters, those changes will not be seen because the connection was established using the old data source definition, and was cached. The session that caches the backend connection is discarded if there is no activity to the data source for approximately 5 minutes.
When you configure a data source for OData, it is recommended that the OData session caching be disabled. Once you are satisfied with the OData configuration for the data source, enable the parameter to get the performance improvement provided by caching the connection to the backend data source.
Valid Values:
When set to 1, session caching is enabled. This provides better performance for production.
When set to 0, session caching is disabled. Use this value when you are configuring the data source.
Default: 1
Page Size
Determines the number of entities returned on each page for paging controlled on the server side. On the client side, requests can use the $top and $skip parameters to control paging. In most cases, server side paging works well for large data sets. Client side pagination works best with a smaller data sets where it is not as expensive to fetch subsequent pages.
Valid Values: 0 | n
where n is an integer from 1 to 10000.
When set to 0, the server default of 2000 is used.
Default: 0
Refresh Result
Controls what happens when you fetch the first page of a cached result when using Client Side Paging. Skip must be omitted or set to 0. You can use the cached copy of that first page, or you can re-execute the query to get a new result, discarding the previously cached result. Re-executing the query is useful when the data being fetched may change between two requests for the first page. Using the cached result is useful if you are paging back and forth through results that are not expected to change.
Valid Values:
When set to 0, the OData service caches the first page of results.
When set to 1, the OData service re-executes the query.
Default: 1
Inline Count Mode
Specifies how the connectivity service satisfies requests that include the $count parameter when it is set to true (for OData version 4) or the $inlinecount parameter when it is set to allpages (for OData version 2). These requests require the connectivity service to include the total number of entities that are defined by the OData query request. The count must be included in the first page in server-driven paging and must be included in every page when using client-driven paging.
The optimal setting depends on the data store and the size of results. The OData service can run a separate query using the count(*) aggregate to get the count, before running the query used to generate the entities. In very large results, this approach can often lead to the first page being returned faster. Alternatively, the OData service can fetch the entire result before returning the first page. This approach works well for small results and for data stores that cannot optimize the count(*) aggregate; however, it may have a longer initial response time for the first page if the result is large.
Valid Values:
When set to 1, the connectivity service runs a separate count(*) aggregate query to get the count of entities before executing the query to return results. In very large results, this approach can often lead to the first page being returned faster.
When set to 2, the connectivity service fetches all entities before returning the first page. For small results, this approach is always faster. However, the initial response time for the first page may be longer if the result is large.
Default: 1
Top Mode
Indicates how requests typically use $top and $skip for client side pagination, allowing the service to better anticipate how to process queries.
Valid Values:
Set to 0 when the application generally uses $top to limit the size of the result and rarely attempts to get additional entities by combining $top and $skip.
Set to 1 when the application uses $top as part of client-driven paging and generally combines $top and $skip to page through the result.
Default: 0
OData Read Only
Controls whether write operations can be performed on the OData service. Write operations generate a 405 Method Not Allowed response if this option is enabled.
Existing OData-enabled data sources are read only (write operations are disabled). To enable write operations for an existing OData enabled data source, clear the OData Read Only option on the OData tab. Then, on the Data Sources tab, regenerate the OData model for the data source by clicking on the OData model icon Synch completed successfully.
Valid Values:
true | false
When the check box is selected (set to true), OData access is restricted to read-only mode.
When the check box is not selected (set to false), write operations can be performed on the OData service.
Default: false

Advanced tab

Click the thumbnail to view the screen.
Advanced tab of the Apache Hadoop Hive data source setup dialogAdvanced tab of the Apache Hadoop Hive data source setup dialog
Table 17. Advanced tab connection parameters for Apache Hadoop Hive
Field
Description
Extended Options
Specifies a semi-colon separated list of connection options and their values. Use this configuration option to set the value of undocumented connection options that are provided by Progress DataDirect technical support. You can include any valid connection option in the Extended Options string, for example:
Database=Server1;UndocumentedOption1=value[;UndocumentedOption2=value;]
If the Extended Options string contains option values that are also set in the setup dialog, the values of the options specified in the Extended Options string take precedence.
Valid Values:
string
Default: none
Initialization String
A semicolon delimited set of commands to be executed on the data store after Hybrid Data Pipeline has established and performed all initialization for the connection. If the execution of a SQL command fails, the connection attempt also fails and Hybrid Data Pipeline returns an error indicating which SQL commands failed.
Syntax:
command[[; command]...]
Where:
command
is a SQL command. Multiple commands must be separated by semicolons. In addition, if this property is specified in a connection URL, the entire value must be enclosed in parentheses when multiple commands are specified. For example, assuming a schema name of SFORCE:
InitializationString=(REFRESH SCHEMA SFORCE)
The default is an empty string.
Login Timeout
The amount of time, in seconds, that the Hybrid Data Pipeline connectivity service waits for a connection to be established before timing out the connection request.
Valid Values:
0 | x
where x is a positive integer that represents a number of seconds.
If set to 0, the connectivity service does not time out a connection request.
If set to x, the connectivity service waits for the specified number of seconds before returning control to the application and throwing a timeout exception.
Default: 30
Max Pooled Statements
The maximum number of prepared statements to cache for this connection. If the value of this property is set to 20, the connectivity service caches the last 20 prepared statements that are created by the application.
Query Timeout
The number of seconds for the default query timeout for all statements that are created by a connection.
Valid Values:
-1 | 0 | x
If set to -1, the query timeout functionality is disabled. The Hybrid Data Pipeline connectivity service silently ignores calls to the Statement.setQueryTimeout() method.
If set to 0, the default query timeout is infinite (the query does not time out).
If set to x, the Hybrid Data Pipeline connectivity service uses the value as the default timeout for any statement that is created by the connection. To override the default timeout value set by this connection option, call the Statement.setQueryTimeout() method to set a timeout value for a particular statement.
*If x is specified, this property is ignored for HiveServer1 connections.
Default: 0
See the steps for:
Creating a Data Source definition