
Creating an input REST file

The input REST file is a JSON file that specifies one or more REST endpoints in a single JSON object. The file can list endpoints only, or it can also include parameters that define the REST data. When initially connecting to a REST endpoint, Hybrid Data Pipeline uses the input REST file to build a relational model of the REST data. You can create an input REST file with any text editor, then upload it through the Web UI or with the Drive Files API.

In its basic format, the input REST file is a JSON object whose comma-separated members each map a table name to an endpoint. The following example shows how endpoints are mapped to tables to support a relational schema.

{
"<table_name1>":"<endpoint1>",
"<table_name2>":"<endpoint2>",
"<table_name3>":"<endpoint3>"
}
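Because the file is ordinary JSON, it can also be generated instead of typed by hand. The following Python sketch serializes a table-to-endpoint mapping; the table names, endpoints, and the helper names `build_rest_file` and `save_rest_file` are illustrative assumptions, not part of Hybrid Data Pipeline.

```python
import json
from pathlib import Path

def build_rest_file(entries: dict[str, str]) -> str:
    """Return input REST file text for a {table_name: endpoint} mapping."""
    # json.dumps quotes every key and value and puts commas only between
    # members, so the output is always valid JSON.
    return json.dumps(entries, indent=2)

def save_rest_file(entries: dict[str, str], path: Path) -> None:
    """Write the generated file so it can be uploaded afterwards."""
    path.write_text(build_rest_file(entries), encoding="utf-8")

# Hypothetical tables: each key becomes the name of a relational table.
endpoints = {
    "countries": "http://example.com/countries/",
    "people": "http://example.com/people",
}

if __name__ == "__main__":
    print(build_rest_file(endpoints))
```

Calling `save_rest_file(endpoints, Path("input_rest_file.json"))` (a hypothetical file name) writes the same text to disk for upload.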

Note: The syntax requirements described here also apply when you edit the relational model of your REST data through the Web UI. In the Web UI, the Entity Name field specifies the name of the relational table.

Valid formats for the input REST file are described in detail in the following sections.

Specifying Endpoints for GET Request with Unparameterized Paths

Specifying Endpoints for GET Request with Parameterized Paths

Specifying Endpoints for GET Requests with Query Parameters

Specifying Endpoints for Requests with Custom HTTP Headers

Defining a POST Request

Configuring Paging

Specifying Endpoints for GET Request with Unparameterized Paths

To specify endpoints for unparameterized GET requests, use the following format:

"<table_name>":"<host_name>/<endpoint_path>"

table_name

is the name of the relational table to which the driver maps the endpoint. For example, countries.

host_name

(optional) specifies the protocol and host name components of the URL endpoint. For example, http://example.com. You can omit this value if you specify the host name with the ServerName property.

endpoint_path

is the path component of the URL endpoint. For example, countries.

For example, the following demonstrates a GET request that will map to the countries table.

"countries":"http://example.com/countries/"
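The split between host_name and endpoint_path follows the ordinary structure of a URL. The sketch below uses Python's `urllib.parse.urlsplit` to separate the two parts, which shows what remains of an entry when the host is supplied through the ServerName property instead. The helper name `split_endpoint` is hypothetical, and this section does not specify whether a host-less entry should keep a leading slash.

```python
from urllib.parse import urlsplit

def split_endpoint(url: str) -> tuple[str, str]:
    """Split a full endpoint URL into (host_name, endpoint_path).

    host_name is the scheme plus host (and port, if any); endpoint_path is
    the rest of the URL without its leading slash, query string included.
    """
    parts = urlsplit(url)
    host_name = f"{parts.scheme}://{parts.netloc}"
    endpoint_path = parts.path.lstrip("/")
    if parts.query:
        endpoint_path += "?" + parts.query
    return host_name, endpoint_path

host, path = split_endpoint("http://example.com/countries/")
print(host)  # http://example.com
print(path)  # countries/
```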

Specifying Endpoints for GET Request with Parameterized Paths

To specify parameterized GET requests, use the following format:

"<table_name>":"<host_name>/<endpoint_path1>/{<param_name>:<param_value>}[/<endpoint_path2>]"

table_name

is the name of the relational table to which the driver maps the endpoint. For example, states.

host_name

(optional) specifies the protocol and host name components of the URL endpoint. For example, http://example.com. You can omit this value if you specify the host name with the ServerName property.

endpoint_path1, endpoint_path2

are the path components of the URL endpoint that precede and follow the parameter. endpoint_path2 is optional. For example, states/get and all.

param_name

is the parameter identifier used for filtering the request. For example, countryCode.

param_value

is the parameter value used for filtering the request during sampling. For example, USA.

For example, the following demonstrates a GET request that will map to the states table.

"states":"http://example.com/states/get/{countryCode:USA}/all"
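Each `{param_name:param_value}` segment pairs a parameter name with the sample value used during sampling. The following Python sketch extracts those pairs with a regular expression and substitutes the sample values into the path. The helper names are hypothetical, and the substituted URL is only an assumption about what a sampling request looks like; the driver's actual substitution logic is not described here.

```python
import re

# Matches one {param_name:param_value} segment, e.g. {countryCode:USA}.
PARAM_PATTERN = re.compile(r"\{([^{}:]+):([^{}]*)\}")

def path_parameters(endpoint: str) -> dict[str, str]:
    """Return {param_name: sample param_value} for every placeholder."""
    return dict(PARAM_PATTERN.findall(endpoint))

def sample_url(endpoint: str) -> str:
    """Replace each placeholder with its sample value (illustrative only)."""
    return PARAM_PATTERN.sub(lambda m: m.group(2), endpoint)

endpoint = "http://example.com/states/get/{countryCode:USA}/all"
print(path_parameters(endpoint))  # {'countryCode': 'USA'}
print(sample_url(endpoint))       # http://example.com/states/get/USA/all
```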

Specifying Endpoints for GET Requests with Query Parameters

Use the following format to specify endpoints for GET requests with query parameters. Separate multiple query parameters within the same endpoint with an ampersand (&).

"<table_name>":"<host_name>/<endpoint_path>?<parameter>=<value>[&...]"

table_name

is the name of the relational table to which the driver maps the endpoint. For example, timeseries.

host_name

(optional) specifies the protocol and host name components of the URL endpoint. For example, http://example.com. You can omit this value if you specify the host name with the ServerName property.

endpoint_path

is the path component of the URL endpoint. For example, times/query.

parameter

is the parameter name in the parameter=value pair used for filtering the request. For example, interval.

value

is the value in the parameter=value pair used for filtering the request. For example, 5min.

For example, the following demonstrates a GET request that will map to the timeseries table.

"timeseries":"https://www.example.com/times/query?interval=5min&symbol=USA&function=TIME_SERIES_WEEKLY"
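A query string like the one above can be assembled from a dictionary with `urllib.parse.urlencode`, which joins the pairs with ampersands and percent-encodes reserved characters. This is a sketch: `query_endpoint` is a hypothetical helper, and this section does not say whether values in the input REST file must be percent-encoded in advance.

```python
from urllib.parse import urlencode

def query_endpoint(base: str, params: dict[str, str]) -> str:
    """Append parameter=value pairs, joined by '&', to an endpoint URL."""
    # urlencode escapes characters such as '&' or spaces inside a value,
    # so a value cannot accidentally split into two parameters.
    return f"{base}?{urlencode(params)}"

url = query_endpoint(
    "https://www.example.com/times/query",
    {"interval": "5min", "symbol": "USA", "function": "TIME_SERIES_WEEKLY"},
)
print(url)
```

For these values the result matches the timeseries endpoint shown above character for character.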

Specifying Endpoints for Requests with Custom HTTP Headers

Some endpoints use custom HTTP headers to filter the data returned by a GET request. This type of filtering is typically used to create multiple unique reports or tables from the same endpoint. To use custom headers, you must define the request in the input REST file. The entry consists of a #path property and a #headers object. The #path property contains the URL endpoint used in requests, while the #headers object defines the headers and provides the values used to filter the request.

In addition to filtering requests, the header object can be used to specify a value for the Accept header if the default, application/json, is not accepted by the endpoint. This scenario typically occurs when accessing a vendor endpoint that uses a proprietary Accept header.

An entry for a GET request using custom HTTP headers takes the following form:

"<table_name>":{
"#path": "<host_name>/<endpoint_path>",
"#headers":{
"<header1>":"<value1>",
"<header2>":"<value2>",
"<header3>":"<value3>"
}
}

table_name

is the name of the relational table to which the driver maps the endpoint. For example, people.

host_name

(optional) specifies the protocol and host name components of the URL endpoint. For example, http://example.com. You can omit this value if you specify the host name with the ServerName property.

endpoint_path

is the path component of the URL endpoint. For example, people.

header

is the HTTP header component of the header=value pair used for filtering the request. For example, X-Subway-Payment.

When overriding the Accept header, this value is Accept.

value

is the value argument for the HTTP header used for filtering the request or, if overriding the default Accept header, the value of the Accept header for the endpoint. For example, token.

For example, the following demonstrates an entry for a GET request that defines custom HTTP headers.

"people":{
"#path": "http://example.com/people",
"#headers":{
"Accept":"application/calendar+json",
"X-Subway-Payment":"token",
"X-Laundry-Service":"dryclean",
"X-Favorite-Food":"pizza"
}
}
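The #headers object is a plain name/value map, so each member corresponds to one HTTP header on the request. The following Python sketch builds such an entry and, for comparison only, constructs (but never sends) a similar GET request with `urllib.request.Request`. The helper name `header_entry` is hypothetical, and the comparison request is not a description of how the driver issues its requests.

```python
import json
import urllib.request

def header_entry(path: str, headers: dict[str, str]) -> dict:
    """Build an input REST file entry whose GET request carries custom headers."""
    return {"#path": path, "#headers": dict(headers)}

entry = header_entry(
    "http://example.com/people",
    {"Accept": "application/calendar+json", "X-Subway-Payment": "token"},
)
print(json.dumps({"people": entry}, indent=2))

# For comparison only: a similar GET request, built but never sent.
# urllib normalizes names with str.capitalize() (e.g. "X-subway-payment");
# HTTP header names are case-insensitive, so the meaning is unchanged.
req = urllib.request.Request(entry["#path"], headers=entry["#headers"])
print(req.get_method())  # GET
print(req.header_items())
```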

Defining a POST Request

To use POST requests, you must define the request in the input REST file in JSON format. The entry consists of a #path property and a #post object. The #path property contains the URL endpoint used in requests, while the #post object defines the fields of the request body and provides a sample value for each. The driver uses these sample values to determine the data type of each field when executing a POST request. An entry for a POST request takes the following form:

"<table_name>": {
"#path": "<host_name>/<endpoint_path>",
"#post": {
"<field1>":"<value1>",
"<field2>":"<value2>"
}
}
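JSON does not allow a comma after the last member of an object, and a stray comma like that makes the whole file unreadable. The following Python sketch checks input REST file text before upload. It is a hypothetical helper that tests only what this section describes: valid JSON, a top-level object, and entries that are either endpoint strings or objects with a #path property.

```python
import json

def check_rest_file(text: str) -> list[str]:
    """Return the problems found in input REST file text (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = []
    for table, entry in data.items():
        if isinstance(entry, dict):
            # Header, POST, and paging entries all need a #path.
            if "#path" not in entry:
                problems.append(f"{table}: object entry has no #path")
        elif not isinstance(entry, str):
            problems.append(f"{table}: entry must be a string or an object")
    return problems

# A trailing comma after the last #post field is rejected.
bad = '{"countries2": {"#path": "http://example.com/country/", "#post": {"start_date": "2018-08-31",}}}'
print(check_rest_file(bad))
```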

table_name

is the name of the relational table to which the driver maps the endpoint. For example, countries2.

host_name

(optional) specifies the protocol and host name components of the URL endpoint. For example, http://example.com. You can omit this value if you specify the host name with the ServerName property.

endpoint_path

is the path component of the URL endpoint. For example, country.

field

is the field name in the field=value pair. For example, start_date.

value

is the sample value the driver uses to determine the data type of that field when executing a POST request. For example, 2018-08-31.

For example, the following demonstrates an entry for a POST request that will map to the countries2 table.

"countries2": {
"#path": "http://example.com/country/",
"#post": {
"start_date":"2018-08-31",
"end_date":"2018-09-01",
"departments":"[engineering,marketing,sales]",
"tags":"[blue,green,red]"
}
}
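The sample values matter because they drive type detection. To illustrate the idea only, the following Python sketch maps each #post sample value to a column type with a few simple rules. The rules and the type labels (ARRAY, INTEGER, DECIMAL, DATE, VARCHAR) are hypothetical; the driver's real inference rules are not documented in this section.

```python
import re
from datetime import date

def guess_type(sample: str) -> str:
    """Hypothetical mapping from a #post sample value to a column type."""
    if sample.startswith("[") and sample.endswith("]"):
        return "ARRAY"
    if re.fullmatch(r"[+-]?\d+", sample):
        return "INTEGER"
    if re.fullmatch(r"[+-]?\d*\.\d+", sample):
        return "DECIMAL"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", sample):
        try:
            date.fromisoformat(sample)
            return "DATE"
        except ValueError:
            pass  # shaped like a date but not a real calendar date
    return "VARCHAR"

post_body = {
    "start_date": "2018-08-31",
    "end_date": "2018-09-01",
    "departments": "[engineering,marketing,sales]",
    "tags": "[blue,green,red]",
}
for field, sample in post_body.items():
    print(field, guess_type(sample))
```

Under these assumed rules, a sample such as 2018-13-01 would fall back to a string type, which is why realistic sample values are worth choosing.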

Configuring Paging

The driver supports two types of paging: row offset paging and page number paging. To configure paging, specify values for the properties in the following tables that correspond to the type of paging you want to use. Paging properties are set for individual GET or POST requests by adding them to the table's entry object, alongside #path. If paging properties are not specified, the driver attempts to retrieve the first page from data sources that require paging.

The following example configures row offset paging for an unparameterized GET request:

"<table_name>": {
"#path": "<host_name>/<endpoint_path>",
"#maximumPageSize":1000,
"#firstRowNumber":1,
"#pageSizeParameter":"maxResults",
"#rowOffsetParameter":"startAt"
}

table_name

is the name of the relational table to which the driver maps the endpoint. For example, countries2.

host_name

(optional) specifies the protocol and host name components of the URL endpoint. For example, http://example.com. You can omit this value if you specify the host name with the ServerName property.

endpoint_path

is the path component of the URL endpoint. For example, country.

Table 126. Row Offset Paging Properties
Property	Description
#maximumPageSize	The maximum page size in rows.
#firstRowNumber	The number of the first row. The default is 0; however, some systems begin numbering rows at 1.
#pageSizeParameter	The name of the URI parameter that contains the page size.
#rowOffsetParameter	The name of the URI parameter that contains the starting row number for this set of rows.
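To make the arithmetic concrete, the following Python sketch walks a paged source the way the row offset properties suggest: each request sends the page size in the #pageSizeParameter parameter and the starting row in the #rowOffsetParameter parameter, and the offset advances by #maximumPageSize. It reuses the values from the example above (1000, 1, maxResults, startAt). The stopping rule (a short page ends the scan), the helper names, and the in-memory data source are assumptions for illustration; the driver's own logic is not described here.

```python
from typing import Callable, Iterator

def paginate_by_offset(
    fetch: Callable[[dict], list],
    maximum_page_size: int,
    first_row_number: int,
    page_size_parameter: str,
    row_offset_parameter: str,
) -> Iterator:
    """Yield every row from a source that uses row offset paging.

    Assumption: a page with fewer than maximum_page_size rows is the last one.
    """
    offset = first_row_number
    while True:
        page = fetch({page_size_parameter: maximum_page_size,
                      row_offset_parameter: offset})
        yield from page
        if len(page) < maximum_page_size:
            break
        offset += maximum_page_size  # the next page starts one page size later

# Hypothetical in-memory source: 2,500 rows numbered from 1 (#firstRowNumber 1).
ROWS = list(range(1, 2501))
requested_offsets = []

def fake_fetch(params: dict) -> list:
    """Stand-in for a GET such as <endpoint>?maxResults=1000&startAt=1001."""
    requested_offsets.append(params["startAt"])
    start = params["startAt"] - 1  # convert a 1-based row number to a list index
    return ROWS[start:start + params["maxResults"]]

rows = list(paginate_by_offset(fake_fetch, 1000, 1, "maxResults", "startAt"))
print(len(rows), requested_offsets)  # 2500 [1, 1001, 2001]
```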

Table 127. Page Number Paging Properties
Property	Description
#maximumPageSize	The maximum page size in rows.
#firstPageNumber	The number of the first page. The default is 0; however, some systems begin numbering pages at 1.
#pageSizeParameter	The name of the URI parameter that contains the page size.
#pageNumberParameter	The name of the URI parameter that contains the page number when requesting a page of rows.
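Page number paging differs from row offset paging in that the requested value grows by 1 per page instead of by the page size. The following Python sketch shows a hypothetical entry that uses the properties from Table 127 and a loop that follows them. The table name orders, the endpoint, the URI parameter names pageSize and page, the short-page stopping rule, and the in-memory source are all assumptions for illustration.

```python
import json
from typing import Callable, Iterator

# Hypothetical entry; "pageSize" and "page" are placeholder parameter names.
ENTRY = {
    "#path": "http://example.com/orders",
    "#maximumPageSize": 500,
    "#firstPageNumber": 1,
    "#pageSizeParameter": "pageSize",
    "#pageNumberParameter": "page",
}

def paginate_by_page(fetch: Callable[[dict], list], cfg: dict) -> Iterator:
    """Yield rows using page number paging.

    Assumption: a page shorter than #maximumPageSize is the last page.
    """
    size = cfg["#maximumPageSize"]
    page_number = cfg.get("#firstPageNumber", 0)  # documented default is 0
    while True:
        page = fetch({cfg["#pageSizeParameter"]: size,
                      cfg["#pageNumberParameter"]: page_number})
        yield from page
        if len(page) < size:
            break
        page_number += 1  # advance one page, not one page size

ROWS = list(range(1200))  # hypothetical source with 1,200 rows
requested_pages = []

def fake_fetch(params: dict) -> list:
    """Stand-in for a GET such as <endpoint>?pageSize=500&page=2."""
    requested_pages.append(params["page"])
    start = (params["page"] - 1) * params["pageSize"]  # pages numbered from 1
    return ROWS[start:start + params["pageSize"]]

rows = list(paginate_by_page(fake_fetch, ENTRY))
print(json.dumps({"orders": ENTRY}, indent=2))
print(len(rows), requested_pages)  # 1200 [1, 2, 3]
```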

In this section:

Sample Input REST File