Skip to content

Query Databricks SQL using the RTDIP SDK

This article provides a guide on how to use RTDIP SDK to query data via Databricks SQL. Before getting started, ensure you have installed the RTDIP Python Package and check the RTDIP Installation Page for all the required prerequisites.

How to use RTDIP SDK with Databricks SQL

The RTDIP SDK has rich support of querying data using Databricks SQL, such as allowing the user to authenticate, connect and/or use the most commonly requested methods for manipulating time series data accessible via Databricks SQL.

Authentication

Refer to the Azure Active Directory documentation for further options to perform Azure AD authentication, such as Service Principal authentication using certificates or secrets. Below is an example of performing default authentication that retrieves a token for Azure Databricks.

Also refer to the Code Reference for further technical information.

from rtdip_sdk.authentication import authenticate as auth

authentication = auth.DefaultAuth().authenticate()
access_token = authentication.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default").token

Note

If you are experiencing any trouble authenticating please see Troubleshooting - Authentication

Refer to the Databricks documentation for further information about generating a Databricks PAT Token. Below is an example of performing default authentication that retrieves a token for a Databricks Workspace.

Provide your dbapi..... token to the access_token in the examples below.

access_token = "dbapi.........."

Connect to Databricks SQL

The RTDIP SDK offers several ways to connect to a Databricks SQL Warehouse.

The simplest method to connect to RTDIP and does not require any additional installation steps.

from rtdip_sdk.connectors import DatabricksSQLConnection

server_hostname = "server_hostname"
http_path = "http_path"
access_token = "token"

connection = DatabricksSQLConnection(server_hostname, http_path, access_token)

Replace server_hostname, http_path and access_token with your own information.

For more information about each of the connection methods, please see Code Reference and navigate to the required section.

A popular library that python developers use for ODBC connectivity but requires more setup steps.

ODBC or JDBC are required to leverage PYODBC. Follow these instructions to install the drivers in your environment.

from rtdip_sdk.connectors import PYODBCSQLConnection

server_hostname = "server_hostname"
http_path = "http_path"
access_token = "token"
driver_path = "/Library/simba/spark/lib/libsparkodbc_sbu.dylib"

connection = PYODBCSQLConnection(driver_path, sever_hostname, http_path, access_token)

Replace server_hostname, http_path and access_token with your own information.

For more information about each of the connection methods, please see Code Reference and navigate to the required section.

The RTDIP development team have found this to be the most performant method of connecting to RTDIP leveraging the arrow implementation within Turbodbc to obtain data, but requires a number of additional installation steps to get working on OSX, Linux and Windows

from rtdip_sdk.connectors import TURBODBCSQLConnection

server_hostname = "server_hostname"
http_path = "http_path"
access_token = "token"

connection = TURBODBCSQLConnection(server_hostname, http_path, access_token)

Replace server_hostname, http_path and access_token with your own information.

For more information about each of the connection methods, please see Code Reference and navigate to the required section.

Functions

Finally, after authenticating and connecting using one of the methods above, you have access to the commonly requested RTDIP functions such as Resample, Interpolate, Raw, Time Weighted Averages or Metadata.

1. To use any of the RTDIP functions, use the commands below.

from rtdip_sdk.queries import resample
from rtdip_sdk.queries import interpolate
from rtdip_sdk.queries import raw
from rtdip_sdk.queries import time_weighted_average
from rtdip_sdk.queries import metadata

2. From functions you can use any of the following methods.

Resample

resample.get(connection, parameters_dict)

Interpolate

interpolate.get(connection, parameters_dict)

Raw

raw.get(connection, parameters_dict)

Time Weighted Average

time_weighted_average.get(connection, parameter_dict)

Metadata

metadata.get(connection, parameter_dict)

For more information about the function parameters see Code Reference and navigate through the required function.

Example

This is a code example of the RTDIP SDK Interpolate function. You will need to replace the parameters with your own requirements and details. If you are unsure on the options please see Code Reference - Interpolate and navigate to the attributes section.

from rtdip_sdk.authentication import authenticate as auth
from rtdip_sdk.connectors import DatabricksSQLConnection
from rtdip_sdk.queries import interpolate

authentication = auth.DefaultAuth().authenticate()
access_token = authentication.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default").token
connection = DatabricksSQLConnection("{server_hostname}", "{http_path}", access_token)

parameters = {
    "business_unit": "{business_unit}", 
    "region": "{region}",
    "asset": "{asset}", 
    "data_security_level": "{date_security_level}",
    "data_type": "{data_type}", #options are float, integer, string and double (the majority of data is float)
    "tag_names": ["{tag_name_1}, {tag_name_2}"],
    "start_date": "2022-03-08", #start_date can be a date in the format "YYYY-MM-DD" or a datetime in the format "YYYY-MM-DDTHH:MM:SS"
    "end_date": "2022-03-10", #end_date can be a date in the format "YYYY-MM-DD" or a datetime in the format "YYYY-MM-DDTHH:MM:SS"
    "time_interval_rate": "1", #numeric input
    "time_interval_unit": "hour", #options are second, minute, day or hour
    "agg_method": "first", #options are first, last, avg, min, max
    "interpolation_method": "forward_fill", #options are forward_fill, backward_fill or linear
    "include_bad_data": True #boolean options are True or False
}

result = interpolate.get(connection, parameters)
print(result)

Note

If you are having problems please see Troubleshooting for more information.

Conclusion

Congratulations! You have now learnt how to use the RTDIP SDK. Please check back for regular updates and if you would like to contribute, you can open an issue on GitHub. See the Contributing Guide for more help.