Metadata Function

get(connection, parameters_dict)

Returns metadata by querying a Databricks SQL Warehouse over a connection specified by the user.

The connectors available in RTDIP are Databricks SQL Connect, PYODBC SQL Connect and TURBODBC SQL Connect.

The available authentication methods are Certificate Authentication, Client Secret Authentication and Default Authentication. See documentation.

This function requires the user to pass a dictionary of parameters (see the Attributes table below).

Parameters:

Name | Type | Description | Default
connection | object | Connection chosen by the user (Databricks SQL Connect, PYODBC SQL Connect, TURBODBC SQL Connect) | required
parameters_dict | dict | A dictionary of parameters (see Attributes table below) | required

Attributes:

Name | Type | Description
business_unit | str | Business unit
region | str | Region
asset | str | Asset
data_security_level | str | Level of data security
tag_names | optional, list | Either pass a list of one or more tag names, e.g. ["tag_1", "tag_2"], leave the list blank ([]), or leave the parameter out completely
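The three accepted forms of tag_names described above can be sketched as follows. The values are placeholders, and build_parameters is a hypothetical helper (not part of RTDIP); it only illustrates that an explicit list, an empty list and an omitted key are all valid inputs.

```python
from typing import Optional


def build_parameters(tag_names: Optional[list] = None) -> dict:
    """Hypothetical helper: assemble a parameters_dict for metadata.get()."""
    parameters = {
        "business_unit": "Business Unit",
        "region": "Region",
        "asset": "Asset Name",
        "data_security_level": "Security Level",
    }
    # Add the optional key only when the caller supplied a list (even an empty one).
    if tag_names is not None:
        parameters["tag_names"] = tag_names
    return parameters


specific_tags = build_parameters(["tag_1", "tag_2"])  # explicit list of tag names
blank_list = build_parameters([])                      # list left blank
no_key = build_parameters()                            # parameter left out completely
```

Any of the three dictionaries could then be passed as the second argument to get().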

Returns:

Name | Type | Description
DataFrame | pd.DataFrame | A dataframe of metadata.

Source code in src/sdk/python/rtdip_sdk/queries/metadata.py
def get(connection: object, parameters_dict: dict) -> pd.DataFrame:
    '''
    Returns metadata by querying a Databricks SQL Warehouse over a connection specified by the user.

    The connectors available in RTDIP are Databricks SQL Connect, PYODBC SQL Connect and TURBODBC SQL Connect.

    The available authentication methods are Certificate Authentication, Client Secret Authentication and Default Authentication. See documentation.

    This function requires the user to pass a dictionary of parameters (see the Attributes table below).

    Args:
        connection: Connection chosen by the user (Databricks SQL Connect, PYODBC SQL Connect, TURBODBC SQL Connect)
        parameters_dict: A dictionary of parameters (see Attributes table below)

    Attributes:
        business_unit (str): Business unit
        region (str): Region
        asset (str): Asset 
        data_security_level (str): Level of data security
        tag_names (optional, list): Either pass a list of one or more tag names, e.g. ["tag_1", "tag_2"], leave the list blank ([]), or leave the parameter out completely

    Returns:
        DataFrame: A dataframe of metadata.
    '''
    try:
        query = _query_builder(parameters_dict, metadata=True)

        try:
            cursor = connection.cursor()
            cursor.execute(query)
            df = cursor.fetch_all()
            cursor.close()
            return df
        except Exception as e:
            logging.exception('error returning dataframe')
            raise e

    except Exception as e:
        logging.exception('error returning metadata function')
        raise e
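The source above relies only on the connection exposing cursor(), and on the cursor exposing execute(), fetch_all() and close(). A minimal sketch of that contract follows. FakeCursor, FakeConnection and run_query are hypothetical stand-ins, not RTDIP classes, and the sketch returns a plain list of rows rather than a pandas DataFrame so that it runs without third-party dependencies.

```python
import logging


class FakeCursor:
    """Stand-in cursor that records queries and returns canned rows."""

    def __init__(self, rows):
        self._rows = rows
        self.executed = []
        self.closed = False

    def execute(self, query):
        self.executed.append(query)

    def fetch_all(self):
        return list(self._rows)

    def close(self):
        self.closed = True


class FakeConnection:
    """Stand-in connection that hands out FakeCursor objects."""

    def __init__(self, rows):
        self._rows = rows
        self.last_cursor = None

    def cursor(self):
        self.last_cursor = FakeCursor(self._rows)
        return self.last_cursor


def run_query(connection, query):
    """Mirror the cursor flow used in get(): open, execute, fetch, close."""
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        rows = cursor.fetch_all()
        cursor.close()
        return rows
    except Exception:
        # Log with traceback, then re-raise so the caller can handle the error.
        logging.exception("error returning rows")
        raise


connection = FakeConnection([{"TagName": "tag_1"}, {"TagName": "tag_2"}])
rows = run_query(connection, "SELECT 1")  # placeholder query text
```

Because errors are logged and then re-raised, a caller that needs to recover from a failed query would wrap the call in its own try/except.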

Example

from rtdip_sdk.authentication.azure import DefaultAuth
from rtdip_sdk.connectors import DatabricksSQLConnection
from rtdip_sdk.queries import metadata

auth = DefaultAuth().authenticate()
token = auth.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default").token
connection = DatabricksSQLConnection("{server_hostname}", "{http_path}", token)

parameters = {
    "business_unit": "Business Unit",
    "region": "Region", 
    "asset": "Asset Name", 
    "data_security_level": "Security Level",
    "tag_names": ["tag_1", "tag_2"],  # list of tags
}
x = metadata.get(connection, parameters)
print(x)

This example uses DefaultAuth() and DatabricksSQLConnection() to authenticate and connect. Other ways to authenticate are described in the authentication documentation. The alternative built-in connection methods are PYODBCSQLConnection(), TURBODBCSQLConnection() and SparkConnection().

Note

server_hostname and http_path can be found on the SQL Warehouses Page.