Accessing Registered Objects¶
Below is a list of registered objects that are available through the Platform:
- DataElement
- Feature
- Model
- ModelTransform
- Policy
- GlobalFunction
- Report
- User
In this notebook, we illustrate three ways of accessing registered objects:
- Directly calling objects using their alias
- Searching objects based on filters
- Accessing registered GlobalFunctions and using them like Python functions
Directly calling objects using their alias¶
# Import Corridor Package Objects
from corridor import DataElement, Feature, Model, ModelTransform, User
# Call DataElement `requested_loan_amount` using its alias
DE_example = DataElement('requested_loan_amount')
DE_example.name
'Requested Loan Amount'
# Call Feature `debt_capacity` using alias
Feature_example = Feature('debt_capacity')
Feature_example.name
'Debt Capacity'
# Call Model `PD Model Strict` using model name and version
Model_example = Model("PD Model Strict", version=1)
Model_example.name
'PD Model Strict'
# Call ModelTransform using model name & transform alias
FICO_transform = ModelTransform(model='PD Model with Transform', alias='fico_normalized')
FICO_transform.name
'FICO Normalized'
# Initialize model using the registered model name on the platform
Model_example = Model('PD Model with Transform')
# Call ModelTransform using model object & transform alias
FICO_transform = ModelTransform(model=Model_example, alias='fico_normalized')
FICO_transform.name
'FICO Normalized'
# Call registered user with username `admin`
User_example = User(username='admin')
User_example.username
'admin'
Searching objects based on filters¶
If a user does not remember the exact alias, or wants to access a certain category of objects, they can search using one or more of the following filters:
- contains
- status
- permissible_purpose
- type
- platform_entity
- keywords
- is_aggregated
This is done by calling the object.all(filters) functionality of the corridor package, where the object can be DataElement, Feature, etc. The output of object.all() is a list of registered objects matching the specified filters and filter values.
Searching using filters is currently available for: DataElement, DataTable, Feature, Model, Policy, GlobalVariable, GlobalFunction, RuntimeParameter
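To make the filter semantics concrete, here is a small self-contained sketch in plain Python (not the corridor API itself) that mimics how contains and type filters narrow down a registry; the registry contents and the all_objects helper are illustrative assumptions, not Platform objects:

```python
# Toy registry of registered objects, each with an alias and a type.
# This is an illustrative stand-in, NOT the corridor package.
registry = [
    {"alias": "fico_bins", "type": "String"},
    {"alias": "annual_income_bins", "type": "ArrayString"},
    {"alias": "debt_capacity", "type": "Float"},
]

def all_objects(contains=None, type=None):
    """Mimic object.all(): keep entries whose alias contains the
    given substring and whose type is in the given list.
    Both filters are optional, as on the Platform."""
    results = registry
    if contains is not None:
        results = [r for r in results if contains in r["alias"]]
    if type is not None:
        results = [r for r in results if r["type"] in type]
    return results

matches = all_objects(contains="bins", type=["String", "ArrayString"])
print([m["alias"] for m in matches])  # ['fico_bins', 'annual_income_bins']
```

Each filter independently narrows the result set, so supplying several filters returns only objects satisfying all of them, as in the Model.all example with three filters below.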
# Import Corridor Package Objects
from corridor import DataTable, Feature, Model, Policy, GlobalFunction
# Listing all registered features using partial alias string and feature type
Feature.all(contains='binned', type=['String', 'ArrayString'])
[<Feature alias="fico_bins", version=1>, <Feature alias="annual_income_bins", version=1>]
# Search models using 3 filters: permissible_purpose, status, contains
Model.all(permissible_purpose="Underwriting", status="Draft", contains="PD")[:5]
[<Model name="PD Model Strict", version=1>, <Model name="PD Model Lenient", version=1>, <Model name="PD Model Lenient_copy1", version=1>, <Model name="PD Model demo", version=1>, <Model name="PD Model Lenient v2", version=1>]
# Listing all registered underwriting policies
Policy.all(type="UnderWriting")['UnderWriting'][:2]
[<Policy name="opt strat_4171", version=1>, <Policy name="policy_test1_4168", version=1>]
Accessing registered GlobalFunctions and using them like Python functions¶
# Import GlobalFunction from corridor package
from corridor import GlobalFunction
Instantiate a GlobalFunction
# Search registered GlobalFunctions for one that calculates an exponential
GlobalFunction.all(contains="exponent")
[<GlobalFunction alias="exponential_function_with_rounding", version=1>]
GF_example = GlobalFunction("exponential_function_with_rounding")
Using GF_example on integers
# Using GF_example on integers
print(GF_example(9, 0.5))
3.0
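The body of the registered function lives on the Platform, but judging from the sample outputs in this notebook (for example, 9 and 0.5 producing 3.0), a plausible pure-Python stand-in is exponentiation followed by rounding. The name round_pow below is hypothetical and is not part of corridor:

```python
# Hypothetical stand-in for `exponential_function_with_rounding`,
# inferred from the sample outputs in this notebook; it is NOT the
# registered implementation itself.
def round_pow(base, exponent):
    """Raise base to exponent and round to the nearest integer,
    returned as a float."""
    return float(round(base ** exponent))

print(round_pow(9, 0.5))  # 3.0, matching the GF_example output above
```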
Using GF_example on a Spark DataFrame
# Create sample data from the registered DataElement with alias annual_income (application-level DE)
from corridor import create_data
example_data = create_data('annual_income')
example_data.show(5)
print(f'(Rows, Columns) : ({example_data.count()}, {len(example_data.columns)})')
+-------------+
|annual_income|
+-------------+
|      42000.0|
|      27165.0|
|      42000.0|
|      30000.0|
|      38000.0|
+-------------+
only showing top 5 rows

(Rows, Columns) : (2489721, 1)
# Adding a random scaling factor used to scale down the annual_income of applicants
from pyspark.sql.functions import rand, when
example_data = example_data.withColumn('scaling_factor', when(rand() < 0.95, 0.95).otherwise(0.98))
example_data.show(10)
+-------------+--------------+
|annual_income|scaling_factor|
+-------------+--------------+
|      42000.0|          0.95|
|      27165.0|          0.95|
|      42000.0|          0.95|
|      30000.0|          0.95|
|      38000.0|          0.95|
|      75000.0|          0.95|
|      55000.0|          0.95|
|      45000.0|          0.95|
|      55000.0|          0.95|
|      42000.0|          0.98|
+-------------+--------------+
only showing top 10 rows
# Converting the GlobalFunction into a Spark User Defined Function (UDF)
# and using it to scale 'annual_income' by 'scaling_factor'
import pyspark.sql.functions as F
from pyspark.sql.types import FloatType
GF_example_udf = F.udf(GF_example.get_python_function(), FloatType())
example_data = example_data.withColumn("scaled_income", GF_example_udf("annual_income", "scaling_factor"))
example_data.show(5)
+-------------+--------------+-------------+
|annual_income|scaling_factor|scaled_income|
+-------------+--------------+-------------+
|      42000.0|          0.95|      24665.0|
|      27165.0|          0.95|      16305.0|
|      42000.0|          0.95|      24665.0|
|      30000.0|          0.95|      17917.0|
|      38000.0|          0.95|      22428.0|
+-------------+--------------+-------------+
only showing top 5 rows
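As a quick sanity check on the table above, the scaled values are consistent with annual_income raised to the power scaling_factor and rounded to the nearest integer, assuming that is what exponential_function_with_rounding computes:

```python
# Reproduce the first row of `scaled_income` in plain Python, under
# the assumption that the registered function computes round(x ** y).
value = round(42000 ** 0.95)
print(value)  # 24665, matching the first scaled_income row above
```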