Corridor package
Introduction to the Corridor Package and Corridor Package Entities
Corridor is a Python library that gives users access to all the Registered Objects on the Corridor Platforms. The following Registered Objects are accessible through Corridor:
- DataTable
- DataElement
- Feature
- Model
- ModelTransform
- Policy
- GlobalFunction
The Corridor package contains classes and functions that facilitate access to information on the platform. DataTable, DataElement, Feature, Model, ModelTransform, Policy, and GlobalFunction are classes with predefined attributes (e.g. name, alias, permissible_purpose, platform_entity, description, version) and methods (e.g. get_simulation(), get_approval_workflow()). Each class represents a specific type of registered object; for instance, the DataElement class is used to instantiate a DataElement object. Once it is instantiated, users can access the data and metadata of that DataElement in notebooks. The same applies to the other classes. The attributes and methods available depend on what is meaningful for each class; for instance, get_python_function() applies only to GlobalFunction objects.
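Because the available methods differ from class to class, notebook code that handles several kinds of registered objects may want to check what an object supports before calling it. The following is a minimal sketch of such a check, not part of the Corridor package: `available_methods` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins (with an arbitrary choice of methods) so that the snippet runs without platform access.

```python
from types import SimpleNamespace

# Method names mentioned in the text; not every class is expected to have all of them.
CANDIDATE_METHODS = ("get_simulation", "get_approval_workflow", "get_python_function")


def available_methods(obj, candidates=CANDIDATE_METHODS):
    """Return the candidate method names that `obj` actually exposes as callables."""
    return [name for name in candidates if callable(getattr(obj, name, None))]


# Hypothetical stand-ins; which methods each one carries is an assumption
# made for illustration only. In a notebook, pass a real registered object.
fake_model = SimpleNamespace(
    name="PD Model Strict",
    get_simulation=lambda: None,
    get_approval_workflow=lambda: None,
)
fake_global_function = SimpleNamespace(
    name="example_function",
    get_python_function=lambda: None,
)

print(available_methods(fake_model))
print(available_methods(fake_global_function))
```

With a real object, the returned list shows which of the candidate methods can be called safely on it.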
In addition to the classes, there are a couple of functions that can be used to access data in a notebook:
- create_data: Creates a dataset from a given list of aliases or objects of DataElement, Feature, Model, etc.
- read_data: Reads data from the provided location and returns a PySpark DataFrame containing the data at that location.
For each Registered Object, we can access a set of metadata, and we can also recreate the Registered Objects on new datasets. This notebook illustration is divided into two sections:
- How to access basic metadata for registered objects: Model Example - Using Corridor Package Classes
- How to create data using registered objects - Using create_data()
Illustration: How to load a model registered on the platform and access its basic details
# Import Model from Corridor
from corridor import Model
# Use the model name to access the model
Model_example = Model('PD Model Strict')
print(f'name: {Model_example.name}')
print(f'output_alias: {Model_example.output_alias}')
print(f'inputs: {[x.alias for x in Model_example.inputs]}')
print(f'type: {Model_example.type}')
print(f'description: {Model_example.description}')
print(f'platform_entity: {Model_example.platform_entity}')
print(f'permissible_purpose: {Model_example.permissible_purpose}')
print(f'group: {Model_example.group}')
print(f'current_status: {Model_example.current_status}')
name: PD Model Strict
output_alias: pd_model_ver1
inputs: ['debt_capacity', 'fico_range_high']
type: Binary Classification
description: Ver1 of the PD Model based on FICO, Age of Credit Profile and Debt Capacity
platform_entity: Application
permissible_purpose: ['Underwriting']
group: Probability of Default
current_status: Draft
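The repeated print statements above could be folded into a small helper that collects the scalar attributes into a dictionary. This is a sketch only: `summarize` and `MODEL_FIELDS` are hypothetical names, not part of the Corridor package, and the `SimpleNamespace` stand-in carries just a few of the values printed above so that the snippet runs outside the platform. The `inputs` attribute is left out because it holds objects rather than plain values.

```python
from types import SimpleNamespace

# Attribute names taken from the print statements in the illustration above.
MODEL_FIELDS = (
    "name", "output_alias", "type", "description", "platform_entity",
    "permissible_purpose", "group", "current_status",
)


def summarize(obj, fields=MODEL_FIELDS):
    """Collect the requested attributes into a dict; missing ones become None."""
    return {field: getattr(obj, field, None) for field in fields}


# Hypothetical stand-in with a subset of the attributes shown above.
# In a notebook, pass the real object instead, e.g. summarize(Model_example).
stand_in = SimpleNamespace(
    name="PD Model Strict",
    output_alias="pd_model_ver1",
    type="Binary Classification",
    current_status="Draft",
)

for field, value in summarize(stand_in).items():
    print(f"{field}: {value}")
```

Using `getattr` with a default keeps the helper usable across classes whose attribute sets differ, at the cost of silently reporting `None` for a misspelled field name.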
Illustration: Creating data using registered objects
# Import Corridor Package Objects
from corridor import create_data
# Create a dataset using the aliases of registered DataElements
df = create_data('requested_loan_amount','annual_income')
df.limit(5).toPandas()
| | requested_loan_amount | annual_income |
|---|---|---|
| 0 | 13000.0 | 42000.0 |
| 1 | 8800.0 | 27165.0 |
| 2 | 10000.0 | 42000.0 |
| 3 | 20000.0 | 30000.0 |
| 4 | 10875.0 | 38000.0 |