Accessing job results

Results of all job types (Simulation, Validation, and Comparison) that have been run on the platform can be accessed from the Corridor integrated notebook. This notebook illustrates the process in the following sections:

Access DataElement / Feature Job details
Access Model / Policy Job details
User action in case of Error logs with Jobs

Access DataElement / Feature Job Details

Note: This section illustrates how to access job details for a DataElement. The process for a Feature is exactly the same, and the job details available for a DataElement and a Feature are identical.

In [1]:

# Import DataElement from Corridor
from corridor import DataElement

# Load a registered DataElement
DE_example = DataElement('annual_income')

List of all jobs created for DE_example

In [2]:

DE_example.jobs[:3]

Out[2]:

['simulation_Dec-03-2020 02:18 PM: Iteration #2',
 'simulation_Dec-03-2020 02:18 PM: Iteration #1',
 'simulation_Dec-03-2020 02:18 PM']

Below is a list of details available for each job of a DataElement

name
job_type
created_by
created_date
sample_size
sample_type
status
date_filter
date_filter_from_date
date_filter_to_date
comment
is_old
logs
runtime
report_dashboard

In [3]:

# Access details for a specific job
job = DE_example.get_job("simulation_Nov-25-2020 03:46 PM: Iteration #2")

print(f'name: {job.name}')
print(f'job_type: {job.job_type}')
print(f'created_by: {job.created_by}')
print(f'created_date: {job.created_date}')
print(f'sample_size: {job.sample_size}')
print(f'sample_type: {job.sample_type}')
print(f'status: {job.status}')
print(f'date_filter: {job.date_filter if job.date_filter is None else job.date_filter.alias}')
print(f'date_filter_from_date: {job.date_filter_from_date}')
print(f'date_filter_to_date: {job.date_filter_to_date}')
print(f'comment: {job.comment}')
print(f'is_old: {job.is_old}')
print(f'logs: {job.logs}')
print(f'runtime: {job.runtime}')

name: simulation_Nov-25-2020 03:46 PM: Iteration #2
job_type: Simulation
created_by: master
created_date: 2020-11-25 10:18:34.654169
sample_size: 10
sample_type: random
status: COMPLETED
date_filter: None
date_filter_from_date: None
date_filter_to_date: None
comment: None
is_old: False
logs: None
runtime: 0:01:27.456770
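
The attribute-by-attribute printing above can be condensed into a small helper. The sketch below is an illustration only: `job_summary` is a hypothetical helper and not part of the corridor library, and a `SimpleNamespace` stands in for a real job object so that the snippet runs outside the platform. It assumes the attribute names listed above.

```python
from types import SimpleNamespace

# Attribute names taken from the list of job details above.
JOB_FIELDS = [
    "name", "job_type", "created_by", "created_date", "sample_size",
    "sample_type", "status", "date_filter_from_date", "date_filter_to_date",
    "comment", "is_old", "logs", "runtime",
]


def job_summary(job, fields=JOB_FIELDS):
    """Collect job attributes into a dict (hypothetical helper).

    Attributes that are missing on the object are reported as None.
    """
    summary = {field: getattr(job, field, None) for field in fields}
    # date_filter is either None or an object exposing an alias.
    date_filter = getattr(job, "date_filter", None)
    summary["date_filter"] = None if date_filter is None else date_filter.alias
    return summary


# Stand-in for a real job object, using values from the output above.
example_job = SimpleNamespace(
    name="simulation_Nov-25-2020 03:46 PM: Iteration #2",
    job_type="Simulation",
    status="COMPLETED",
    sample_size=10,
    date_filter=None,
)

summary = job_summary(example_job)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In the notebook, the same helper could be called with the object returned by get_job(), provided the attributes behave as shown above.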

List of available dashboards

In [4]:

# Print each dashboard section followed by the figures it contains
for section, figures in job.report_dashboard.items():
    print(section)
    for figure_name in figures:
        print(" -- " + figure_name)

Descriptive Statistics
 -- histogram_distribution
 -- summary_stats
 -- quantile_stats

In [5]:

# Access all available dashboards as Plotly objects
print(f'report_dashboard: {job.report_dashboard}')

report_dashboard: {'Descriptive Statistics': {'histogram_distribution': Figure({
    'data': [{'type': 'bar',
              'x': [35000.0, 75000.0, 88000.0, 119500.0, 60000.0, 70000.0,
                    43000.0, 80000.0, 14400.0, 57000.0],
              'y': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}],
    'layout': {'template': '...',
               'title': {'text': 'Histogram'},
               'xaxis': {'title': {'text': 'Values / Bins'}},
               'yaxis': {'title': {'text': 'Count'}}}
}), 'summary_stats': Figure({
    'data': [{'cells': {'values': [['# Records', '# Missing', '# Unique', '#
                                   Infinities', 'Mean', 'Std. Deviation',
                                   'Variance'], ['10', '0', '10', 0, '64,190.00',
                                   '29,582.37', '8.7512e+08']]},
              'header': {'values': ['Metric', 'Value']},
              'type': 'table'}],
    'layout': {'template': '...', 'title': {'text': 'Summary Statistics'}}
}), 'quantile_stats': Figure({
    'data': [{'cells': {'values': [['min', '1%', '25%', '50%', '75%', '99%',
                                   'max'], ['14,400.00', '14,400.00', '43,000.00',
                                   '60,000.00', '80,000.00', '119,500.00',
                                   '119,500.00']]},
              'header': {'values': ['Quantile', 'Value']},
              'type': 'table'}],
    'layout': {'template': '...', 'title': {'text': 'Quantile Distribution'}}
})}}
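
The summary_stats figure above is a table whose cells hold two columns: metric names and formatted values. The sketch below shows one way to turn such a pair of columns into a dict of numbers. `table_to_dict` is a hypothetical helper, and the cell values are copied from the output above rather than read from a live figure. With a real Plotly table figure, the columns would typically be available as `fig.data[0].cells.values`, but that is an assumption about how these dashboards are built.

```python
def table_to_dict(metric_names, metric_values):
    """Pair two table columns into a dict, converting numeric strings to floats."""

    def to_number(value):
        if isinstance(value, (int, float)):
            return float(value)
        try:
            # Remove thousands separators such as "64,190.00".
            return float(value.replace(",", ""))
        except ValueError:
            # Leave non-numeric text unchanged.
            return value

    return {
        name: to_number(value)
        for name, value in zip(metric_names, metric_values)
    }


# Cell values copied from the summary_stats output above.
cells = [
    ["# Records", "# Missing", "# Unique", "# Infinities",
     "Mean", "Std. Deviation", "Variance"],
    ["10", "0", "10", 0, "64,190.00", "29,582.37", "8.7512e+08"],
]

stats = table_to_dict(*cells)
print(stats["Mean"], stats["Std. Deviation"])
```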

Accessing specific dashboards

In [6]:

# Accessing histogram table from dashboard
job.report_dashboard['Descriptive Statistics']['histogram_distribution']

In [7]:

# Accessing summary stats table from dashboard
job.report_dashboard['Descriptive Statistics']['summary_stats']

Access Default Simulation

default_simulation is the simulation job that was used during object approval. If the object is currently in draft mode, default_simulation is the latest simulation run for the object. This attribute is available for all simulatable objects: DataElement, Feature, Model, and Policy.

In [8]:

# default simulation
DE_default_simulation = DE_example.default_simulation

print(f'DE status: {DE_example.current_status}')
print(f'default simulation status: {DE_default_simulation.status}')
print(f'name: {DE_default_simulation.name}')
print(f'created_by: {DE_default_simulation.created_by}')
print(f'created_date: {DE_default_simulation.created_date}')
print(f'comment: {DE_default_simulation.comment}')
print(f'logs: {DE_default_simulation.logs}')
print(f'runtime: {DE_default_simulation.runtime}')
print(f'sample_size: {DE_default_simulation.sample_size}')

DE status: Approved
default simulation status: COMPLETED
name: simulation_Dec-03-2020 02:18 PM: Iteration #2
created_by: master
created_date: 2020-12-03 09:11:44.768322
comment: None
logs: None
runtime: 0:01:46.375869
sample_size: 10
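
For an object in draft mode, the default simulation is described above as the latest simulation run. The sketch below illustrates that selection rule only. `latest_simulation` is a hypothetical helper, not the platform's implementation, and `SimpleNamespace` objects stand in for real jobs. The approval case is not modelled, since it depends on which job was used when the object was approved.

```python
from datetime import datetime
from types import SimpleNamespace


def latest_simulation(jobs):
    """Return the most recently created Simulation job, or None if there is none."""
    simulations = [job for job in jobs if job.job_type == "Simulation"]
    if not simulations:
        return None
    return max(simulations, key=lambda job: job.created_date)


# Stand-in job objects; names and timestamps are taken from the outputs above.
jobs = [
    SimpleNamespace(
        name="simulation_Nov-25-2020 03:46 PM: Iteration #2",
        job_type="Simulation",
        created_date=datetime(2020, 11, 25, 10, 18, 34, 654169),
    ),
    SimpleNamespace(
        name="simulation_Dec-03-2020 02:18 PM: Iteration #2",
        job_type="Simulation",
        created_date=datetime(2020, 12, 3, 9, 11, 44, 768322),
    ),
]

latest = latest_simulation(jobs)
print(latest.name)
```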

Access simulation results using read_data()

The read_data() function of the corridor library can be used to access job data for all job types. Whenever a user runs a job on the Corridor platform, the input and output data are stored as parquet files. The exact file path can be found by clicking on the job and navigating to Job Details --> Data --> Input/Output. The Input/Output field provides the exact Python code required to read the data from the file location.

Input data: input data consists of all inputs to the object

Output data: output data consists of the id column(s) (id__entity) and output column(s) (output)

This functionality is available for all simulatable objects and all job types. The input and output of every job type and iteration are saved.

In [9]:

# Code copied from "Job Details --> Data --> Input"
from corridor import read_data
input_current__entity = read_data("s3a://corridor.dev/lake/tmp_devqa/sim_DIST_10763_job_17643_ent_11687_input.parquet", "corridor_api.infrastructure.data_source_handler.parquet_folder.ParquetFolder") # noqa: E501

# Convert to pandas and display the first 5 rows
input_current__entity.limit(5).toPandas()

Out[9]:

	id__entity	annual_income
0	251539607552	65000.0
1	251539607553	48000.0
2	251539607554	109000.0
3	251539607555	110000.0
4	251539607556	62000.0

In [10]:

# Code copied from "Job Details --> Data --> Output"
from corridor import read_data
output_current__entity = read_data("s3a://corridor.dev/lake/tmp_devqa/sim_DIST_10763_job_17643_ent_11687_output.parquet", "corridor_api.infrastructure.data_source_handler.parquet_folder.ParquetFolder") # noqa: E501

# Convert to pandas and display the first 5 rows
output_current__entity.limit(5).toPandas()

Out[10]:

	id__entity	output
0	251539607552	5416.666667
1	251539607553	4000.000000
2	251539607554	9083.333333
3	251539607555	9166.666667
4	251539607556	5166.666667
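
Because the input and output tables share the id__entity column, they can be joined to see each input next to its output. The sketch below does this in plain Python with the five rows displayed above, so that it runs without the platform. On the pandas DataFrames returned by toPandas(), a merge on "id__entity" would give the same result.

```python
# Rows copied from the input and output tables displayed above.
input_rows = [
    {"id__entity": 251539607552, "annual_income": 65000.0},
    {"id__entity": 251539607553, "annual_income": 48000.0},
    {"id__entity": 251539607554, "annual_income": 109000.0},
    {"id__entity": 251539607555, "annual_income": 110000.0},
    {"id__entity": 251539607556, "annual_income": 62000.0},
]
output_rows = [
    {"id__entity": 251539607552, "output": 5416.666667},
    {"id__entity": 251539607553, "output": 4000.000000},
    {"id__entity": 251539607554, "output": 9083.333333},
    {"id__entity": 251539607555, "output": 9166.666667},
    {"id__entity": 251539607556, "output": 5166.666667},
]

# Index the outputs by id so each input row can look up its output.
output_by_id = {row["id__entity"]: row["output"] for row in output_rows}

# Inner join: keep only input rows that have a matching output row.
joined = [
    {**row, "output": output_by_id[row["id__entity"]]}
    for row in input_rows
    if row["id__entity"] in output_by_id
]

for row in joined:
    print(row)
```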

Access Model / Policy Job details

Note: This section illustrates how to access job details for a Model. The process for a Policy is exactly the same, and the job details available for a Model and a Policy are identical.

In [12]:

# Import Model from Corridor
from corridor import Model

# Load a registered Model
Model_example = Model('PD Model Strict', version=1)

List of all jobs created for Model_example

In [13]:

Model_example.jobs[-5:]

Out[13]:

['simulation_Dec-03-2020 02:15 PM',
 'comparison_Nov-26-2020 12:09 AM',
 'validation_Nov-25-2020 11:54 PM',
 'simulation_Nov-25-2020 11:00 PM: Iteration #2',
 'simulation_Nov-25-2020 11:00 PM: Iteration #1']
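
The job names above all start with the job type followed by an underscore. Assuming that naming pattern holds in general (it is inferred from the outputs shown here, not from a documented guarantee), job names can be grouped by type with a few lines of Python. `group_jobs_by_type` is a hypothetical helper.

```python
from collections import defaultdict


def group_jobs_by_type(job_names):
    """Group job names by the prefix before the first underscore."""
    grouped = defaultdict(list)
    for name in job_names:
        job_type, _, _ = name.partition("_")
        grouped[job_type].append(name)
    return dict(grouped)


# Job names copied from the output above.
job_names = [
    "simulation_Dec-03-2020 02:15 PM",
    "comparison_Nov-26-2020 12:09 AM",
    "validation_Nov-25-2020 11:54 PM",
    "simulation_Nov-25-2020 11:00 PM: Iteration #2",
    "simulation_Nov-25-2020 11:00 PM: Iteration #1",
]

jobs_by_type = group_jobs_by_type(job_names)
for job_type, names in jobs_by_type.items():
    print(job_type, len(names))
```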

Below is a list of additional details available for each job of a Model / Policy - these are in addition to job details available for a DataElement or a Feature

benchmark_simulation
challenger
current

Access current and benchmark for a model validation job

In [12]:

# Access details for a specific VALIDATION job
job = Model_example.get_job("validation_Nov-25-2020 11:54 PM")

print(f'current: {job.current}')
print(f'benchmark_simulation: {job.benchmark_simulation}')

current: <Model name="PD Model Strict", version=1>
benchmark_simulation: <Job job_type="Simulation" name="simulation_Nov-25-2020 11:00 PM: Iteration #1">

Access current and challenger for a model comparison job

In [13]:

# Access details for a specific COMPARISON job
job = Model_example.get_job("comparison_Nov-26-2020 12:09 AM")

print(f'current: {job.current}')
print(f'challenger: {job.challenger}')

current: <Model name="PD Model Strict", version=1>
challenger: <Model name="PD Model Lenient", version=1>

Access simulation results using read_data() for Policy

For a Policy, two datasets are created for every iteration:

input_current__entity: contains input information at the offer level
output_current__entity: contains final and intermediate output information at the offer level

Both datasets can be accessed using read_data().
The contents of input_current__entity and output_current__entity are discussed in detail in "3.b Running Simulation".

In [14]:

# Entity Data for a sample Policy simulation
from corridor import read_data
input_current__entity = read_data("s3a://corridor.dev/lake/tmp_devqa/sim_SIM_3482_job_15329_ent_261_input.parquet", "corridor_api.infrastructure.data_source_handler.parquet_folder.ParquetFolder") # noqa: E501

# Convert to pandas and display the first 5 rows
input_current__entity.limit(5).toPandas()

Out[14]:

	id__entity	requested_loan_amount	debt_capacity	fico_range_high	pd_model_ver1	id__offer
0	251539607593	21000.0	0.323077	704.0	0.15	25153960759321000.07.9936.0
1	251539607593	21000.0	0.323077	704.0	0.15	25153960759321000.010.9936.0
2	251539607593	21000.0	0.323077	704.0	0.15	25153960759322000.07.9936.0
3	251539607593	21000.0	0.323077	704.0	0.15	25153960759322000.010.9936.0
4	251539607624	23200.0	0.580000	709.0	0.20	25153960762421200.07.9936.0

In [15]:

# Offer Data for the same sample Policy simulation as above
from corridor import read_data
output_current__entity = read_data("s3a://corridor.dev/lake/tmp_devqa/sim_SIM_3482_job_15329_ent_261_output.parquet", "corridor_api.infrastructure.data_source_handler.parquet_folder.ParquetFolder") # noqa: E501

# Convert to pandas and display the first 5 rows
output_current__entity.limit(5).toPandas()

Out[15]:

	id__entity	output__entity	output__strategy__2	output__strategy__1	output__segment__2_2	output__segment__2_1	output__rule__1_1_1	output__rule__2_1_2	output__rule__2_2_3	output__rule__2_1_3	output__rule__2_2_1	output__rule__2_1_1	output__rule__2_2_2	output__config__potential_loan_amount	output__config__potential_loan_term	output__config__potential_int_rate	profiling_info	id__offer	output__offer
0	251539607593	False	False	True	False	True	True	False	None	True	None	False	None	21000.0	36.0	7.99	M_FICO	25153960759321000.07.9936.0	False
1	251539607593	False	False	True	False	True	True	True	None	True	None	False	None	21000.0	36.0	10.99	M_FICO	25153960759321000.010.9936.0	False
2	251539607593	False	False	True	False	True	True	False	None	True	None	False	None	22000.0	36.0	7.99	M_FICO	25153960759322000.07.9936.0	False
3	251539607593	False	False	True	False	True	True	True	None	True	None	False	None	22000.0	36.0	10.99	M_FICO	25153960759322000.010.9936.0	False
4	251539607624	False	False	True	False	True	True	False	None	False	None	False	None	21200.0	36.0	7.99	M_FICO	25153960762421200.07.9936.0	False
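
Since the policy output holds one row per offer, a single id__entity can appear several times. The sketch below counts offers per entity, and how many of them have output__offer set to True, using the five rows displayed above. It is written in plain Python so that it runs without the platform; how output__offer should be interpreted is left to the policy documentation. On the Spark DataFrame itself, groupBy("id__entity").count() would give the offer counts.

```python
from collections import Counter

# Selected columns of the rows displayed above.
offer_rows = [
    {"id__entity": 251539607593, "id__offer": "25153960759321000.07.9936.0", "output__offer": False},
    {"id__entity": 251539607593, "id__offer": "25153960759321000.010.9936.0", "output__offer": False},
    {"id__entity": 251539607593, "id__offer": "25153960759322000.07.9936.0", "output__offer": False},
    {"id__entity": 251539607593, "id__offer": "25153960759322000.010.9936.0", "output__offer": False},
    {"id__entity": 251539607624, "id__offer": "25153960762421200.07.9936.0", "output__offer": False},
]

# Number of offer rows per entity.
offers_per_entity = Counter(row["id__entity"] for row in offer_rows)

# Number of offer rows per entity where output__offer is True.
true_offers_per_entity = Counter(
    row["id__entity"] for row in offer_rows if row["output__offer"]
)

for entity, n_offers in offers_per_entity.items():
    print(entity, n_offers, true_offers_per_entity[entity])
```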

User action in case of Error logs with Jobs

This section illustrates possible user actions when a simulation, comparison, or validation job shows an error log. The example uses a policy simulation; similar actions apply in other cases (model validation, feature simulation, etc.).

In [16]:

# Import Policy from Corridor
from corridor import Policy

# Load a registered Policy
Policy_example = Policy('UW Policy with PD Model and Framework - Strict')

In [17]:

# Policy strategy structure
for strategy in Policy_example.strategies:
    print(f'strategy: {strategy.name}')
    if strategy.segments:
        for seg in strategy.segments:
            print(f' segment: {seg.name}')
            for rule in seg.rules:
                print(f'  rule: {rule.name}')
    else:
        for rule in strategy.rules:
            print(f' rule: {rule.name}')

strategy: Min. Eligibility Requirement
 rule: Min FICO & Max Debt Capacity
strategy: Loan Approval Strategy
 segment: 680 < FICO < 780
  rule: Loan Amount
  rule: Loan Pricing
  rule: PD Threshold
 segment: FICO >= 780
  rule: Loan Amount
  rule: Loan Pricing
  rule: PD Threshold

In [18]:

# Available jobs
Policy_example.jobs

Out[18]:

['comparison_Dec-01-2020 12:20 PM',
 'simulation_Dec-01-2020 12:18 PM',
 'simulation_Nov-30-2020 06:42 PM',
 'comparison_Nov-26-2020 11:47 AM',
 'simulation_Nov-26-2020 11:43 AM']

In [19]:

# Access details for a specific job
job = Policy_example.get_job("simulation_Nov-30-2020 06:42 PM")

In [20]:

# Check whether the job completed and whether it produced any logs
print(f'status: {job.status}')
print(f'logs: {job.logs}')

status: COMPLETED
logs: ********** ERRORS **********
[ERROR] application_id__reserved_xgpajewcuspxexsl=251539892330: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=20000072968: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=20000123361: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=251539703079: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=268719495752: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=268719486971: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=251539856456: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=242949741887: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=217180135769: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record

The error message above indicates that, for certain application ids (alias: application_id__reserved_xgpajewcuspxexsl), either no segment passed in the 'Loan Approval Strategy' strategy or no relevant rules were found in the segments that applied to the record. Note that the two segments shown above only cover FICO values above 680.
There are three possible next steps at this stage:

Adjust the Policy strategy / segment definition / rule: In this example, this means adjusting 'Loan Approval Strategy' to include additional segments so that more of the application data is covered.
Do nothing: If the error message shown in the log is a known issue, the user may choose to do nothing.
Add filters while running the job: If the error message shown in the log is a known issue and the user wants to avoid generating the log, appropriate filters can be added when creating the simulation / comparison / validation job.
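
Before choosing among these options, it can help to know how many records failed and with which message. The sketch below parses a log string of the form shown above and groups the failing record ids by error message. `group_errors` and the regular expression are hypothetical and assume every error line follows the "[ERROR] key=id: message" layout seen in this example. A shortened sample log built from three of the lines above stands in for job.logs.

```python
import re
from collections import defaultdict

# Matches lines such as "[ERROR] some_key=12345: Some message".
LOG_LINE = re.compile(r"^\[ERROR\] (?P<key>\w+)=(?P<value>\d+): (?P<message>.*)$")


def group_errors(logs):
    """Group failing record ids by error message; non-error lines are skipped."""
    grouped = defaultdict(list)
    for line in logs.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            grouped[match["message"]].append(match["value"])
    return dict(grouped)


# Sample log assembled from three of the error lines shown above.
MESSAGE = (
    "AssertionError: In strategy ' Loan Approval Strategy', either no segment "
    "passed or no relevant rules were found in the segments that were "
    "applicable to the record"
)
KEY = "application_id__reserved_xgpajewcuspxexsl"
sample_logs = "\n".join(
    ["********** ERRORS **********"]
    + [
        f"[ERROR] {KEY}={record_id}: {MESSAGE}"
        for record_id in (251539892330, 20000072968, 20000123361)
    ]
)

errors_by_message = group_errors(sample_logs)
for message, record_ids in errors_by_message.items():
    print(len(record_ids), "records:", message)
```

The resulting list of ids could then inform the choice of filter or segment change; how filters are specified when creating a job is not covered here.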