Accessing job results
Results of all job types (Simulation, Validation and Comparison) that have been run on the platform can be accessed from the Corridor integrated notebook. This notebook illustrates the process in the following sections:
- Access DataElement / Feature Job details
- Access Model / Policy Job details
- User action in case of Error logs with Jobs
Access DataElement / Feature Job Details
Note: This section illustrates how to access job details for a DataElement. The same process applies to Features, since the job details available for a DataElement and a Feature are identical.
# Import DataElement from Corridor
from corridor import DataElement
# Load a registered DataElement
DE_example = DataElement('annual_income')
List the jobs created for DE_example (only the first three are shown here)
DE_example.jobs[:3]
['simulation_Dec-03-2020 02:18 PM: Iteration #2', 'simulation_Dec-03-2020 02:18 PM: Iteration #1', 'simulation_Dec-03-2020 02:18 PM']
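The job names above follow a visible pattern: the job type, an underscore, the creation timestamp and an optional ": Iteration #n" suffix. Assuming that pattern holds for every job (it is inferred from the names in this notebook, not from documented behavior), a minimal standard-library sketch can split a job list by type and order iterations chronologically. The helpers parse_job_name and group_by_type are hypothetical, not part of the corridor library.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed name pattern, inferred from the job names shown in this notebook:
#   "<job type>_<Mon-DD-YYYY HH:MM AM|PM>" plus an optional ": Iteration #<n>" suffix
JOB_NAME_RE = re.compile(
    r"^(?P<job_type>[a-z]+)_"
    r"(?P<stamp>[A-Z][a-z]{2}-\d{2}-\d{4} \d{2}:\d{2} [AP]M)"
    r"(?:: Iteration #(?P<iteration>\d+))?$"
)


def parse_job_name(name):
    """Split a job name into (job_type, creation datetime, iteration); iteration 0 = base run."""
    match = JOB_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised job name: {name!r}")
    created = datetime.strptime(match["stamp"], "%b-%d-%Y %I:%M %p")
    return match["job_type"], created, int(match["iteration"] or 0)


def group_by_type(job_names):
    """Group job names by job type, oldest first (iterations in order within a run)."""
    groups = defaultdict(list)
    for name in job_names:
        job_type, created, iteration = parse_job_name(name)
        groups[job_type].append((created, iteration, name))
    return {job_type: [name for _, _, name in sorted(items)]
            for job_type, items in groups.items()}
```

For example, group_by_type(DE_example.jobs)["simulation"] would list the DataElement's simulation jobs from oldest to newest. Names that do not match the assumed pattern raise a ValueError rather than being silently misfiled.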
The following details are available for each job of a DataElement:
- name
- job_type
- created_by
- created_date
- sample_size
- sample_type
- status
- date_filter
- date_filter_from_date
- date_filter_to_date
- comment
- is_old
- logs
- runtime
- report_dashboard
# Access details for a specific job
job = DE_example.get_job("simulation_Nov-25-2020 03:46 PM: Iteration #2")
print(f'name: {job.name}')
print(f'job_type: {job.job_type}')
print(f'created_by: {job.created_by}')
print(f'created_date: {job.created_date}')
print(f'sample_size: {job.sample_size}')
print(f'sample_type: {job.sample_type}')
print(f'status: {job.status}')
print(f'date_filter: {job.date_filter if job.date_filter is None else job.date_filter.alias}')
print(f'date_filter_from_date: {job.date_filter_from_date}')
print(f'date_filter_to_date: {job.date_filter_to_date}')
print(f'comment: {job.comment}')
print(f'is_old: {job.is_old}')
print(f'logs: {job.logs}')
print(f'runtime: {job.runtime}')
name: simulation_Nov-25-2020 03:46 PM: Iteration #2
job_type: Simulation
created_by: master
created_date: 2020-11-25 10:18:34.654169
sample_size: 10
sample_type: random
status: COMPLETED
date_filter: None
date_filter_from_date: None
date_filter_to_date: None
comment: None
is_old: False
logs: None
runtime: 0:01:27.456770
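Instead of one print statement per attribute, the same details can be gathered into a dictionary, which is easier to compare across jobs or turn into a table. Below is a minimal sketch using getattr over the attribute names listed above; job_summary and JOB_FIELDS are hypothetical helpers, not part of the corridor library.

```python
# Attribute names listed above for a DataElement / Feature job
JOB_FIELDS = (
    "name", "job_type", "created_by", "created_date", "sample_size",
    "sample_type", "status", "date_filter", "date_filter_from_date",
    "date_filter_to_date", "comment", "is_old", "logs", "runtime",
)


def job_summary(job, fields=JOB_FIELDS):
    """Collect the given job attributes into a dict; missing attributes become None."""
    summary = {field: getattr(job, field, None) for field in fields}
    # date_filter is either None or an object with an .alias (as printed above);
    # keep just the alias so the summary stays readable
    date_filter = summary.get("date_filter")
    if date_filter is not None:
        summary["date_filter"] = getattr(date_filter, "alias", date_filter)
    return summary
```

For example, [job_summary(DE_example.get_job(name)) for name in DE_example.jobs[:3]] produces one dict per job, which pandas.DataFrame can turn into a side-by-side table.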
List of available dashboards
# Print each dashboard section and the dashboards it contains
for section, dashboards in job.report_dashboard.items():
    print(section)
    for name in dashboards:
        print(" -- " + name)
Descriptive Statistics
 -- histogram_distribution
 -- summary_stats
 -- quantile_stats
# Access all available dashboards as plotly objects
print(f'report_dashboard: {job.report_dashboard}')
report_dashboard: {'Descriptive Statistics': {'histogram_distribution': Figure({
'data': [{'type': 'bar',
'x': [35000.0, 75000.0, 88000.0, 119500.0, 60000.0, 70000.0,
43000.0, 80000.0, 14400.0, 57000.0],
'y': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}],
'layout': {'template': '...',
'title': {'text': 'Histogram'},
'xaxis': {'title': {'text': 'Values / Bins'}},
'yaxis': {'title': {'text': 'Count'}}}
}), 'summary_stats': Figure({
'data': [{'cells': {'values': [['# Records', '# Missing', '# Unique',
'# Infinities', 'Mean', 'Std. Deviation',
'Variance'], ['10', '0', '10', 0, '64,190.00',
'29,582.37', '8.7512e+08']]},
'header': {'values': ['Metric', 'Value']},
'type': 'table'}],
'layout': {'template': '...', 'title': {'text': 'Summary Statistics'}}
}), 'quantile_stats': Figure({
'data': [{'cells': {'values': [['min', '1%', '25%', '50%', '75%', '99%',
'max'], ['14,400.00', '14,400.00', '43,000.00',
'60,000.00', '80,000.00', '119,500.00',
'119,500.00']]},
'header': {'values': ['Quantile', 'Value']},
'type': 'table'}],
'layout': {'template': '...', 'title': {'text': 'Quantile Distribution'}}
})}}
Accessing specific dashboards
# Access the histogram chart from the dashboard
job.report_dashboard['Descriptive Statistics']['histogram_distribution']
# Accessing summary stats table from dashboard
job.report_dashboard['Descriptive Statistics']['summary_stats']
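To keep or share every dashboard of a job, the nested report_dashboard dictionary can be walked and each figure written to its own HTML file. The sketch assumes the values are plotly Figure objects (as the printout above suggests), which provide write_html(); the helper name export_dashboards and the folder layout are illustrative choices, not platform conventions.

```python
from pathlib import Path


def export_dashboards(report_dashboard, out_dir):
    """Write every figure of a job's report_dashboard to <out_dir>/<section>/<name>.html.

    report_dashboard is the nested {section: {dashboard name: figure}} dict shown above;
    each figure is assumed to be a plotly Figure (or anything with a write_html method).
    """
    written = []
    for section, dashboards in report_dashboard.items():
        # One sub-folder per section, with spaces replaced for friendlier paths
        section_dir = Path(out_dir) / section.replace(" ", "_")
        section_dir.mkdir(parents=True, exist_ok=True)
        for name, figure in dashboards.items():
            path = section_dir / f"{name}.html"
            figure.write_html(str(path))
            written.append(path)
    return written
```

For example, export_dashboards(job.report_dashboard, "dashboards") returns the list of files written. To view a single chart inline instead, calling .show() on one of the figures accessed above is enough.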
Access Default Simulation
default_simulation is the simulation job that was used during the object's approval. If the object is still in draft mode, default_simulation is the latest simulation run for the object. This attribute is available for all simulatable objects: DataElement, Feature, Model and Policy.
# default simulation
DE_default_simulation = DE_example.default_simulation
print(f'DE status: {DE_example.current_status}')
print(f'default simulation status: {DE_default_simulation.status}')
print(f'name: {DE_default_simulation.name}')
print(f'created_by: {DE_default_simulation.created_by}')
print(f'created_date: {DE_default_simulation.created_date}')
print(f'comment: {DE_default_simulation.comment}')
print(f'logs: {DE_default_simulation.logs}')
print(f'runtime: {DE_default_simulation.runtime}')
print(f'sample_size: {DE_default_simulation.sample_size}')
DE status: Approved
default simulation status: COMPLETED
name: simulation_Dec-03-2020 02:18 PM: Iteration #2
created_by: master
created_date: 2020-12-03 09:11:44.768322
comment: None
logs: None
runtime: 0:01:46.375869
sample_size: 10
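The rule for default_simulation described above can be restated as a small function. This is only an illustration of the documented behavior, not how the platform actually computes default_simulation; the helper name, the status string "Draft" and the (created_date, name) input shape are assumptions.

```python
def pick_default_simulation(current_status, approval_simulation, simulations):
    """Illustrate the default_simulation rule described above (hypothetical helper).

    current_status      -- the object's status, e.g. "Approved" or "Draft" (assumed strings)
    approval_simulation -- name of the simulation used at approval, or None
    simulations         -- (created_date, name) pairs for the object's simulation jobs
    """
    if current_status != "Draft":
        # Objects that went through approval: the simulation used during approval
        return approval_simulation
    if not simulations:
        return None
    # Draft objects: the latest simulation run, i.e. the largest created_date
    return max(simulations)[1]
```

The created_date values only need to be mutually comparable; datetime objects or ISO-formatted strings such as the ones printed above both work.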
Access simulation results using read_data()
The read_data() function of the corridor library can be used to access job data for all job types. Whenever a user runs a job on the Corridor platform, its input and output data are stored as parquet files. The exact file path can be found by clicking on the job and opening Job Details --> Data --> Input/Output; the Input/Output field provides the exact Python code required to read the data from that location.
Input data: input data consists of all inputs to the object
Output data: output data consists of the id column(s) (id__entity) and output column(s) (output)
This functionality is available for all simulatable objects and all job types. The input and output of every job type and iteration are saved.
# Code copied from "Job Details --> Data --> Input"
from corridor import read_data
input_current__entity = read_data("s3a://corridor.dev/lake/tmp_devqa/sim_DIST_10763_job_17643_ent_11687_input.parquet", "corridor_api.infrastructure.data_source_handler.parquet_folder.ParquetFolder") # noqa: E501
# Convert to pandas and display the first 5 rows
input_current__entity.limit(5).toPandas()
| | id__entity | annual_income |
|---|---|---|
| 0 | 251539607552 | 65000.0 |
| 1 | 251539607553 | 48000.0 |
| 2 | 251539607554 | 109000.0 |
| 3 | 251539607555 | 110000.0 |
| 4 | 251539607556 | 62000.0 |
# Code copied from "Job Details --> Data --> Output"
from corridor import read_data
output_current__entity = read_data("s3a://corridor.dev/lake/tmp_devqa/sim_DIST_10763_job_17643_ent_11687_output.parquet", "corridor_api.infrastructure.data_source_handler.parquet_folder.ParquetFolder") # noqa: E501
# Convert to pandas and display the first 5 rows
output_current__entity.limit(5).toPandas()
| | id__entity | output |
|---|---|---|
| 0 | 251539607552 | 5416.666667 |
| 1 | 251539607553 | 4000.000000 |
| 2 | 251539607554 | 9083.333333 |
| 3 | 251539607555 | 9166.666667 |
| 4 | 251539607556 | 5166.666667 |
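Because both datasets carry the id__entity column, each output value can be traced back to the inputs that produced it by joining on that column. The sketch below does this in plain Python on the five rows displayed above; with the full data, DataFrame.merge(..., on="id__entity") on the two toPandas() results does the same job. In these five rows the output equals annual_income / 12 to the displayed precision; that is an observation on this sample only, not the DataElement's documented definition. The helper join_on_entity is hypothetical.

```python
def join_on_entity(input_rows, output_rows, key="id__entity"):
    """Inner-join input and output rows (lists of dicts) on the id column."""
    outputs_by_id = {row[key]: row for row in output_rows}
    return [{**row, **outputs_by_id[row[key]]}
            for row in input_rows if row[key] in outputs_by_id]


# The five rows displayed in the two tables above
inputs = [
    {"id__entity": 251539607552, "annual_income": 65000.0},
    {"id__entity": 251539607553, "annual_income": 48000.0},
    {"id__entity": 251539607554, "annual_income": 109000.0},
    {"id__entity": 251539607555, "annual_income": 110000.0},
    {"id__entity": 251539607556, "annual_income": 62000.0},
]
outputs = [
    {"id__entity": 251539607552, "output": 5416.666667},
    {"id__entity": 251539607553, "output": 4000.000000},
    {"id__entity": 251539607554, "output": 9083.333333},
    {"id__entity": 251539607555, "output": 9166.666667},
    {"id__entity": 251539607556, "output": 5166.666667},
]

for row in join_on_entity(inputs, outputs):
    # Show each input next to its output and annual_income / 12 for comparison
    print(row["id__entity"], row["annual_income"], row["output"],
          round(row["annual_income"] / 12, 6))
```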
Access Model / Policy Job details
Note: This section illustrates how to access job details for a Model. The same process applies to Policies, since the job details available for a Model and a Policy are identical.
# Import Model from Corridor
from corridor import Model
# Load a registered Model
Model_example = Model('PD Model Strict', version=1)
List the jobs created for Model_example (only the last five are shown here)
Model_example.jobs[-5:]
['simulation_Dec-03-2020 02:15 PM', 'comparison_Nov-26-2020 12:09 AM', 'validation_Nov-25-2020 11:54 PM', 'simulation_Nov-25-2020 11:00 PM: Iteration #2', 'simulation_Nov-25-2020 11:00 PM: Iteration #1']
In addition to the job details available for a DataElement or a Feature, the following details are available for each job of a Model / Policy:
- benchmark_simulation
- challenger
- current
Access current and benchmark for a model validation job
# Access details for a specific VALIDATION job
job = Model_example.get_job("validation_Nov-25-2020 11:54 PM")
print(f'current: {job.current}')
print(f'benchmark_simulation: {job.benchmark_simulation}')
current: <Model name="PD Model Strict", version=1>
benchmark_simulation: <Job job_type="Simulation" name="simulation_Nov-25-2020 11:00 PM: Iteration #1">
Access current and challenger for a model comparison job
# Access details for a specific COMPARISON job
job = Model_example.get_job("comparison_Nov-26-2020 12:09 AM")
print(f'current: {job.current}')
print(f'challenger: {job.challenger}')
current: <Model name="PD Model Strict", version=1>
challenger: <Model name="PD Model Lenient", version=1>
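When looping over many Model / Policy jobs, which extra attribute is worth reading depends on the job type: validations carry a benchmark_simulation, comparisons a challenger. A minimal dispatch sketch follows; related_objects and RELATED_ATTRS are hypothetical names, and the job_type strings "Validation" and "Comparison" are assumed by analogy with the "Simulation" value shown earlier.

```python
# Extra attributes illustrated above for each Model / Policy job type.
# The "Validation" / "Comparison" job_type strings are assumed, by analogy
# with the "Simulation" value shown earlier in this notebook.
RELATED_ATTRS = {
    "Validation": ("current", "benchmark_simulation"),
    "Comparison": ("current", "challenger"),
}


def related_objects(job):
    """Return the objects a Model / Policy job refers to, keyed by attribute name."""
    return {attr: getattr(job, attr) for attr in RELATED_ATTRS.get(job.job_type, ())}
```

For example, for name in Model_example.jobs: print(name, related_objects(Model_example.get_job(name))) prints the current model plus its benchmark or challenger, and an empty dict for job types not in the mapping.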
Access simulation results using read_data() for Policy
For a Policy, two datasets are created for every iteration:
- input_current__entity: contains the input information at offer level.
- output_current__entity: contains the final and intermediate outputs at offer level.
Both datasets can be accessed using read_data(). Their contents are discussed in detail in "3.b Running Simulation".
# Input data for a sample Policy simulation
from corridor import read_data
input_current__entity = read_data("s3a://corridor.dev/lake/tmp_devqa/sim_SIM_3482_job_15329_ent_261_input.parquet", "corridor_api.infrastructure.data_source_handler.parquet_folder.ParquetFolder") # noqa: E501
# Convert to pandas and display the first 5 rows
input_current__entity.limit(5).toPandas()
| | id__entity | requested_loan_amount | debt_capacity | fico_range_high | pd_model_ver1 | id__offer |
|---|---|---|---|---|---|---|
| 0 | 251539607593 | 21000.0 | 0.323077 | 704.0 | 0.15 | 25153960759321000.07.9936.0 |
| 1 | 251539607593 | 21000.0 | 0.323077 | 704.0 | 0.15 | 25153960759321000.010.9936.0 |
| 2 | 251539607593 | 21000.0 | 0.323077 | 704.0 | 0.15 | 25153960759322000.07.9936.0 |
| 3 | 251539607593 | 21000.0 | 0.323077 | 704.0 | 0.15 | 25153960759322000.010.9936.0 |
| 4 | 251539607624 | 23200.0 | 0.580000 | 709.0 | 0.20 | 25153960762421200.07.9936.0 |
# Output data for the same sample Policy simulation as above
from corridor import read_data
output_current__entity = read_data("s3a://corridor.dev/lake/tmp_devqa/sim_SIM_3482_job_15329_ent_261_output.parquet", "corridor_api.infrastructure.data_source_handler.parquet_folder.ParquetFolder") # noqa: E501
# Convert to pandas and display the first 5 rows
output_current__entity.limit(5).toPandas()
| | id__entity | output__entity | output__strategy__2 | output__strategy__1 | output__segment__2_2 | output__segment__2_1 | output__rule__1_1_1 | output__rule__2_1_2 | output__rule__2_2_3 | output__rule__2_1_3 | output__rule__2_2_1 | output__rule__2_1_1 | output__rule__2_2_2 | output__config__potential_loan_amount | output__config__potential_loan_term | output__config__potential_int_rate | profiling_info | id__offer | output__offer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 251539607593 | False | False | True | False | True | True | False | None | True | None | False | None | 21000.0 | 36.0 | 7.99 | M_FICO | 25153960759321000.07.9936.0 | False |
| 1 | 251539607593 | False | False | True | False | True | True | True | None | True | None | False | None | 21000.0 | 36.0 | 10.99 | M_FICO | 25153960759321000.010.9936.0 | False |
| 2 | 251539607593 | False | False | True | False | True | True | False | None | True | None | False | None | 22000.0 | 36.0 | 7.99 | M_FICO | 25153960759322000.07.9936.0 | False |
| 3 | 251539607593 | False | False | True | False | True | True | True | None | True | None | False | None | 22000.0 | 36.0 | 10.99 | M_FICO | 25153960759322000.010.9936.0 | False |
| 4 | 251539607624 | False | False | True | False | True | True | False | None | False | None | False | None | 21200.0 | 36.0 | 7.99 | M_FICO | 25153960762421200.07.9936.0 | False |
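Because the output is at offer level, one entity can appear on several rows: in the five rows shown, entity 251539607593 has four offers (two loan amounts times two interest rates). The id__offer values in this sample also appear to be the entity id concatenated with the offer's loan amount, interest rate and term; that pattern is inferred from these rows only and is not documented. The sketch below checks the pattern and counts offers per entity on the displayed rows; the helper names are hypothetical.

```python
from collections import Counter

# Subset of columns from the five output rows displayed above
COLUMNS = ("id__entity", "output__config__potential_loan_amount",
           "output__config__potential_int_rate", "output__config__potential_loan_term",
           "id__offer", "output__offer")
SAMPLE = [
    (251539607593, 21000.0, 7.99, 36.0, "25153960759321000.07.9936.0", False),
    (251539607593, 21000.0, 10.99, 36.0, "25153960759321000.010.9936.0", False),
    (251539607593, 22000.0, 7.99, 36.0, "25153960759322000.07.9936.0", False),
    (251539607593, 22000.0, 10.99, 36.0, "25153960759322000.010.9936.0", False),
    (251539607624, 21200.0, 7.99, 36.0, "25153960762421200.07.9936.0", False),
]
rows = [dict(zip(COLUMNS, values)) for values in SAMPLE]


def rebuilt_offer_id(row):
    """Concatenate entity id, loan amount, interest rate and term (pattern seen in the sample)."""
    return (f"{row['id__entity']}{row['output__config__potential_loan_amount']}"
            f"{row['output__config__potential_int_rate']}"
            f"{row['output__config__potential_loan_term']}")


def offers_per_entity(rows):
    """Number of offer-level rows generated for each entity."""
    return Counter(row["id__entity"] for row in rows)


def approved_offers(rows):
    """Offer ids whose final decision (output__offer) is True."""
    return [row["id__offer"] for row in rows if row["output__offer"]]
```

On the full Spark DataFrame, output_current__entity.groupBy("id__entity").count() gives the same per-entity offer counts without converting to pandas.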
User action in case of Error logs with Jobs
This section illustrates possible user actions when a simulation, comparison or validation job shows an error log. We use a policy simulation as the example; the same actions apply in other cases (model validation, feature simulation, etc.).
# Import Policy from Corridor
from corridor import Policy
# Load a registered Policy
Policy_example = Policy('UW Policy with PD Model and Framework - Strict')
# Print the policy structure: strategy -> segment -> rule
for strategy in Policy_example.strategies:
    print(f'strategy: {strategy.name}')
    if strategy.segments:
        for seg in strategy.segments:
            print(f'  segment: {seg.name}')
            for rule in seg.rules:
                print(f'    rule: {rule.name}')
    else:
        # Strategies without segments hold their rules directly
        for rule in strategy.rules:
            print(f'  rule: {rule.name}')
strategy: Min. Eligibility Requirement
  rule: Min FICO & Max Debt Capacity
strategy: Loan Approval Strategy
  segment: 680 < FICO < 780
    rule: Loan Amount
    rule: Loan Pricing
    rule: PD Threshold
  segment: FICO >= 780
    rule: Loan Amount
    rule: Loan Pricing
    rule: PD Threshold
# Available jobs
Policy_example.jobs
['comparison_Dec-01-2020 12:20 PM', 'simulation_Dec-01-2020 12:18 PM', 'simulation_Nov-30-2020 06:42 PM', 'comparison_Nov-26-2020 11:47 AM', 'simulation_Nov-26-2020 11:43 AM']
# Access details for a specific job
job = Policy_example.get_job("simulation_Nov-30-2020 06:42 PM")
# Check the job status and whether any errors were logged
print(f'status: {job.status}')
print(f'logs: {job.logs}')
status: COMPLETED
logs: ********** ERRORS **********
[ERROR] application_id__reserved_xgpajewcuspxexsl=251539892330: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=20000072968: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=20000123361: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=251539703079: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=268719495752: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=268719486971: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=251539856456: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=242949741887: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
[ERROR] application_id__reserved_xgpajewcuspxexsl=217180135769: AssertionError: In strategy ' Loan Approval Strategy', either no segment passed or no relevant rules were found in the segments that were applicable to the record
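The error message above indicates that, for certain application ids (alias: application_id__reserved_xgpajewcuspxexsl), either no segment of the 'Loan Approval Strategy' strategy passed, or no relevant rules were found in the segments applicable to the record. Note that the job status is still COMPLETED, so checking the status alone does not reveal these errors; the logs need to be checked as well.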
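With more than a handful of errors, it helps to turn the log into data: which records failed, and in which strategy. Below is a minimal sketch, assuming job.logs is a single string in the format printed above (the format is inferred from this output, not from a documented specification); parse_error_log and failing_ids_by_strategy are hypothetical helpers.

```python
import re
from collections import defaultdict

# Strategy name as quoted in the messages above, e.g. "In strategy ' Loan Approval Strategy'"
STRATEGY_RE = re.compile(r"In strategy '([^']*)'")


def parse_error_log(logs):
    """Split a job log into (id alias, record id, exception type, message) tuples.

    Assumes entries look like the ones above:
    "[ERROR] <alias>=<record id>: <exception type>: <message>"
    """
    entries = []
    for chunk in logs.split("[ERROR]")[1:]:  # text before the first [ERROR] is the banner
        parts = chunk.strip().split(": ", 2)
        if len(parts) != 3:
            continue  # skip anything not in the expected shape
        id_part, exc_type, message = parts
        alias, _, record_id = id_part.partition("=")
        # Collapse line breaks / repeated spaces inside the message
        entries.append((alias, record_id, exc_type, " ".join(message.split())))
    return entries


def failing_ids_by_strategy(entries):
    """Group failing record ids by the strategy named in each message."""
    groups = defaultdict(list)
    for _, record_id, _, message in entries:
        match = STRATEGY_RE.search(message)
        strategy = match.group(1).strip() if match else "<unknown>"
        groups[strategy].append(record_id)
    return dict(groups)
```

For example, failing_ids_by_strategy(parse_error_log(job.logs or "")) returns the application ids to inspect per strategy, which can then be looked up in the job's input data read with read_data().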
There are 3 possible next steps at this stage:
- Adjust the Policy strategy / segment definition / rule: in this example, this means adjusting 'Loan Approval Strategy' to include additional segments so that more of the application data is covered.
- Do nothing: if the error message shown in the log is a known issue, the user may choose to do nothing.
- Add filters while running the job: if the error message shown in the log is a known issue and the user wants to stop it from being logged, appropriate filters can be added when setting up the simulation / comparison / validation.
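For the first option, a quick way to see where coverage is missing is to test each segment condition over the range of possible scores. The sketch below transcribes the two segment conditions from the segment names printed earlier (the actual definitions in the platform may differ) and scans the standard FICO score range of 300 to 850; under those assumptions, scores of 680 and below fall through both segments. Whether such records reach 'Loan Approval Strategy' at all depends on the 'Min FICO & Max Debt Capacity' rule, whose threshold is not shown here. SEGMENTS and uncovered_ranges are hypothetical names.

```python
# Segment conditions transcribed from the segment *names* printed earlier;
# the real definitions in the platform may differ.
SEGMENTS = {
    "680 < FICO < 780": lambda fico: 680 < fico < 780,
    "FICO >= 780": lambda fico: fico >= 780,
}


def uncovered_ranges(lo, hi, segments=SEGMENTS):
    """Contiguous integer ranges in [lo, hi] that no segment accepts."""
    gaps, start = [], None
    for score in range(lo, hi + 1):
        covered = any(test(score) for test in segments.values())
        if not covered and start is None:
            start = score  # a gap begins
        elif covered and start is not None:
            gaps.append((start, score - 1))  # the gap ends just before this score
            start = None
    if start is not None:
        gaps.append((start, hi))  # gap runs to the end of the scanned range
    return gaps


print(uncovered_ranges(300, 850))
```

Adding a segment for the reported gap (or tightening the eligibility rule upstream) and re-running the scan should return an empty list.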