Customizing Reports
Overview
The reports seen in the dashboards of the platform can be configured as needed. The following 2 steps are required to build a new custom report:
- Create the custom report
- Add the created report to a dashboard in the platform
Summary
A class-based API is exposed by Corridor for report creation. In the API, each report is represented by a Python class. The following 4 steps can be followed to create a custom report:
- Step 1: Create the report class - Create a Python class that inherits from the corresponding base report class
- Step 2: Specify the `name` for the report - `name` is used to refer to the report in the platform
- Step 3: Define the computation logic in the class - Using the `run_computation()` and `run_computation_combined()` functions, calculate the reporting metrics
- Step 4: Define the visualization logic in the class - Using the `get_visualization()` and `get_visualization_combined()` functions, specify the visualization of the reporting metrics using plotly
Step 1: Create report class
The Corridor report API exposes a set of base classes for report writers. The base classes provide predefined functionalities that can be accessed by newly defined custom reports. To access the functionalities defined in a base class, a custom report can simply inherit from the desired base class.
Example
```python
from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    # This is an example report class for models
    ...
```
`ModelExampleReport` is defined to represent the custom report we are creating. It inherits from `BaseModelReport`, which is the base class defined in the Corridor report API for models.
The inheritance ensures that:
- `ModelExampleReport` is a report created for a model
- `ModelExampleReport` can access all the predefined functionalities in `BaseModelReport`, for instance accessing information about the model's dependent variable
The following base classes are exposed for building reports for different entities on the platform:
- `BaseQualityProfileReport`
- `BaseDataElementReport`
- `BaseFeatureReport`
- `BaseModelReport`
- `BaseModelMetric`
- `BaseDatasetReport`
- `BaseExperimentReport`
- `BasePolicyReport`
Step 2: Specify the name variable
`name` defines the name that will be used to refer to the report throughout the platform. The name should be unique to each report.
Example
```python
from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'
```
In the code snippet above, we specified the `name` for `ModelExampleReport` as `model_example_report`. `model_example_report` will be used to refer to the report created by the `ModelExampleReport` class. For instance, to add the created report to the UI, the `name` `model_example_report` will be used for reference.
Step 3: Define computation logic
`run_computation()` and `run_computation_combined()` define the logic for computing the reporting metrics, for example the logic to compute quantiles, the method of binning, histogram calculations, etc.
On the platform, a model can be run through Simulation, Comparison or Validation. In the case of Comparison, reports are needed for 2 models: the current model and the challenger model. In the case of Validation, reports are needed for 2 simulations: the current simulation and the benchmark simulation. The platform therefore has to handle scenarios in which reporting metrics are calculated and visualized more than once, for different models or different datasets. Hence, both the `run_computation()` and `run_computation_combined()` functions are provided.
The `run_computation()` function has access to the metadata and data for a single model or simulation. It therefore defines the reporting metrics calculation for a single set of data.
Meanwhile, the `run_computation_combined()` function has access to `metadata_dict` and `data_dict`, both of which are Python dictionaries. They contain the metadata and data for the current and challenger models, or the current and benchmark simulations. `run_computation_combined()` defines the reporting metrics calculation for multiple metadata and data combinations.
Example of run_computation()
```python
from pyspark.sql import functions as F

from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'

    def run_computation(self, metadata, data):
        # Use the metadata to locate the score and actual columns in the data
        score_col = metadata['scoreinfo']['colname']
        actual_col = metadata['actualinfo']['colname']
        computation_result = data.agg(
            F.mean(data[score_col]).alias('score'),
            F.mean(data[actual_col]).alias('actual'),
        ).toPandas()
        return computation_result
```
In the code snippet above, `run_computation()` is defined to calculate the mean for `score_col` (predicted) and `actual_col` (actual) using `data`.
Note:
`run_computation()` can access 2 parameters:
- `metadata`: Python dictionary

  Contains information about the object and the simulation that has been run for the object. The information is provided through the base class. For instance, once `ModelExampleReport` inherits from `BaseModelReport`, the following metadata is accessible for `ModelExampleReport`:

  ```python
  metadata = {
      'name': model_name,
      'type': model_type,
      'scoreinfo': {
          'colname': model_output_feature_colname,
          'coltype': model_output_feature_type,
          'name': model_output_feature_name,
      },
      'primary_key': {
          'name': platform_entity_id_name,  # for instance `application_id` if the model is on the `Application` entity
          'alias': platform_entity_id_name,
          'colname': platform_entity_id_name,
      },
      'actualinfo': {
          'colname': model_dependent_variable_colname,
          'coltype': model_dependent_variable_type,
          'name': model_dependent_variable_name,
      },
  }
  ```

- `data`: pandas DataFrame or PySpark DataFrame

  The data to run the report on.
Example of run_computation_combined()
```python
from pyspark.sql import functions as F

from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'

    def run_computation(self, metadata, data):
        # Use the metadata to locate the score and actual columns in the data
        score_col = metadata['scoreinfo']['colname']
        actual_col = metadata['actualinfo']['colname']
        computation_result = data.agg(
            F.mean(data[score_col]).alias('score'),
            F.mean(data[actual_col]).alias('actual'),
        ).toPandas()
        return computation_result

    def run_computation_combined(self, metadata_dict, data_dict):
        # Run the single-dataset computation once per label
        computation_result_multiple_data = {
            label: self.run_computation(metadata, data_dict[label])
            for label, metadata in metadata_dict.items()
        }
        return computation_result_multiple_data
```
In the code snippet above, `run_computation_combined()` calls `run_computation()` for each metadata and data combination from `metadata_dict` and `data_dict`.
In fact, a `run_computation_combined()` definition like this is already provided in the base class, so in the custom report class only `run_computation()` needs to be defined. In the case of Model Comparison and Validation, the `run_computation_combined()` logic defined in the base class will be used, and `run_computation()` will be called for the current, challenger or benchmark dataset.
Note
`run_computation_combined()` can access 2 parameters:
- `metadata_dict`: dictionary of metadata

  Contains the metadata for `current`, `challenger` or `benchmark` based on the job type.

  ```python
  current_metadata = {
      'name': model_name,
      'type': model_type,
      'scoreinfo': {
          'colname': model_output_feature_colname,
          'coltype': model_output_feature_type,
          'name': model_output_feature_name,
      },
      'primary_key': {
          'name': platform_entity_id_name,  # for instance `application_id` if the model is on the `Application` entity
          'alias': platform_entity_id_name,
          'colname': platform_entity_id_name,
      },
      'actualinfo': {
          'colname': model_dependent_variable_colname,
          'coltype': model_dependent_variable_type,
          'name': model_dependent_variable_name,
      },
  }

  challenger_metadata = {}  # same structure as current_metadata

  metadata_dict = {'current': current_metadata, 'challenger': challenger_metadata}
  ```

- `data_dict`: dictionary of data

  Contains the data for `current`, `challenger` or `benchmark` based on the job type.
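To see the fan-out behaviour in isolation, here is a plain-Python sketch: the function names mirror the API, but the toy `run_computation` and the dict/list stand-ins for metadata and dataframes are purely illustrative, not the platform's implementation.

```python
# Toy run_computation: averages the column named in the metadata.
# Plain dicts and lists of rows stand in for real metadata and dataframes.
def run_computation(metadata, data):
    col = metadata['scoreinfo']['colname']
    values = [row[col] for row in data]
    return sum(values) / len(values)

# Fan out run_computation over each label, the way the combined variant
# handles current/challenger (or current/benchmark) inputs.
def run_computation_combined(metadata_dict, data_dict):
    return {
        label: run_computation(metadata, data_dict[label])
        for label, metadata in metadata_dict.items()
    }

metadata = {'scoreinfo': {'colname': 'score'}}
metadata_dict = {'current': metadata, 'challenger': metadata}
data_dict = {
    'current': [{'score': 1}, {'score': 3}],
    'challenger': [{'score': 5}, {'score': 7}],
}
print(run_computation_combined(metadata_dict, data_dict))
# {'current': 2.0, 'challenger': 6.0}
```

Each label's result is keyed by the same label, which is what lets the visualization step later match results back to `current`, `challenger` or `benchmark`.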
Step 4: Define visualization logic
`get_visualization()` and `get_visualization_combined()` define how the report should be visualized, for example as a scatter plot, bar chart, etc. Similarly to the computation functions, `get_visualization()` defines the visualization for one metadata and data combination, while `get_visualization_combined()` handles visualizations for multiple metadata and data combinations.
Example for get_visualization()
```python
import plotly.graph_objs as go
from pyspark.sql import functions as F

from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'

    def run_computation(self, metadata, data):
        # Use the metadata to locate the score and actual columns in the data
        score_col = metadata['scoreinfo']['colname']
        actual_col = metadata['actualinfo']['colname']
        computation_result = data.agg(
            F.mean(data[score_col]).alias('score'),
            F.mean(data[actual_col]).alias('actual'),
        ).toPandas()
        return computation_result

    def get_visualization(self, metadata, computation_result):
        return go.Figure(
            data=go.Scatter(
                x=computation_result['score'],
                y=computation_result['actual'],
                name='Actual vs Pred',
            ),
            layout=go.Layout(
                title={'text': 'Actual vs Pred'},
                xaxis={'title': {'text': 'Score'}},
                yaxis={'title': {'text': 'Actual'}},
            ),
        )
```
In the code snippet above, get_visualization() is defined to build a scatter plot for the mean of score and actual. It returns the visualization as a plotly figure.
Note
`get_visualization()` can access 2 parameters:
- `metadata`: Python dictionary

  Same as in `run_computation()`

- `computation_result`: the result returned from `run_computation()`
Example for `get_visualization_combined()`
```python
import plotly.graph_objs as go
from plotly.subplots import make_subplots

from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'

    def run_computation_combined(self, metadata_dict, data_dict):
        result_dict = {
            label: self.run_computation(metadata, data_dict[label])
            for label, metadata in metadata_dict.items()
        }
        return result_dict

    def get_visualization_combined(self, metadata_dict, result_dict):
        figure = make_subplots(rows=1, cols=len(metadata_dict))
        for col, label in enumerate(metadata_dict, start=1):
            figure.add_trace(
                go.Scatter(
                    x=result_dict[label]['score'],
                    y=result_dict[label]['actual'],
                    name=f'Actual vs Pred for {label}',
                ),
                row=1, col=col,  # place each label's plot in its own subplot
            )
        return figure
```
In the code snippet above, `get_visualization_combined()` creates a scatter plot for each computation result in `result_dict`, uses `make_subplots()` to join all the scatter plots together, and returns the combined visualization as a plotly figure.
To create a custom report, `get_visualization_combined()` is required.
Note
`get_visualization_combined()` can access 2 parameters:
- `metadata_dict`: Python dictionary

  Same as in `run_computation_combined()`

- `result_dict`: the result returned from `run_computation_combined()`
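Putting the four steps together, the sketch below wires a stub base class to a minimal report so the computation side can run outside the platform. The stub `BaseModelReport` and the plain-Python rows are assumptions standing in for `corridor_api` and real dataframes; on the platform the real base class is imported and `get_visualization()` returns a plotly figure as shown earlier.

```python
class BaseModelReport:
    # Stub standing in for corridor_api.config.reports.BaseModelReport.
    # Assumption: the real base class supplies this combined dispatch.
    def run_computation_combined(self, metadata_dict, data_dict):
        return {
            label: self.run_computation(metadata, data_dict[label])
            for label, metadata in metadata_dict.items()
        }

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'

    def run_computation(self, metadata, data):
        # Mean of the score and actual columns over plain-Python rows
        score_col = metadata['scoreinfo']['colname']
        actual_col = metadata['actualinfo']['colname']
        return {
            'score': sum(r[score_col] for r in data) / len(data),
            'actual': sum(r[actual_col] for r in data) / len(data),
        }

report = ModelExampleReport()
metadata = {'scoreinfo': {'colname': 'pred'}, 'actualinfo': {'colname': 'y'}}
data = [{'pred': 1, 'y': 0}, {'pred': 3, 'y': 2}]
print(report.run_computation(metadata, data))  # {'score': 2.0, 'actual': 1.0}
```

Because the stub's `run_computation_combined()` delegates to `run_computation()`, defining the single-dataset computation is enough for the combined path as well, mirroring the behaviour described for the real base class.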
Additional control options
The above 4 steps are the minimum required to create a custom report. Apart from that, additional variables and functions can be added to the report class for more control over the report writing. Below are the additional variables and functions covered in this tutorial:
- Extra Step 1: Define the language the report is written in - the `supported_lang_specs` variable specifies the supported languages for the report, i.e. pandas or spark
- Extra Step 2: Control which objects the report should be used for - `get_is_skipped()` specifies whether the report should be run, for instance based on model type
- Extra Step 3: Select metadata to use in computation/visualization - `get_metadata()` provides metadata information for the calculation and visualization of the reporting metrics
Extra Step 1: Define language of report
`supported_lang_specs` specifies the supported languages for the report.
Example
```python
from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    supported_lang_specs = ('pyspark-dataframe', 'pandas-dataframe')
```
In the example above, `supported_lang_specs` is set to `('pyspark-dataframe', 'pandas-dataframe')`, which defines that `ModelExampleReport` can be run on both PySpark dataframes and pandas dataframes. Because the report should be able to run on both, `run_computation()` should define logic that handles both PySpark dataframes and pandas dataframes.
Most of the built-in reports on the platform support both PySpark dataframes and pandas dataframes.
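One way to handle both flavours inside `run_computation()` is to branch on the dataframe type. The sketch below uses a duck-typing check (`toPandas` exists on PySpark DataFrames but not pandas ones); this dispatch is an illustration of the idea, not necessarily how the platform selects the implementation.

```python
def compute_mean_score(metadata, data):
    """Mean of the score column, for either a PySpark or a pandas dataframe."""
    score_col = metadata['scoreinfo']['colname']
    if hasattr(data, 'toPandas'):
        # PySpark path: aggregate on the cluster, then collect locally
        from pyspark.sql import functions as F
        return data.agg(F.mean(data[score_col]).alias('score')).toPandas()
    # pandas path: compute directly in memory
    return {'score': data[score_col].mean()}
```

A helper like this (a hypothetical name, not part of the Corridor API) can then be called from `run_computation()` so the rest of the report logic stays dataframe-agnostic.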
Extra Step 2: Control report usage
get_is_skipped() defines the logic of whether the report should be run.
If get_is_skipped() returns True, the report will be skipped, hence it won't be run. If it returns False, the corresponding report will be run.
The logic can be specified based on information about the reporting object.
Example
```python
from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    def get_is_skipped(self, entity):
        if entity.type == 'Binary Classification':
            return True
        return False
```
In the example above, `ModelExampleReport` will be skipped if the model type is Binary Classification.
Note
`get_is_skipped()` can access 1 parameter:
- `entity`: Corridor Object

  The reporting object. `entity` is of type Corridor Object; it has all the attributes that are defined in the Corridor package.
Extra Step 3: Select metadata to use
`get_metadata()` can be used to add information to the metadata, which can later be accessed in `run_computation()`, `run_computation_combined()`, `get_visualization()` and `get_visualization_combined()`.
By inheriting from a base class, a set of information is predefined in the metadata. For instance, for a model report, we can access information like the model type, model name, dependent variable name, dependent variable type, etc. For most reports, the predefined metadata is sufficient. In case additional information is needed in the reporting metrics calculation or visualization, we can define `get_metadata()` in the report class.
Example
```python
from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    def get_metadata(self, entity):
        metadata = super().get_metadata(entity)
        if entity.type == 'Regression':
            metadata['num_inputs'] = len(entity.inputs)
        return metadata
```
In the above example, `metadata = super().get_metadata(entity)` is used to get the default metadata that is defined in the base class. Then, `num_inputs` is added to the metadata for regression models.
Note
`get_metadata()` can access 1 parameter:
- `entity`: Corridor Object

  The reporting object. `entity` is of type Corridor Object; it has all the attributes that are defined in the Corridor package.