Customizing Reports
Overview
The reports seen in the dashboards of the platform can be configured as needed. The following 2 steps are required to build a new custom report:
- Create the custom report
- Add the created report to a dashboard in the platform
Summary
A class-based API is exposed by Corridor for report creation. In the API, each report is represented by a Python class. The following 4 steps can be followed to create a custom report:
- Step 1: Create the report class - Create a Python class that inherits from the corresponding base report class
- Step 2: Specify the `name` for the report - `name` is used to refer to the report in the platform
- Step 3: Define the computation logic in the class - Using the `run_computation()` and `run_computation_combined()` functions, calculate the reporting metrics
- Step 4: Define the visualization logic in the class - Using the `get_visualization()` and `get_visualization_combined()` functions, specify the visualization of the reporting metrics using plotly
Step 1: Create report class
The Corridor report API exposes a set of base classes for report writers. The base classes provide predefined functionalities that can be accessed by newly defined custom reports. To access the functionalities defined in a base class, a custom report can simply inherit from the desired base class.
Example
```python
from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    # This is an example report class for models
    ...
```
`ModelExampleReport` is defined to represent the custom report we are creating. It inherits from `BaseModelReport`, which is the base class defined in the Corridor report API for models.
The inheritance ensures that:
- `ModelExampleReport` is a report created for a model
- `ModelExampleReport` can access all the predefined functionalities in `BaseModelReport`, for instance accessing information about the model's dependent variable
The following base classes are exposed for building reports for different entities on the platform:
- `BaseQualityProfileReport`
- `BaseDataElementReport`
- `BaseFeatureReport`
- `BaseModelReport`
- `BaseModelMetric`
- `BaseDatasetReport`
- `BaseExperimentReport`
- `BasePolicyReport`
Step 2: Specify the name variable
`name` defines the name that will be used to refer to the report throughout the platform. The name should be unique to each report.
Example
```python
from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'
```
In the code snippet above, we specified the `name` for `ModelExampleReport` as `model_example_report`. `model_example_report` will be used to refer to the report created by the `ModelExampleReport` class. For instance, to add the created report to the UI, the `name` `model_example_report` will be used for reference.
Step 3: Define computation logic
`run_computation()` and `run_computation_combined()` define the logic for computing the reporting metrics, for example the logic to compute quantiles, the method of binning, histogram calculations, etc.
On the platform, a model can be run through Simulation, Comparison or Validation. In the case of Comparison, reports are needed for 2 models: the current model and the challenger model. In the case of Validation, reports are needed for 2 simulations: the current simulation and the benchmark simulation. The platform therefore has to handle scenarios in which reporting metrics are calculated and visualized more than once, for different models or different datasets. Hence, both the `run_computation()` and `run_computation_combined()` functions are provided.
The `run_computation()` function has access to the metadata and data for a single model or simulation. It therefore defines the reporting metrics calculation for a single set of data.
Meanwhile, the `run_computation_combined()` function has access to `metadata_dict` and `data_dict`, both of which are Python dictionaries. They contain the metadata and data for the current and challenger models, or the current and benchmark simulations. `run_computation_combined()` defines the reporting metrics calculation for multiple metadata and data combinations.
Example of run_computation()
```python
from pyspark.sql import functions as F

from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'

    def run_computation(self, metadata, data):
        # Use the metadata to locate the score and actual columns in the data
        score_col = metadata['scoreinfo']['colname']
        actual_col = metadata['actualinfo']['colname']
        computation_result = data.agg(
            F.mean(data[score_col]).alias('score'),
            F.mean(data[actual_col]).alias('actual'),
        ).toPandas()
        return computation_result
```
In the code snippet above, `run_computation()` is defined to calculate the mean for `score_col` (predicted) and `actual_col` (actual) using `data`.
Note:
`run_computation()` can access 2 parameters:
- `metadata`: Python dictionary

  Contains information about the object and the simulation that has been run for the object. The information is provided through the base class. For instance, once `ModelExampleReport` inherits from `BaseModelReport`, the following metadata is accessible for `ModelExampleReport`:

  ```python
  metadata = {
      'name': model_name,
      'type': model_type,
      'scoreinfo': {
          'colname': model_output_feature_colname,
          'coltype': model_output_feature_type,
          'name': model_output_feature_name,
      },
      'primary_key': {
          'name': platform_entity_id_name,  # for instance `application_id` if the model is on the `Application` entity
          'alias': platform_entity_id_name,
          'colname': platform_entity_id_name,
      },
      'actualinfo': {
          'colname': model_dependent_variable_colname,
          'coltype': model_dependent_variable_type,
          'name': model_dependent_variable_name,
      },
  }
  ```

- `data`: pandas DataFrame or PySpark DataFrame

  The data to run the report on.
Example of run_computation_combined()
```python
from pyspark.sql import functions as F

from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'

    def run_computation(self, metadata, data):
        # Use the metadata to locate the score and actual columns in the data
        score_col = metadata['scoreinfo']['colname']
        actual_col = metadata['actualinfo']['colname']
        computation_result = data.agg(
            F.mean(data[score_col]).alias('score'),
            F.mean(data[actual_col]).alias('actual'),
        ).toPandas()
        return computation_result

    def run_computation_combined(self, metadata_dict, data_dict):
        # Run the single-dataset computation once per label
        computation_result_multiple_data = {
            label: self.run_computation(metadata, data_dict[label])
            for label, metadata in metadata_dict.items()
        }
        return computation_result_multiple_data
```
In the code snippet above, `run_computation_combined()` calls `run_computation()` for each metadata and data combination from `metadata_dict` and `data_dict`.
In fact, a `run_computation_combined()` definition like this is already provided in the base class, so in the custom report class only `run_computation()` needs to be defined. In the case of Model Comparison and Validation, the `run_computation_combined()` logic defined in the base class will be used, and `run_computation()` will be called for the current, challenger or benchmark dataset.
Note
`run_computation_combined()` can access 2 parameters:
- `metadata_dict`: dictionary of metadata

  Contains the metadata for `current`, `challenger` or `benchmark` based on the job type.

  ```python
  current_metadata = {
      'name': model_name,
      'type': model_type,
      'scoreinfo': {
          'colname': model_output_feature_colname,
          'coltype': model_output_feature_type,
          'name': model_output_feature_name,
      },
      'primary_key': {
          'name': platform_entity_id_name,  # for instance `application_id` if the model is on the `Application` entity
          'alias': platform_entity_id_name,
          'colname': platform_entity_id_name,
      },
      'actualinfo': {
          'colname': model_dependent_variable_colname,
          'coltype': model_dependent_variable_type,
          'name': model_dependent_variable_name,
      },
  }

  challenger_metadata = {}  # same structure as current_metadata

  metadata_dict = {'current': current_metadata, 'challenger': challenger_metadata}
  ```

- `data_dict`: dictionary of data

  Contains the data for `current`, `challenger` or `benchmark` based on the job type.
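To see the fan-out behaviour in isolation, here is a plain-Python sketch: the function names mirror the API, but the toy `run_computation` and the dict/list stand-ins for metadata and dataframes are purely illustrative, not the platform's implementation.

```python
# Toy run_computation: averages the column named in the metadata.
# Plain dicts and lists of rows stand in for real metadata and dataframes.
def run_computation(metadata, data):
    col = metadata['scoreinfo']['colname']
    values = [row[col] for row in data]
    return sum(values) / len(values)

# Fan out run_computation over each label, the way the combined variant
# handles current/challenger (or current/benchmark) inputs.
def run_computation_combined(metadata_dict, data_dict):
    return {
        label: run_computation(metadata, data_dict[label])
        for label, metadata in metadata_dict.items()
    }

metadata = {'scoreinfo': {'colname': 'score'}}
metadata_dict = {'current': metadata, 'challenger': metadata}
data_dict = {
    'current': [{'score': 1}, {'score': 3}],
    'challenger': [{'score': 5}, {'score': 7}],
}
print(run_computation_combined(metadata_dict, data_dict))
# {'current': 2.0, 'challenger': 6.0}
```

Each label's result is keyed by the same label, which is what lets the visualization step later match results back to `current`, `challenger` or `benchmark`.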
Step 4: Define visualization logic
`get_visualization()` and `get_visualization_combined()` define how the report should be visualized, for example as a scatter plot, bar chart, etc. Similarly to the computation functions, `get_visualization()` defines the visualization for one metadata and data combination, while `get_visualization_combined()` handles visualizations for multiple metadata and data combinations.
Example for get_visualization()
```python
import plotly.graph_objs as go
from pyspark.sql import functions as F

from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'

    def run_computation(self, metadata, data):
        # Use the metadata to locate the score and actual columns in the data
        score_col = metadata['scoreinfo']['colname']
        actual_col = metadata['actualinfo']['colname']
        computation_result = data.agg(
            F.mean(data[score_col]).alias('score'),
            F.mean(data[actual_col]).alias('actual'),
        ).toPandas()
        return computation_result

    def get_visualization(self, metadata, computation_result):
        return go.Figure(
            data=go.Scatter(
                x=computation_result['score'],
                y=computation_result['actual'],
                name='Actual vs Pred',
            ),
            layout=go.Layout(
                title={'text': 'Actual vs Pred'},
                xaxis={'title': {'text': 'Score'}},
                yaxis={'title': {'text': 'Actual'}},
            ),
        )
```
In the code snippet above, get_visualization() is defined to build a scatter plot for the mean of score and actual. It returns the visualization as a plotly figure.
Note
`get_visualization()` can access 2 parameters:
- `metadata`: Python dictionary

  Same as in `run_computation()`

- `computation_result`: the result returned from `run_computation()`
Example for `get_visualization_combined()`
```python
import plotly.graph_objs as go
from plotly.subplots import make_subplots

from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'

    def run_computation_combined(self, metadata_dict, data_dict):
        result_dict = {
            label: self.run_computation(metadata, data_dict[label])
            for label, metadata in metadata_dict.items()
        }
        return result_dict

    def get_visualization_combined(self, metadata_dict, result_dict):
        figure = make_subplots(rows=1, cols=len(metadata_dict))
        for col, label in enumerate(metadata_dict, start=1):
            figure.add_trace(
                go.Scatter(
                    x=result_dict[label]['score'],
                    y=result_dict[label]['actual'],
                    name=f'Actual vs Pred for {label}',
                ),
                row=1, col=col,  # place each label's plot in its own subplot
            )
        return figure
```
In the code snippet above, `get_visualization_combined()` creates a scatter plot for each computation result in `result_dict`, uses `make_subplots()` to join all the scatter plots together, and returns the combined visualization as a plotly figure.
To create a custom report, `get_visualization_combined()` is required.
Note
`get_visualization_combined()` can access 2 parameters:
- `metadata_dict`: Python dictionary

  Same as in `run_computation_combined()`

- `result_dict`: the result returned from `run_computation_combined()`
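Putting the four steps together, the sketch below wires a stub base class to a minimal report so the computation side can run outside the platform. The stub `BaseModelReport` and the plain-Python rows are assumptions standing in for `corridor_api` and real dataframes; on the platform the real base class is imported and `get_visualization()` returns a plotly figure as shown earlier.

```python
class BaseModelReport:
    # Stub standing in for corridor_api.config.reports.BaseModelReport.
    # Assumption: the real base class supplies this combined dispatch.
    def run_computation_combined(self, metadata_dict, data_dict):
        return {
            label: self.run_computation(metadata, data_dict[label])
            for label, metadata in metadata_dict.items()
        }

class ModelExampleReport(BaseModelReport):
    name = 'model_example_report'

    def run_computation(self, metadata, data):
        # Mean of the score and actual columns over plain-Python rows
        score_col = metadata['scoreinfo']['colname']
        actual_col = metadata['actualinfo']['colname']
        return {
            'score': sum(r[score_col] for r in data) / len(data),
            'actual': sum(r[actual_col] for r in data) / len(data),
        }

report = ModelExampleReport()
metadata = {'scoreinfo': {'colname': 'pred'}, 'actualinfo': {'colname': 'y'}}
data = [{'pred': 1, 'y': 0}, {'pred': 3, 'y': 2}]
print(report.run_computation(metadata, data))  # {'score': 2.0, 'actual': 1.0}
```

Because the stub's `run_computation_combined()` delegates to `run_computation()`, defining the single-dataset computation is enough for the combined path as well, mirroring the behaviour described for the real base class.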
Additional control options
The above 4 steps are the minimum required to create a custom report. Apart from that, additional variables and functions can be added to the report class for more control over the report writing. Below are the additional variables and functions covered in this tutorial:
- Extra Step 1: Define the language the report is written in - the `supported_lang_specs` variable specifies the supported languages for the report, i.e. pandas or spark
- Extra Step 2: Control which objects the report should be used for - `get_is_skipped()` specifies whether the report should be run, for instance based on model type
- Extra Step 3: Select metadata to use in computation/visualization - `get_metadata()` provides metadata information for the calculation and visualization of the reporting metrics
Extra Step 1: Define language of report
`supported_lang_specs` specifies the supported languages for the report.
Example
```python
from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    supported_lang_specs = ('pyspark-dataframe', 'pandas-dataframe')
```
In the example above, `supported_lang_specs` is set to `('pyspark-dataframe', 'pandas-dataframe')`, which defines that `ModelExampleReport` can be run on both PySpark dataframes and pandas dataframes. Because the report should be able to run on both, `run_computation()` should define logic that handles both PySpark dataframes and pandas dataframes.
Most of the built-in reports on the platform support both PySpark dataframes and pandas dataframes.
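One way to handle both flavours inside `run_computation()` is to branch on the dataframe type. The sketch below uses a duck-typing check (`toPandas` exists on PySpark DataFrames but not pandas ones); this dispatch is an illustration of the idea, not necessarily how the platform selects the implementation.

```python
def compute_mean_score(metadata, data):
    """Mean of the score column, for either a PySpark or a pandas dataframe."""
    score_col = metadata['scoreinfo']['colname']
    if hasattr(data, 'toPandas'):
        # PySpark path: aggregate on the cluster, then collect locally
        from pyspark.sql import functions as F
        return data.agg(F.mean(data[score_col]).alias('score')).toPandas()
    # pandas path: compute directly in memory
    return {'score': data[score_col].mean()}
```

A helper like this (a hypothetical name, not part of the Corridor API) can then be called from `run_computation()` so the rest of the report logic stays dataframe-agnostic.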
Extra Step 2: Control report usage
get_is_skipped() defines the logic of whether the report should be run.
If get_is_skipped() returns True, the report will be skipped, hence it won't be run. If it returns False, the corresponding report will be run.
The logic can be specified based on information about the reporting object.
Example
```python
from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    def get_is_skipped(self, entity):
        if entity.type == 'Binary Classification':
            return True
        return False
```
In the example above, `ModelExampleReport` will be skipped if the model type is Binary Classification.
Note
`get_is_skipped()` can access 1 parameter:
- `entity`: Corridor Object

  The reporting object. `entity` is of type Corridor Object; it has all the attributes that are defined in the Corridor package.
Extra Step 3: Select metadata to use
`get_metadata()` can be used to add information to the metadata, which can later be accessed in `run_computation()`, `run_computation_combined()`, `get_visualization()` and `get_visualization_combined()`.
By inheriting from a base class, a set of information is predefined in the metadata. For instance, for a model report, we can access information like the model type, model name, dependent variable name, dependent variable type, etc. For most reports, the predefined metadata is sufficient. In case additional information is needed in the reporting metrics calculation or visualization, we can define `get_metadata()` in the report class.
Example
```python
from corridor_api.config.reports import BaseModelReport

class ModelExampleReport(BaseModelReport):
    def get_metadata(self, entity):
        metadata = super().get_metadata(entity)
        if entity.type == 'Regression':
            metadata['num_inputs'] = len(entity.inputs)
        return metadata
```
In the above example, `metadata = super().get_metadata(entity)` is used to get the default metadata that is defined in the base class. Then, `num_inputs` is added to the metadata for regression models.
Note
`get_metadata()` can access 1 parameter:
- `entity`: Corridor Object

  The reporting object. `entity` is of type Corridor Object; it has all the attributes that are defined in the Corridor package.