Creating custom reports on the platform
Specification of required Custom Report¶
- In this notebook, we will illustrate a model object report.
- We will show actual vs predicted values for any model object.
- There are no special conditions applicable - this report should be available for all types of models.
The reports seen in the dashboards on Corridor can be configured as needed. This is done in a 2-step process:
1. Define report details: There are 3 components in this step:
a. Specify the requirement: Users must clearly define the report requirement at the outset. This includes:
* what report they want to create, e.g. a Feature report;
* what metrics they want to show, e.g. feature trend over time;
* under what conditions the report should be available, e.g. a feature report applicable only to numerical variables.
b. Define metrics for the required visualization: Define what metrics to visualize by defining a run_computation(metadata, data) function.
run_computation(metadata, data) returns the computation result using two inputs:
metadata: a Python dictionary containing information about the object and the simulation that has been run for it. This information is provided through the base class used to define the report class for the new report.
data: the data to run the report on. This can be a pandas DataFrame or a PySpark DataFrame.
c. Define how to visualize the defined metrics: Define the visual specification (chart type, number of charts to visualize, etc.) in get_visualization(metadata, result).
get_visualization(metadata, result) returns a plotly visualization using two inputs:
metadata: a Python dictionary - the same as in run_computation()
computation_result: the result returned from run_computation()
To set up a custom report on the platform, we just need to define a report class with run_computation(metadata, data) and get_visualization(metadata, result). In this notebook, the primary objective is to illustrate how to create these two functions.
2. Integrate the custom report into the platform: There are 3 components in this step:
a. Inherit a base class
b. Create the report class name
c. Define get_visualization_combined(metadata_dict, result_dict)
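The two steps above can be sketched as a single class skeleton. This is illustrative only; on the platform the base class (e.g. BaseModelReport) comes from corridor_reports.api, and the stub below merely stands in for it so the sketch runs on its own:

```python
class ReportBaseStub:
    """Stand-in for a platform base class so this sketch runs on its own."""

class ModelSketchReport(ReportBaseStub):  # 2a: inherit a base class
    name = 'model_sketch_report'          # 2b: unique report name

    def run_computation(self, metadata, data):                         # 1b: compute metrics
        raise NotImplementedError

    def get_visualization(self, metadata, computation_result):         # 1c: build the plotly figure
        raise NotImplementedError

    def get_visualization_combined(self, metadata_dict, result_dict):  # 2c: combine entities
        raise NotImplementedError
```

The rest of this notebook fills in these methods one by one.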
Data Preparation: data & metadata¶
To illustrate report creation, we prepare dummy data and corresponding metadata.
The actual metadata will be available in the same structure as below for a model report once we create a report class by inheriting the appropriate base class. The actual data is the data from the job run.
Note: The data preparation shown here is a temporary step for illustration purposes.
Metadata: Every object on the platform has metadata. Object metadata contains all the information associated with the object and the simulations that have been run. For instance, a Model object's metadata contains information about the model type, dependent variable, platform entity, etc. Metadata can be extended to include additional information if required.
# Example model metadata
metadata = {
'name': 'Income Estimation',
'type': 'Regression',
'scoreinfo': {
'colname': 'predicted_income',
'coltype': 'Numerical',
'name': 'Predicted Income',
},
'primary_key': {
'name': 'application',
'alias': 'corridor_application_id',
'colname': 'corridor_application_id',
},
'actualinfo': {
'colname': 'annual_income',
'coltype': 'Numerical',
'name': 'Annual Income',
},
}
# Import required packages
import pandas as pd
import random
# Creating dummy data
df = pd.DataFrame({'annual_income': [10000 + x for x in random.sample(range(0,500), 100)],
'predicted_income': [10000 + x for x in random.sample(range(0,500), 100)]})
df.head(5)
|   | annual_income | predicted_income |
|---|---|---|
| 0 | 10464 | 10467 |
| 1 | 10236 | 10488 |
| 2 | 10499 | 10455 |
| 3 | 10127 | 10000 |
| 4 | 10227 | 10457 |
Building run_computation() and get_visualization()¶
# Define run_computation().
# Note that the input parameters are metadata and data. Output is computation_result.
def run_computation(metadata, data):
score_col = metadata['scoreinfo']['colname']
actual_col = metadata['actualinfo']['colname']
# Dataframe containing information/calculations required for plots. This will be passed on to get_visualization()
    computation_result = pd.DataFrame({'actual': data[actual_col],
                                       'predicted': data[score_col],
                                       'error': (data[actual_col] - data[score_col]) / data[actual_col].mean()})
return computation_result
# Execute 'run_computation' to check logic
computation_result = run_computation(metadata, df)
computation_result.head(5)
|   | actual | predicted | error |
|---|---|---|---|
| 0 | 10464 | 10467 | -0.000293 |
| 1 | 10236 | 10488 | -0.024572 |
| 2 | 10499 | 10455 | 0.004290 |
| 3 | 10127 | 10000 | 0.012383 |
| 4 | 10227 | 10457 | -0.022427 |
# Define get_visualization().
# Note that the input parameters are metadata and computation_result (output of run_computation()). Output of
# get_visualization() is a plotly figure object
def get_visualization(metadata, computation_result):
import plotly.graph_objects as go
from plotly.subplots import make_subplots
figure = make_subplots(rows=1, cols=2, subplot_titles=("Actual vs Pred", "Error Plot"),
specs=[[{"type": "scatter"}, {"type": "scatter"}]])
figure.add_trace(go.Scatter(x=computation_result['predicted'], y=computation_result['actual'],mode='markers'),
row=1, col=1)
figure.add_trace(go.Scatter(x=list(range(1, len(computation_result['predicted'])+1)),y=computation_result['error']),
row=1, col=2)
    figure.update_xaxes(title_text="Predicted", row=1, col=1)
    figure.update_yaxes(title_text="Actual", row=1, col=1)
figure.update_xaxes(title_text="Sample", row=1, col=2)
figure.update_yaxes(title_text="Error", row=1, col=2)
figure.update_layout(
title = {'text':"Regression Model Reports", 'y':0.9,'x':0.45,'xanchor': 'center','yanchor': 'top' },
height=450, showlegend=False)
return figure
# Execute 'get_visualization' to check logic
get_visualization(metadata, computation_result)
Integrate custom report into the platform¶
There are 3 additional steps required to make run_computation() & get_visualization() ready for integration into the platform:
- Inherit Class: The Corridor report API exposes a set of base classes for report writers. The base classes provide predefined functionality that can be accessed by newly defined custom reports. To access the functionality defined in the base classes, a custom report can simply inherit from the desired base class. Available base classes are:
- BaseQualityProfileReport
- BaseDataElementReport
- BaseFeatureReport
- BaseModelReport
- BaseModelMetric
- BaseDatasetReport
- BaseExperimentReport
- BasePolicyReport
Since we are creating a model report in this illustration, ModelExampleReport (the resulting report class) will inherit BaseModelReport.
- Create report class name: name defines the name that will be used to refer to the report throughout the platform. name should be unique to each report. In this example, we specify the name for ModelExampleReport as model_example_report. The name model_example_report will be used to refer to the report created by the ModelExampleReport class - for instance, to add the created report to the UI, this name is used for reference.
- get_visualization_combined(metadata_dict, result_dict): get_visualization() defines the visualization for one metadata and data combination, whereas get_visualization_combined() handles visualizations for multiple metadata and data combinations.
Additional Notes:
We can gain extra control over report writing by adding get_is_skipped() and get_metadata() to the report class we are creating. get_is_skipped() lets us skip the report based on any model metadata - for instance, to skip the report for all Binary Classification models. get_metadata() lets us define extra information we might need if it is not defined in the metadata of the parent class - for instance, we might want to access the number of inputs to the model.
Both get_is_skipped() and get_metadata() can access the entity parameter, which is the entity the simulation is running for. The entity is an instance of the corresponding Corridor object type defined in the Corridor package. In our example we are writing reports for a Model, so the entity is a Model object defined in the Corridor package; through this object you can access all the metadata the Corridor package exposes for Model.
get_is_skipped() and get_metadata() are only needed when you want this additional layer of control over report writing.
get_is_skipped(): In this section, users can add logic to skip report generation using the available entity information. Users can choose to skip report generation for specific model types or job types. In the ModelExampleReport shown here, the report is skipped when the model type is not Regression or the execution_type is not Simulation. This implies the report is applicable only to Simulation type jobs of Regression models.
get_metadata(): Report writers may want to access entity metadata. This is handled in the get_metadata() section of the report class. In the sample report presented here, we inherit all metadata info from the parent class BaseModelReport, which includes:
- name: The name of the model as registered in Model Studio
- type: The type of the model - Classification, Regression, etc.
- scoreinfo: A dictionary containing info about the score/predicted/output column:
  - colname: The column name of the score in the data
  - coltype: The column type of the score <Numerical|Categorical|...>
  - name: The name of the score feature as registered in Model Studio (auto-generated based on the model name)
- primary_key: The primary key for the data:
  - name: The name of the primary key
  - alias: The definition-alias of the primary key
  - colname: The unique-alias of the primary key
- actualinfo: A dictionary containing info about the actual/dependent/target column:
  - colname: The column name of the actual in the data
  - coltype: The column type of the actual <Numerical|Categorical|...>
  - name: The name of the actual feature as registered in Model Studio
If the user wants to include additional information - say, the model inputs' metadata - this can be done in get_metadata() using metadata.update(). Note that if the user does not want to inherit metadata from the parent class, they can rewrite get_metadata() to include only the required information.
Report Class to configure on platform: ModelExampleReport
Reports can be created in either PySpark or pandas, or the user can accommodate both by using an if condition. However, using a PySpark dataframe is recommended from an efficiency perspective. Report writers need to specify the language used as follows:
While using a pandas dataframe:
supported_lang_specs = ('pandas-dataframe',)
While using a PySpark dataframe:
supported_lang_specs = ('pyspark-dataframe',)
When the report can handle both PySpark and pandas:
supported_lang_specs = ('pyspark-dataframe', 'pandas-dataframe')
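When supporting both language specs, the if condition mentioned above can branch on the input type. A minimal sketch (illustrative only; the platform passes whichever dataframe type matches supported_lang_specs):

```python
import pandas as pd

def run_computation(metadata, data):
    score_col = metadata['scoreinfo']['colname']
    actual_col = metadata['actualinfo']['colname']
    if isinstance(data, pd.DataFrame):
        # pandas branch
        return pd.DataFrame({
            'actual': data[actual_col],
            'predicted': data[score_col],
            'error': (data[actual_col] - data[score_col]) / data[actual_col].mean()})
    # otherwise assume a PySpark dataframe; F.mean() is an aggregate,
    # so compute the mean of the actual column separately
    from pyspark.sql import functions as F
    actual_mean = data.agg(F.mean(actual_col)).first()[0]
    result = data.select(
        data[actual_col].alias('actual'),
        data[score_col].alias('predicted'),
        ((data[actual_col] - data[score_col]) / actual_mean).alias('error'))
    return result.toPandas()
```

Both branches return a pandas dataframe with the same columns, so a single get_visualization() can serve both.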
# Report class using pandas dataframe
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from corridor_reports.api import BaseModelReport
class ModelExampleReport(BaseModelReport):
name = 'model_example_report'
supported_lang_specs = ('pandas-dataframe',)
def get_is_skipped(self, entity):
        # Running only for Simulation type jobs of Regression models
        if (entity.type != ModelType.REGRESSION.value) or (entity.execution_type != ExecutionTypes.SIMULATION.value):
return True
return False
def get_metadata(self, entity):
metadata = super().get_metadata(entity)
metadata.update(
{
'inputinfos': [
{'name': i.name, 'colname': i.colname, 'coltype': i.type}
for i in entity.inputs
if not isinstance(i, (GlobalFunction, RuntimeParameter))
],
}
)
return metadata
def run_computation(self, metadata, data):
score_col = metadata['scoreinfo']['colname']
actual_col = metadata['actualinfo']['colname']
computation_result = pd.DataFrame({'actual': data[actual_col],
'predicted': data[score_col],
'error': (data[actual_col]-data[score_col]) / data[actual_col].mean()})
return computation_result
def get_visualization(self, metadata, computation_result):
figure = make_subplots(rows=1, cols=2, subplot_titles=("Actual vs Pred", "Error Plot"),
specs=[[{"type": "scatter"}, {"type": "scatter"}]])
figure.add_trace(go.Scatter(x=computation_result['predicted'], y=computation_result['actual'],mode='markers'),
row=1, col=1)
figure.add_trace(go.Scatter(x=list(range(1, len(computation_result['predicted'])+1)),y=computation_result['error']),
row=1, col=2)
        figure.update_xaxes(title_text="Predicted", row=1, col=1)
        figure.update_yaxes(title_text="Actual", row=1, col=1)
figure.update_xaxes(title_text="Sample", row=1, col=2)
figure.update_yaxes(title_text="Error", row=1, col=2)
figure.update_layout(
title = {'text':"Regression Model Reports", 'y':0.9,'x':0.45,'xanchor': 'center','yanchor': 'top' },
height=450, showlegend=False)
return figure
def get_visualization_combined(self, metadata_dict, result_dict):
curr_label = EntityLabels.CURRENT.value
return self.get_visualization(metadata_dict[curr_label], result_dict[curr_label])
# Report class using pyspark dataframe
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from pyspark.sql import functions as F
from corridor_reports.api import BaseModelReport
class ModelExampleReport(BaseModelReport):
name = 'model_example_report'
supported_lang_specs = ('pyspark-dataframe',)
def get_is_skipped(self, entity):
        # Running only for Simulation type jobs of Regression models
        if (entity.type != ModelType.REGRESSION.value) or (entity.execution_type != ExecutionTypes.SIMULATION.value):
return True
return False
def get_metadata(self, entity):
metadata = super().get_metadata(entity)
metadata.update(
{
'inputinfos': [
{'name': i.name, 'colname': i.colname, 'coltype': i.type}
for i in entity.inputs
if not isinstance(i, (GlobalFunction, RuntimeParameter))
],
}
)
return metadata
def run_computation(self, metadata, data):
score_col = metadata['scoreinfo']['colname']
actual_col = metadata['actualinfo']['colname']
        # F.mean() is an aggregate function, so compute the mean of the actual column separately
        actual_mean = data.agg(F.mean(actual_col)).first()[0]
        computation_result = data.select(
            data[actual_col].alias('actual'),
            data[score_col].alias('predicted'),
            ((data[actual_col] - data[score_col]) / actual_mean).alias('error'))
        computation_result = computation_result.toPandas()
return computation_result
def get_visualization(self, metadata, computation_result):
figure = make_subplots(rows=1, cols=2, subplot_titles=("Actual vs Pred", "Error Plot"),
specs=[[{"type": "scatter"}, {"type": "scatter"}]])
figure.add_trace(go.Scatter(x=computation_result['predicted'], y=computation_result['actual'],mode='markers'),
row=1, col=1)
figure.add_trace(go.Scatter(x=list(range(1, len(computation_result['predicted'])+1)),y=computation_result['error']),
row=1, col=2)
        figure.update_xaxes(title_text="Predicted", row=1, col=1)
        figure.update_yaxes(title_text="Actual", row=1, col=1)
figure.update_xaxes(title_text="Sample", row=1, col=2)
figure.update_yaxes(title_text="Error", row=1, col=2)
figure.update_layout(
title = {'text':"Regression Model Reports", 'y':0.9,'x':0.45,'xanchor': 'center','yanchor': 'top' },
height=450, showlegend=False)
return figure
def get_visualization_combined(self, metadata_dict, result_dict):
curr_label = EntityLabels.CURRENT.value
return self.get_visualization(metadata_dict[curr_label], result_dict[curr_label])
Next step after the report class is created: the report class must be passed on to the tech team for configuration on the platform.
Further info: More details on the creation of custom reports can be found here: ../registries/dashboards/reports/customizing-reports/
Testing report class in notebook¶
# Report class using pandas dataframe
import plotly.graph_objs as go
from plotly.subplots import make_subplots
# Note: Since corridor_reports.api is not accessible in the notebook, we drop inheritance for testing in the notebook
class ModelExampleReport:
name = 'model_example_report'
supported_lang_specs = ('pandas-dataframe',)
def get_is_skipped(self, entity):
        # Running only for Simulation type jobs of Regression models
        if (entity.type != ModelType.REGRESSION.value) or (entity.execution_type != ExecutionTypes.SIMULATION.value):
return True
return False
def get_metadata(self, entity):
metadata = super().get_metadata(entity)
metadata.update(
{
'inputinfos': [
{'name': i.name, 'colname': i.colname, 'coltype': i.type}
for i in entity.inputs
if not isinstance(i, (GlobalFunction, RuntimeParameter))
],
}
)
return metadata
def run_computation(self, metadata, data):
score_col = metadata['scoreinfo']['colname']
actual_col = metadata['actualinfo']['colname']
computation_result = pd.DataFrame({'actual': data[actual_col],
'predicted': data[score_col],
'error': (data[actual_col]-data[score_col]) / data[actual_col].mean()})
return computation_result
def get_visualization(self, metadata, computation_result):
figure = make_subplots(rows=1, cols=2, subplot_titles=("Actual vs Pred", "Error Plot"),
specs=[[{"type": "scatter"}, {"type": "scatter"}]])
figure.add_trace(go.Scatter(x=computation_result['predicted'], y=computation_result['actual'],mode='markers'),
row=1, col=1)
figure.add_trace(go.Scatter(x=list(range(1, len(computation_result['predicted'])+1)),y=computation_result['error']),
row=1, col=2)
        figure.update_xaxes(title_text="Predicted", row=1, col=1)
        figure.update_yaxes(title_text="Actual", row=1, col=1)
figure.update_xaxes(title_text="Sample", row=1, col=2)
figure.update_yaxes(title_text="Error", row=1, col=2)
figure.update_layout(
title = {'text':"Regression Model Reports", 'y':0.9,'x':0.45,'xanchor': 'center','yanchor': 'top' },
height=450, showlegend=False)
return figure
def get_visualization_combined(self, metadata_dict, result_dict):
curr_label = EntityLabels.CURRENT.value
return self.get_visualization(metadata_dict[curr_label], result_dict[curr_label])
# Instantiating report class and generating computation result 'computation_res'
actual_vs_predicted = ModelExampleReport()
computation_res = actual_vs_predicted.run_computation(metadata, df)
computation_res.head()
|   | actual | predicted | error |
|---|---|---|---|
| 0 | 10464 | 10467 | -0.000293 |
| 1 | 10236 | 10488 | -0.024572 |
| 2 | 10499 | 10455 | 0.004290 |
| 3 | 10127 | 10000 | 0.012383 |
| 4 | 10227 | 10457 | -0.022427 |
# Calling get_visualization from the report class
actual_vs_predicted.get_visualization(metadata, computation_res)