Shapash: Making ML Models understandable for everyone

Shapash Web Application Demo

Shapash is a Python toolkit that makes machine learning models understandable for data scientists. This makes it easy to discuss and share model interpretability with non-data experts: managers, business analysts and end-users… Shapash offers both simple-to-read visuals and a web application.

Shapash displays results with appropriate wording (inverse preprocessing, postprocessing). It is useful in an operational context because it lets data scientists apply explainability from exploration through to production: you can easily deploy local explainability in production to complete each of your forecasts or recommendations with a summary of the local explanation. This post explains Shapash's key features and how it operates.

The implementation of Shapash will be illustrated with a real-world use case. First, some context: model explainability and interpretability are among the most popular topics in machine learning today, and the subject of many publications and open-source contributions. These contributions do not all address the same problems and issues. Data scientists use these techniques for many purposes:

To better understand and verify their models, as well as to debug them. But there is more: intelligibility matters for educational purposes too. Intelligible machine learning models can be discussed with non-data specialists, such as business analysts or final users, which requires a deep understanding of the topic and of the main driving forces behind the problem being modeled. Data scientists analyze global explainability and feature importance to determine the role of the top features in the model; locally, they can look at individual records and outliers.

A web app is useful at this stage because it lets them view visualizations and graphs. It is a good idea to discuss these results with business analysts in order to validate and challenge the model. Once the model has been validated and deployed in production, it serves predictions to end-users. They can gain a lot from local explainability, but only if they have a means of getting a clear, concise and useful summary of it.

They will find it valuable for two reasons. Transparency builds trust: people will only trust models they understand. The human stays in control: no model is 100% reliable, and if end-users can understand and interpret the outputs of the algorithm, they can overturn its recommendations when they believe those rest on inaccurate data. Shapash was created to assist data scientists in meeting these requirements.

Shapash's key features include easy-to-read visualizations that are accessible to everyone, and a web app for understanding how the model works. It lets you view multiple graphs, assess how important each feature is to the model, and see how a particular feature contributes to a prediction.
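As a minimal sketch of what these visualizations look like in code, assuming the SmartExplainer object xpl built in the demo later in this post (the column name OverallQual comes from that demo dataset):

# Global view: which features matter most to the model
xpl.plot.features_importance()

# Zoom in on one feature: how its values drive its contributions
xpl.plot.contribution_plot("OverallQual")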

This is where a web app can be a helpful tool. There are several ways to display results with the correct wording: for more explicit outputs, you can provide data dictionaries and category-encoders objects. You can easily save the explainer as a Pickle file and export the results to tables. The explainability summary can be configured to suit your needs and to focus only on what matters for local explainability.
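A hedged sketch of that save/export step, again assuming the xpl object from the demo below (the file path and parameter value are illustrative):

# Persist the explainer as a pickle file
xpl.save('./xpl.pkl')

# Export a local-explainability summary to a pandas DataFrame,
# keeping only the 3 strongest contributions per prediction
summary_df = xpl.to_pandas(max_contrib=3)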

Ability to deploy easily in a production environment and to complete every prediction/recommendation with a local explainability summary in each operational app (batch or API). Shapash is open to several ways of proceeding: it can be used for quick access to results, or you can work on better wording for end-users.
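As a sketch of that production step, based on Shapash's SmartPredictor object (method names follow recent Shapash versions, so check the documentation for yours; Xtest comes from the demo below):

# Turn the fitted explainer into a lightweight object for production
predictor = xpl.to_smartpredictor()
predictor.save('./predictor.pkl')

# Later, in the batch job or API:
from shapash.utils.load_smartpredictor import load_smartpredictor

predictor = load_smartpredictor('./predictor.pkl')
predictor.add_input(x=Xtest)          # new data to score and explain
predictor.modify_mask(max_contrib=3)  # keep the 3 strongest contributions
explanation = predictor.summarize()   # one explanation summary row per prediction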

Displaying results requires very few arguments, but the more work you do to clean and document the data, the better the final results will be for end-users. Shapash can be used for regression, binary classification or multiclass problems. It is compatible with many models, such as Catboost and Xgboost. Shapash relies on local contributions calculated with Shap, Lime or any other technique that yields summable local contributions. Install the package using pip:

$ pip install shapash

Shapash demonstration: let's apply Shapash to a specific dataset.

We will demonstrate how Shapash explores models in the remainder of this article. To fit a regressor and predict house prices, we will use the Kaggle "House Prices" dataset. Let's start by loading the dataset:

import pandas as pd
from shapash.data.data_loader import data_loading

house_df, house_dict = data_loading('house_prices')
y_df = house_df['SalePrice'].to_frame()
X_df = house_df[house_df.columns.difference(['SalePrice'])]
house_df.head(3)

Encode the categorical features:

from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']
encoder = OrdinalEncoder(cols=categorical_features).fit(X_df)
X_df = encoder.transform(X_df)

Train/test split and model fitting:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75)
regressor = RandomForestRegressor(n_estimators=200, min_samples_leaf=2).fit(Xtrain, ytrain)

And predict on the test data:

y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)

Let's discover and use Shapash SmartExplainer.

Step 1 — Import

from shapash.explainer.smart_explainer import SmartExplainer

Step 2 — Initialise a SmartExplainer object

xpl = SmartExplainer(features_dict=house_dict)
# Optional parameter features_dict: dict that specifies the meaning of each column name of the x pd.DataFrame

Step 3 — Compile

xpl.compile(
    x=Xtest,
    model=regressor,
    preprocessing=encoder,  # optional: Shapash uses the encoder's inverse_transform method
    y_pred=y_pred           # optional
)

The compile method also accepts another optional parameter, postprocessing, which lets you apply new functions for better wording (regex, mapping dict, …).
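As an illustrative, hedged example of that postprocessing parameter (the rule types follow the Shapash documentation, but check your version; the column names and rules below are assumptions chosen for illustration):

postprocessing = {
    'GrLivArea': {'type': 'suffix', 'rule': ' sqft'},  # illustrative: append a unit to displayed values
    'MSZoning': {'type': 'transcoding',                # illustrative: remap displayed categories
                 'rule': {'RL': 'Residential Low Density'}},
}
xpl.compile(x=Xtest, model=regressor, preprocessing=encoder,
            y_pred=y_pred, postprocessing=postprocessing)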

We can now display the results and see how the regression model operates!

Step 4 — Launch the Web App

app = xpl.run_app()

The web app link will appear in the Jupyter output (view the demo here). This web app has four parts.

Each part interacts with the others to make it easy to view and explore the model. Features Importance: click on a feature to update the contribution plot for it. Contribution plot: how does a feature influence the prediction? It displays violin or scatter plots of every local contribution of the feature. Local Plot: the local explanation of which features contribute most to the predicted value.

You can use several buttons, sliders and lists to configure the summary of this local explainability; the filters you can apply to your summary are discussed below. This web application is useful for discussing with business analysts the best way to summarize explainability for the operational requirements.
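The same summary filters are also available in code. A minimal sketch, assuming the xpl object from the demo (the parameter values are illustrative):

# Keep at most the 5 strongest contributions, and only those above a threshold
xpl.filter(max_contrib=5, threshold=100)

# The local plot for one test row then reflects this filtered summary
xpl.plot.local_plot(index=Xtest.index[0])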

The Selection Table allows users to choose: a subset on which to focus their exploration, or a single row for which to display the local explanation. To select a subset with the data table, type >Value just below the column name.
