In the previous post we saw how to perform RFM customer segmentation using the functionality of the PAL library through the AFM graphical environment to generate Flowgraphs.
In today’s post we are going to perform the same segmentation, but this time using the HANA Machine Learning functionality, with the Python API calling the PAL library functions in HANA. For this we need:
- A Python environment, for example Anaconda with Jupyter Notebook, or VS Code with an Anaconda or plain Python back end.
- The Python API for HANA installed (https://pypi.org/project/hana-ml/).
The idea remains the same: starting from a virtual data model (VDM) in HANA, we will generate the bins or segments for recency (R), frequency (F) and monetary (M), and as a final step we will save the result in a system table for later analysis.
Let’s get started.
The SAP HANA Python API for machine learning functions (Python Client API for ML) provides a set of client-side Python functions for accessing and querying SAP HANA data, and a set of functions for developing models.
The Python API for ML has 2 parts:
- A set of APIs for different PAL algorithms
- The SAP HANA DataFrame, which provides a set of methods for analyzing data in SAP HANA without bringing that data to the client.
This library uses the SAP HANA Python driver (hdbcli) to connect and access SAP HANA.
A DataFrame represents a table (or any SQL statement). Most DataFrame operations are designed not to fetch data from the database unless explicitly requested, so the operations are performed in the database.
2 – Import the libraries
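The original notebook cell is not reproduced here, so the following is only a sketch of the imports such a notebook would plausibly start with. The guard around the import is our own addition: it makes the case where hana-ml is not installed explicit instead of failing with a traceback.

```python
# Assumed imports for this walkthrough; the exact list used in the
# original notebook is not known.
try:
    import hana_ml
    from hana_ml import dataframe  # ConnectionContext and the HANA DataFrame
    HANA_ML_AVAILABLE = True
except ImportError:
    # hana-ml (or its hdbcli dependency) is missing: pip install hana-ml
    HANA_ML_AVAILABLE = False

print("hana-ml available:", HANA_ML_AVAILABLE)
if HANA_ML_AVAILABLE:
    print("hana-ml version:", hana_ml.__version__)
```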
3 – Set up connection
To use a SAP HANA DataFrame, we create the “ConnectionContext” object and then use the methods the library provides to create a HANA DataFrame. The DataFrame can only be used while the ConnectionContext is open and is not accessible once the connection is closed.
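A minimal sketch of opening the connection could look like this. The environment variable names (`HANA_HOST`, `HANA_PORT`, `HANA_USER`, `HANA_PASSWORD`) and the default values are placeholders we made up for the example; only `ConnectionContext` and its `address`, `port`, `user` and `password` arguments come from hana-ml.

```python
import os


def connection_settings(env=None):
    """Read connection parameters from environment variables.

    The variable names and defaults are hypothetical; adapt them to your system.
    """
    env = os.environ if env is None else env
    return {
        "address": env.get("HANA_HOST", "localhost"),
        "port": int(env.get("HANA_PORT", "30015")),
        "user": env.get("HANA_USER", ""),
        "password": env.get("HANA_PASSWORD", ""),
    }


def open_connection(settings):
    """Open a ConnectionContext; needs hana-ml and a reachable SAP HANA system."""
    # Imported here so that connection_settings() works without hana-ml installed.
    from hana_ml import dataframe
    return dataframe.ConnectionContext(**settings)


# Typical use in the notebook (not executed here):
# cc = open_connection(connection_settings())
# print(cc.hana_version())
# ...
# cc.close()
```

Keeping the credentials out of the notebook is a design choice of this sketch, not something the API requires.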
We use a Hana model (Virtual Data Model) to obtain the information we need to create the segments in our Notebook.
We create the DataFrame from the connection context. In this step we specify the SQL statement that retrieves the data for our segmentation.
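As an illustration, the statement can be built by a small helper and passed to the `sql` method of the connection context. The schema, view and column names below are hypothetical, since the actual VDM from the post is not shown.

```python
def rfm_query(schema, view,
              columns=("CUSTOMER_ID", "RECENCY", "FREQUENCY", "MONETARY")):
    """Build the SELECT that feeds the segmentation.

    All identifiers are placeholders; replace them with those of your own view.
    """
    column_list = ", ".join(f'"{name}"' for name in columns)
    return f'SELECT {column_list} FROM "{schema}"."{view}"'


# With an open ConnectionContext `cc` (not executed here):
# rfm_df = cc.sql(rfm_query("_SYS_BIC", "demo/CV_RFM"))
print(rfm_query("_SYS_BIC", "demo/CV_RFM"))
```

Creating the DataFrame this way only registers the statement; no rows are transferred until we ask for them.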
Perform some basic checks.
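The checks we have in mind are the usual ones: which columns came back, how many rows there are, and what a few of them look like. A possible sketch, using the `columns`, `count`, `head` and `collect` members of the HANA DataFrame (the helper name `basic_checks` is our own):

```python
def basic_checks(df, n=5):
    """Return column names, row count and a small sample of a HANA DataFrame.

    Only the sample is brought to the client, via collect().
    """
    return {
        "columns": df.columns,           # list of column names
        "rows": df.count(),              # row count, computed in the database
        "sample": df.head(n).collect(),  # first n rows as a pandas DataFrame
    }


# With the DataFrame from the previous step (not executed here):
# checks = basic_checks(rfm_df)
# print(checks["columns"], checks["rows"])
# checks["sample"]
```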
4 – RFM segmentation
In this section we define and generate the segments using the PAL functions.
Definition of segment strategy:
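The concrete strategy used in the notebook is not shown, so as an assumption we take five equal-width bins per variable (the strategy hana-ml calls `uniform_number`). The pure-Python function below is only a conceptual illustration of what equal-width binning does; it is not the PAL implementation, and boundary handling in HANA may differ.

```python
# Assumed settings for this walkthrough, not the original values.
STRATEGY = "uniform_number"  # equal-width bins, number of bins given
N_BINS = 5


def equal_width_bin(value, lo, hi, n_bins=N_BINS):
    """Return the 1-based equal-width bin of `value` within [lo, hi]."""
    if hi == lo:
        return 1  # degenerate range: everything falls into one bin
    width = (hi - lo) / n_bins
    # The maximum would land one past the last bin, so clamp it.
    return min(int((value - lo) / width) + 1, n_bins)


# Monetary values between 0 and 100 split into 5 bins of width 20.
print([equal_width_bin(v, 0, 100) for v in (0, 19.9, 20, 50, 99, 100)])
```

Note that for recency a low value (a recent purchase) is usually the desirable end, so the bin number may need to be read in reverse when interpreting the segments.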
We generate the Recency bin.
We generate the Frequency bin.
We generate the Monetary bin.
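The three steps above follow the same pattern, so they can be sketched with one helper. Note that this sketch uses the `bin` method of the hana-ml DataFrame, which may not be the call used in the original notebook (the PAL `Discretize` class would be an alternative). The column names and the `R_BIN`, `F_BIN` and `M_BIN` labels are our own assumptions.

```python
def add_bin(df, column, bin_column, n_bins=5, strategy="uniform_number"):
    """Return a new HANA DataFrame with an extra column holding the bin number."""
    return df.bin(column, strategy=strategy, bins=n_bins, bin_column=bin_column)


def rfm_bins(df, recency="RECENCY", frequency="FREQUENCY",
             monetary="MONETARY", n_bins=5):
    """Create one binned DataFrame per RFM variable (column names are hypothetical)."""
    return (
        add_bin(df, recency, "R_BIN", n_bins),
        add_bin(df, frequency, "F_BIN", n_bins),
        add_bin(df, monetary, "M_BIN", n_bins),
    )


# With the DataFrame created earlier (not executed here):
# r_df, f_df, m_df = rfm_bins(rfm_df)
# r_df.head(5).collect()
```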
We merge the three DataFrames into a single DataFrame.
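One way to sketch the merge is to join the SQL behind each DataFrame on the customer key, so the work stays in the database. This uses the `select_statement` property of the DataFrames and the `sql` method of the connection context. The key and bin column names are assumptions, and the DataFrame `join` method would be an equally valid route.

```python
def merge_bins(cc, r_df, f_df, m_df, key="CUSTOMER_ID"):
    """Join the three binned DataFrames on the customer key, in the database.

    Assumes each input carries the key plus its own R_BIN, F_BIN or M_BIN column.
    """
    sql = (
        f'SELECT R."{key}", R."R_BIN", F."F_BIN", M."M_BIN" '
        f'FROM ({r_df.select_statement}) R '
        f'JOIN ({f_df.select_statement}) F ON R."{key}" = F."{key}" '
        f'JOIN ({m_df.select_statement}) M ON R."{key}" = M."{key}"'
    )
    return cc.sql(sql)


# With an open connection and the three binned DataFrames (not executed here):
# rfm_segments = merge_bins(cc, r_df, f_df, m_df)
# rfm_segments.head(5).collect()
```

If the original R, F and M values are needed later, they can be added to the select list in the same way.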
We generate a series of charts to visualize the distribution of the segments and the mean values of each one.
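The charts themselves are not reproduced here. As a sketch of how their data could be prepared, the helper below aggregates in the database with the DataFrame `agg` method and leaves only a handful of rows to collect and plot. The column names and output labels are hypothetical.

```python
def segment_summary(df, bin_col, value_col, key="CUSTOMER_ID"):
    """Customers per bin and mean value per bin, aggregated in the database."""
    return df.agg(
        [("count", key, "CUSTOMERS"), ("avg", value_col, "AVG_VALUE")],
        group_by=bin_col,
    )


# With a DataFrame holding both the bins and the values (not executed here;
# the plotting lines need pandas and matplotlib):
# summary = segment_summary(m_df, "M_BIN", "MONETARY").collect()
# summary.sort_values("M_BIN").plot.bar(x="M_BIN", y="CUSTOMERS")
# summary.sort_values("M_BIN").plot.bar(x="M_BIN", y="AVG_VALUE")
```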
5 – Save Dataframe in Hana
We save the RFM DataFrame to a table in HANA.
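A minimal sketch of this step, using the `save` method of the HANA DataFrame. The schema and table names are placeholders, and `overwrite` is a parameter of our helper that is passed on as `force`, which replaces an existing table of the same name.

```python
def save_rfm(df, table, schema=None, overwrite=True):
    """Persist a HANA DataFrame as a table and return the DataFrame for that table."""
    # save() accepts either a table name or a (schema, table) tuple.
    where = (schema, table) if schema else table
    return df.save(where, force=overwrite)


# With the merged DataFrame (not executed here; names are hypothetical):
# saved = save_rfm(rfm_segments, "RFM_SEGMENTS", schema="MY_SCHEMA")
# saved.count()
```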
6 – Hana Database
As the final step of our RFM segmentation, each customer has been assigned a segment, and the result is stored in the system and available for further analysis.
With this last step we have seen how to perform an RFM segmentation in HANA. In the next post I will show you how to analyze this information in combination with transactional information from our dataset using SAP Analytics Cloud (SAC).