Shopping recommendations in PMML.

In previous posts (PMML revisited and Predictions in Kogito) we took a look at how a PMML engine has been implemented inside the Drools/Kogito ecosystem.
This time we will start looking at a concrete example of a recommendation engine built on top of PMML.
The first part of this post deals with the ML side of it, and the second part with the actual usage of the generated model.
So, let’s start…

Use case description

ABC Inc. wants to increase its sales. In the past, it already tried suggesting the products it wanted to sell, but without good results.
Recently, it heard about AI, so it wants a more advanced approach, where the suggested product is not decided upfront but is chosen based on the behaviour of the users.
The company sells three kinds of items: Books, PCs, and Cars. Customers usually buy one type of item and only seldom the others. So, the company wants to recommend items of each customer's preferred type that the customer has not already bought.

Data preparation and model creation

The code used to generate the data and the PMML model is available in the TrustyAI Sandbox repository.

For simplicity, there are ten items for each type, named Book-0, Book-1, Book-2, etc. The databuilder.py script randomly generates a sample of 1000 customers, each of whom mostly buys one kind of item and possibly a few items of other types.
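databuilder.py itself is written in Python and its exact logic is not reproduced here. Purely as an illustration of this kind of generator, the following Java sketch draws a preferred type for each customer and then buys each item with a higher probability for the preferred type. The class name SyntheticCustomers and the two probabilities (0.7 for the preferred type, 0.05 for the others) are assumptions of this sketch, not values taken from databuilder.py.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator sketch; NOT the actual databuilder.py logic.
class SyntheticCustomers {

    static final String[] TYPES = {"Book", "Car", "PC"};
    static final int ITEMS_PER_TYPE = 10;
    // Assumed purchase probabilities, chosen only for illustration.
    static final double PREFERRED_PROB = 0.7;
    static final double CASUAL_PROB = 0.05;

    // Items bought by one customer whose preferred type is TYPES[preferred].
    static List<String> generate(int preferred, Random random) {
        List<String> bought = new ArrayList<>();
        for (int t = 0; t < TYPES.length; t++) {
            double p = (t == preferred) ? PREFERRED_PROB : CASUAL_PROB;
            for (int i = 0; i < ITEMS_PER_TYPE; i++) {
                if (random.nextDouble() < p) {
                    bought.add(TYPES[t] + "-" + i);
                }
            }
        }
        return bought;
    }

    // Generates `count` customers, each with a uniformly random preferred type.
    // A fixed seed makes the sample reproducible.
    static List<List<String>> generateSample(int count, long seed) {
        Random random = new Random(seed);
        List<List<String>> sample = new ArrayList<>();
        for (int c = 0; c < count; c++) {
            sample.add(generate(random.nextInt(TYPES.length), random));
        }
        return sample;
    }
}
```

For example, SyntheticCustomers.generateSample(1000, 42L) returns 1000 lists of item names such as "Car-3"; using the same seed twice yields the same sample.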

A 30-dimensional array represents the purchased items, where 1 means a bought item. The first 10 elements are the Books, the next 10 elements are the Cars, and the last 10 elements are the PCs.

| Book | Car | PC |
|---|---|---|
| 0, 1, …, 9 | 10, 11, …, 19 | 20, 21, …, 29 |

Generated data are stored inside input_data.csv. 

We have chosen to use a KMeans cluster model to represent the distribution of customers in the three main buyer groups (Books, Cars, PCs).
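To make the clustering idea concrete, here is a minimal Java sketch of k-means (Lloyd's algorithm) on purchase vectors: each customer is assigned to the nearest centroid by squared Euclidean distance, then each centroid is moved to the mean of its members, until the centroids stop changing. The class name KMeansSketch and the naive initialisation from the first k points are assumptions of this sketch (scikit-learn's KMeans, for instance, defaults to k-means++); the real model was trained in the notebook, not with this code.

```java
import java.util.Arrays;

// Minimal k-means (Lloyd's algorithm) sketch; not the code used to train the real model.
class KMeansSketch {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point (ties go to the lower index).
    static int nearest(double[][] centroids, double[] point) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(centroids[c], point) < squaredDistance(centroids[best], point)) {
                best = c;
            }
        }
        return best;
    }

    // Runs Lloyd's iterations and returns the final centroids.
    // Assumption: centroids are initialised with the first k points.
    static double[][] fit(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            // Assignment step: add each point to the running sum of its nearest centroid.
            for (double[] point : points) {
                int c = nearest(centroids, point);
                counts[c]++;
                for (int d = 0; d < dims; d++) {
                    sums[c][d] += point[d];
                }
            }
            // Update step: each centroid becomes the mean of its points;
            // an empty cluster keeps its previous centroid.
            double[][] updated = new double[k][];
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    updated[c] = centroids[c];
                } else {
                    updated[c] = new double[dims];
                    for (int d = 0; d < dims; d++) {
                        updated[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
            if (Arrays.deepEquals(updated, centroids)) {
                break; // converged: no centroid moved
            }
            centroids = updated;
        }
        return centroids;
    }
}
```

Once the centroids are known, predicting the buyer group of a new customer only requires the assignment step, nearest(centroids, vector), which is why a trained k-means model can be exported as a compact set of cluster centers.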

The model was defined, trained and tested in a Binder environment, which provides an ML environment bound to a GitHub repository. A quick and clear tutorial on using it with Python can be found here.

A subset of the generated data was used to train the model.

This notebook shows both the data and the graphical output. A sample of the cluster predictions:

| | cluster prediction | buyer group |
|---|---|---|
| 519 | 1 | Car |
| 837 | 2 | Book |
| 208 | 0 | PC |
| 525 | 1 | Car |
| 978 | 0 | PC |
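The table also shows how the cluster ids produced by this trained model map to buyer groups: 0 is PC, 1 is Car and 2 is Book. K-means numbers its clusters arbitrarily, so this mapping is specific to this trained model and should be checked again after any retraining. The application code later relies on an ItemType type with byClusterId and getClusterName methods whose implementation is not shown in the post; the enum below is only a hypothetical sketch of how that mapping could be encoded, assuming the exported PMML is the same model shown in the table.

```java
// Hypothetical sketch of an ItemType enum; the real one in the companion project may differ.
// Cluster ids follow the notebook table: 0 -> PC, 1 -> Car, 2 -> Book.
enum ItemType {
    PC(0, "PC"),
    CAR(1, "Car"),
    BOOK(2, "Book");

    private final int clusterId;
    private final String clusterName;

    ItemType(int clusterId, String clusterName) {
        this.clusterId = clusterId;
        this.clusterName = clusterName;
    }

    // Prefix used in item names, e.g. "Car" for "Car-3".
    String getClusterName() {
        return clusterName;
    }

    // Returns the buyer group corresponding to the given cluster id.
    static ItemType byClusterId(int clusterId) {
        for (ItemType type : values()) {
            if (type.clusterId == clusterId) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unknown cluster id: " + clusterId);
    }
}
```

Keeping the id-to-group mapping in a single enum means that, if a retrained model numbers its clusters differently, only the constructor arguments need to change.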

The last step is the actual export of the trained model to PMML format, done with the skl_to_pmml function of the nyoka library.

Here is the generated cluster_buyer_predictor.pmml.

Trusty-PMML primer

Trusty-PMML offers an easy-to-use API to evaluate models against a given input.

First of all, a reference to a model-specific PMMLRuntime has to be created. The available methods are defined in the org.kie.pmml.api.PMMLRuntimeFactory interface.

Some of them are meant to be used inside a KieServer container, with a pre-generated KJAR that contains the model already compiled to Java classes.

Others simply require a working reference to the PMML file, in which case the model is compiled in-memory the first time it is executed. The following snippet shows an example of the latter:

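// Locate the PMML file on the classpath
ClassLoader classloader = Thread.currentThread().getContextClassLoader();
URL pmmlUrl = classloader.getResource(PMML_FILENAME);
File pmmlFile = FileUtils.getFile(pmmlUrl.getFile());
// Create a runtime directly from the file; the model is compiled in-memory on first execution
PMMLRuntime pmmlRuntime = new PMMLRuntimeFactoryImpl().getPMMLRuntimeFromFile(pmmlFile);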

The next step is to create a PMMLContext containing the input data for the evaluation.

The following snippet shows an example of that:

static PMMLContext getPMMLContext(String modelName, Map<String, Object> inputData) {
    String correlationId = "CORRELATION_ID";
    PMMLRequestDataBuilder pmmlRequestDataBuilder = new PMMLRequestDataBuilder(correlationId, modelName);
    // Add every input value as a request parameter, together with its runtime type
    for (Map.Entry<String, Object> entry : inputData.entrySet()) {
        Object pValue = entry.getValue();
        Class class1 = pValue.getClass();
        pmmlRequestDataBuilder.addParameter(entry.getKey(), pValue, class1);
    }
    PMMLRequestData pmmlRequestData = pmmlRequestDataBuilder.build();
    return new PMMLContextImpl(pmmlRequestData);
}

The last step is to actually evaluate the model and retrieve the result inside a PMML4Result:

PMML4Result pmml4Result = pmmlRuntime.evaluate(modelName, pmmlContext);

PMML4Result contains the status of the execution (resultCode), the name of the target field (resultObjectName), and all the variables evaluated by the model (resultVariables).

The following snippet shows how to retrieve the target results:

String targetField = pmml4Result.getResultObjectName();
Object result = pmml4Result.getResultVariables().get(targetField);

The recommender engine

The companion project uses the “cluster_buyer_predictor.pmml” with the Trusty-PMML engine to predict the “buyer-group” of a randomly created customer.

The PMML engine is responsible for making such predictions. Based on the predicted buyer group, the rest of the code identifies which items of that type the customer has not purchased yet, and suggests one of them.

The Customer object is a simple DTO, initialized with a randomly selected preferred type and a List of bought items.

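public class Customer {

    private final List<String> buyedItems;

    public Customer() {
        // Randomly pick the preferred item type among the three clusters
        ItemType itemType = ItemType.byClusterId(new Random().nextInt(3));
        // Items of the preferred type...
        buyedItems = mainBuyedItems(itemType);
        // ...plus a few casual purchases
        buyedItems.addAll(casualBuys(itemType));
    }

    // mainBuyedItems(), casualBuys() and the rest of the class are not shown here
}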

The loop in the entry point creates five customers and retrieves the recommendation for each of them.

public static void main(String[] args) {
        IntStream.range(0, 5).forEach(i -> {
            Customer customer = new Customer();
            logger.info("Customer {}", customer);
            String recommendation = getRecommendation(customer);
            logger.info("We recommend: {}", recommendation);
        });
    }

The Recommender class is the core of the application.

It invokes a Converter method to translate the List of bought items into a 30-dimensional array of integers (0 and 1):

int[] buyedItems = Converter.getBuyedItemsIndexes(customer);
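The body of Converter.getBuyedItemsIndexes is not shown in the post. Assuming item names of the form Type-n and the Book/Car/PC layout described earlier (Books at 0-9, Cars at 10-19, PCs at 20-29), such a conversion might look like the following hypothetical sketch; the class ItemVectorConverter and its methods are illustrative names, and they take a plain list of item names instead of a Customer.

```java
import java.util.List;

// Hypothetical converter sketch: maps item names such as "Car-3" to a 30-element 0/1 vector.
class ItemVectorConverter {

    // Order of the blocks in the vector: Books (0-9), Cars (10-19), PCs (20-29).
    private static final List<String> TYPE_ORDER = List.of("Book", "Car", "PC");
    private static final int ITEMS_PER_TYPE = 10;

    // Position of a single item, e.g. "Book-0" -> 0, "Car-3" -> 13, "PC-9" -> 29.
    static int indexOf(String itemName) {
        String[] parts = itemName.split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Unexpected item name: " + itemName);
        }
        int typeIndex = TYPE_ORDER.indexOf(parts[0]);
        int itemNumber = Integer.parseInt(parts[1]);
        if (typeIndex < 0 || itemNumber < 0 || itemNumber >= ITEMS_PER_TYPE) {
            throw new IllegalArgumentException("Unexpected item name: " + itemName);
        }
        return typeIndex * ITEMS_PER_TYPE + itemNumber;
    }

    // Builds the 0/1 vector: 1 at the position of every bought item, 0 elsewhere.
    static int[] toVector(List<String> boughtItems) {
        int[] vector = new int[TYPE_ORDER.size() * ITEMS_PER_TYPE];
        for (String item : boughtItems) {
            vector[indexOf(item)] = 1;
        }
        return vector;
    }
}
```

For instance, toVector(List.of("Book-1", "PC-0")) returns a 30-element array with 1 at positions 1 and 20 and 0 everywhere else.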

Then, it calls a PMMLUtils method to retrieve the cluster the customer belongs to:

int clusterId = PMMLUtils.getClusterId(buyedItems);

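Finally, based on the item type corresponding to the cluster id and on the already bought items, a recommendation is generated; if the customer already owns all ten items of that type, null is returned: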

private static String getRecommendation(ItemType itemType, List<String> buyedItems) {
        logger.info("getRecommendation {} {}", itemType, buyedItems);
        List<String> alreadyBuyed = buyedItems
                .stream()
                .filter(buyed -> buyed.startsWith(itemType.getClusterName()))
                .collect(Collectors.toList());
        if (alreadyBuyed.size() == 10) {
            return null;
        }
        return IntStream.range(0, 10)
                .mapToObj(i -> itemType.getClusterName() + "-" + i)
                .filter(itemName -> !alreadyBuyed.contains(itemName))
                .findFirst()
                .orElse(null);

    }

PMMLUtils, on the other hand, is where the PMML model is actually instantiated and evaluated, so let’s dive deeper into it.

To start with, a static initializer block creates the PMMLRuntime by reading the PMML file:

 static {
        ClassLoader classloader = Thread.currentThread().getContextClassLoader();
        final URL pmmlUrl = classloader.getResource(PMML_FILENAME);
        File pmmlFile = FileUtils.getFile(pmmlUrl.getFile());
        PMML_RUNTIME = new PMMLRuntimeFactoryImpl().getPMMLRuntimeFromFile(pmmlFile);
    }

The getClusterId method converts the int[] to a Map: for each element of the array, a new entry is created, with the index of the element (as a String) as key and the value of the element (cast to double) as value:

public static int getClusterId(int[] buyedItems) {
        logger.info("getClusterId {}", buyedItems);
        Map<String, Object> inputData = new HashMap<>();
        for (int i = 0; i < buyedItems.length; i++) {
            inputData.put(String.valueOf(i), (double) buyedItems[i]);
        }
        PMML4Result pmml4Result = evaluate(PMML_RUNTIME, inputData, MODEL_NAME);
        logger.info("pmml4Result {}", pmml4Result);
        String clusterIdName = (String) pmml4Result.getResultVariables().get(OUTPUT_FIELD);
        return Integer.parseInt(clusterIdName);
    }

The evaluate method creates a PMMLContext out of the provided Map and returns the result of the evaluation as a PMML4Result:

private static PMML4Result evaluate(final PMMLRuntime pmmlRuntime, final Map<String, Object> inputData, final String modelName) {
        logger.info("evaluate with PMMLRuntime {}", pmmlRuntime);
        final PMMLRequestData pmmlRequestData = getPMMLRequestData(modelName, inputData);
        final PMMLContext pmmlContext = new PMMLContextImpl(pmmlRequestData);
        return pmmlRuntime.evaluate(modelName, pmmlContext);
    }

Summing up

In this post we have tackled a real-world recommendation scenario using PMML and the Trusty-PMML engine.

First, we created some sample data and trained a KMeans cluster model on it.

Then, we gave a brief explanation of the basic Trusty-PMML API.

Last, we showed a bare-bones Java project that uses the PMML engine to recommend, for each customer, an item of the predicted buyer group that the customer has not bought yet.

But this is just the start of the journey.

In the next posts we will see how to implement a cloud-native service that remotely provides the required predictions, and then… but let’s not spoil the surprise.

Stay tuned!
