Prototypes and Live Queries: A Sneak Peek Into The Future of Drools (featuring Debezium and Apache Calcite)

Drools is a hybrid rule engine, allowing both data-driven forward chaining (rules match facts in the working memory producing other facts that in turn activate other rules) and goal-driven backward chaining (queries match facts in the working memory, eventually invoking other queries to retrieve them). This second usage pattern is also available in streaming mode through live queries which allow attaching a listener for change events instead of returning an iterable result set.

This feature allows to create incremental materialized views of the facts inserted in the working memory and then it makes Drools a good fit to analyze and aggregate live streams of data. For instance this could be applied to the stream of changes generated by a change data capture tool like Debezium

Moreover it would be nice to have the possibility of querying Drools not only in DRL, but also using a more standard and well known query language like SQL. This is possible by using a tool to parse and analyze SQL query like Calcite and translating the results of this analysis into a Drools query.

I put these ideas together in a proof of concept to demonstrate how this could work. In particular in this test case I’m simulating the output stream produced by Debezium in json format by incrementally inserting into my query engine messages like the following

{
   "before":null,
   "after":{
      "id":1001,
      "first_name":"Sally",
      "last_name":"Thomas",
      "email":"sally.thomas@acme.com"
   },
   "source":{
      "version":"1.8.0.Alpha2",
      "connector":"postgresql",
     "name":"dbserver1",
      "snapshot":"true",
      "db":"postgres",
      "schema":"inventory",
      "table":"customers",
   },
   "op":"r",
   "ts_ms":1643708392757,
   "transaction":null
}

What Debezium does is setting the before field of this json to null and populate the after if a new record has been inserted, doing the opposite if it has been deleted and populating both fields if it has been updated, so I’m inserting, removing or updating facts into the Drools session following the same logic

Note that the facts inserted in the engine cannot be plain pojos since there doesn’t exist any Java class modeling them. For this reason they are modeled with structural typing as Prototypes, a feature recently added into Drools. Here the types of the different facts ingested by the rule engine is no longer determined by their Java classes, but simply by a logical name that in this case coincides with the name of the table of the record to be processed.

As anticipated my other goal was having the possibility to query this system with a plain SQL query like the following

SELECT * FROM customers c LEFT JOIN addresses a on c.id = a.customer_id

To achieve this, in a first iteration, I used Calcite to simply parse the SQL query and visit the resulting AST to programmatically translate it into a Drools query defined through the executable model DSL. Then I realized that Calcite can do much more than simply parsing the SQL query: it also analyzes the query, producing the corresponding relational algebra. This analysis provides a normalized and optimized view of the SQL query, so I wrote a second implementation of my transformer from SQL to Drools queries that is feeded by the relational algebra produced by Calcite. 

Incrementally feeding the engine with the sample data generated by Debezium that I used in my test and defining a listener that simply outputs the incremental matches to the transformed SQL query found by Drools, produces an output like the following

rowInserted:
   customers: {last_name=Thomas, id=1001, first_name=Sally, email=sally.thomas@acme.com};
   addresses: {city=Hamburg, street=42 Main Street, id=100001, customer_id=1001}
--- 0
rowInserted:

   customers: {last_name=Thomas, id=1001, first_name=Sally, email=sally.thomas@acme.com};
   addresses: {city=Berlin, street=11 Post Dr., id=100002, customer_id=1001}
--- 1
rowInserted:

   customers: {last_name=Bailey, id=1002, first_name=George, email=gbailey@foobar.com};
   addresses: {city=Los Angeles, street=12 Rodeo Dr., id=100003, customer_id=1002}
--- 2
rowInserted:

   customers: {last_name=Bailey, id=1002, first_name=George, email=gbailey@foobar.com};
   addresses: {city=Monterey, street=1 Debezium Plaza, id=100004, customer_id=1002}
--- 3
rowInserted:

   customers: {last_name=Bailey, id=1002, first_name=George, email=gbailey@foobar.com};
   addresses: {city=Monterey, street=2 Debezium Plaza, id=100005, customer_id=1002}
rowUpdated:

   customers: {last_name=Thomas, id=1001, first_name=Sarah, email=sally.thomas@acme.com};
   addresses: {city=Berlin, street=11 Post Dr., id=100002, customer_id=1001}
rowUpdated:

   customers: {last_name=Thomas, id=1001, first_name=Sarah, email=sally.thomas@acme.com};
   addresses: {city=Hamburg, street=42 Main Street, id=100001, customer_id=1001}
--- 4
rowDeleted:

   customers: {last_name=Bailey, id=1002, first_name=George, email=gbailey@foobar.com};
   addresses: {city=Los Angeles, street=12 Rodeo Dr., id=100003, customer_id=1002}
rowDeleted:

   customers: {last_name=Bailey, id=1002, first_name=George, email=gbailey@foobar.com};
   addresses: {city=Monterey, street=2 Debezium Plaza, id=100005, customer_id=1002}
rowDeleted:

   customers: {last_name=Bailey, id=1002, first_name=George, email=gbailey@foobar.com};
   addresses: {city=Monterey, street=1 Debezium Plaza, id=100004, customer_id=1002}
--- 5

In a similar way the test provides also a second use case, this time using a group by, demonstrating how this implementation also works with it.

In my opinion this proof of concept demonstrates the flexibility of Drools and its ability to efficiently process and aggregate data. The recent introduction of prototypes extends these capabilities even in cases when these data are structured but untyped as discussed in this article.

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments