(content originally posted at my personal blog)
For those wondering why I finally decided to create my own blog, it all started in my last class of Non-Classic Logics at USP. The teacher was explaining about one of the temporal logics that were formalized, known as Logic US (that stands for Until/Since).
At that moment, something clicked and I remembered Mark commenting about supporting CEP/ESP as an extension to the Rete engine we have implemented in JBoss Rules. It seemed a perfect match: supporting temporal logic would give us all we need to also reason over stream of events. I wish it was that simple.
Yesterday I spent the day looking for information about possible research that might have been done and what is the state of the art in the field. I confess, besides having a reasonable knowledge of Rete, I have no experience in ESP/CSP.
As you can imagine, there is a huge amount of information about it in the internet and I couldn’t find/read it all, but so far, the following is what I was able to gather (trying a top-down approach).
Event Driven Architecture
First thing I tried to understand is the terminology, the use cases and state of the art in the ESP/CEP field, and I ended up reading this excellent post by Brenda Michelson, where she provides an overview of the terminology and how they relate to each other. I will not repeat here all what she says, but I found specially interesting the Event Driven Architecture she describes there. From it, I was able to define the boundaries of my current research.
What I’m interested is in how we can leverage the features of a rules engine, more specifically a Rete implementation, to handle CEP and ESP. In other words, from the reference architecture she mentions, I’m interested in the Event Processing Engine component.
ESP and CEP
ESP and CEP have quite a lot of information published. From David Luckham’s excellent site, to university research projects, to products developed to do it, like Esper and StreamCruncher. A very prolific field.
I found a recent post (April, 2007) in a blog by Ashwin Jayaprakash where he addresses the issue. In his words:
The tenets of Event Processing
A facility to consume the state(s) of a System as discrete Events, a language or User Interface to specify the conditions/properties to look for in those Events from one System or across Systems (Co-relation) and provide actions to be performed when such situations arise.
It is no accident that Event Processing looks like a cross between Database Query processing and Rule Engines. There is no denying that the technology evolved from existing technologies. However the distinguishing characteristics of Event Processing technology are – the constructs it provides to specify Time based boundaries, fixed size moving Windows, Aggregation and Partitioning/Grouping Events – on a stream of incoming Events and a facility to continuously scan these Events for telltale signs, all this expressed as a concise Query.
Database Triggers which in turn execute SQL/Stored Procedures every time an Event/Row is inserted can provide rudimentary Event Processing functionality. Rule Engines can be used to discern information from Facts being asserted. But none of these technologies alone provide a concise and yet very expressive Domain Specific Language along with a platform, specifically built to constantly monitor/process live streams of Events.
I specially agree with Ashwin’s statement that a Rules Engine as we know them, alone, can not provide a concise and yet expressive DSL to handle ESP and CEP. Although, I prefer to think (I guess in a similar fashion to what James Taylor says) that we can leverage a Rete Engine to address such requirements. We all know that Rete Engines excel at pattern matching tasks, so basically what is missing is the capability to handle streams of events where just some of them are really “interesting” for the processing (a “select”-kind feature) and even those selected events get old quickly and must be discarded (sliding window and temporal logic support) in a way to scale the solution. The expressiveness of the language I guess is a consequence of the addition of the mentioned features.
This may be an over simplification of the subject, but that is how I see it so far.
The few current approaches I was able to read about since yesterday are extensions to the relational engines to handle temporal constraints and extensions to the SQL language to allow for querying such engines (I didn’t read documentation about proprietary solutions like Coral8 and StreamBase yet).
- StreamCruncher for instance implements and engine that is backed by a relational database and leverages it to handle ESP/CEP.
- Esper on the other hand, seems to implement its own relational engine, but the language is also an extension of SQL.
While both approaches seems good to me, it itches me even more to understand how a Rules Engine approach would do in comparison with the existing ones.
Floyd Marinescu posted last year an article to InfoQ talking about the Esper release where he says that “The rules engines proved cumbersome and slow for this use case“. He also quotes Alexandre Vasseur:
[Esper] is not to be confused with classical rules engines…The internals of Esper are made up of fairly complex algorithms primarily relying on state machines and delta networks in which only changes to data are communicated across object boundaries when required.
On the proprietary software side, I found a post about TIBCO, stating it uses “a modern compiled variant of the Rete algorithm for class-leading performance“. It only mentions CEP though and I’m not sure it is used as a synonym for ESP or not.
In this comment, Daniel speaks about the CEP capabilities of ILog JRules, but not about ESP.
Given all the mentioned above, I guess that in order to believe, I need to see it myself.
Event Stream Processing Requirements
From my readings, I was able to draft a list of requirements, that in my opinion, a Rete engine would need to comply to be able to handle ESP and CEP tasks. They would be:
- Support for events as a special type of facts: while facts are long living pieces of information, events have special temporal requirements and the engine must be able to work with both at the same time, inside the same working memory.
- Support asynchronous multi-thread feeding: events may arrive at any time and from multiple sources (or channels), so the engine must support asynchronous, multi-thread feeding of events.
- Support event selection: only a subset of the events going through the event stream are interesting for a given set of rules. The engine must be able to identify (select) such events and ignore the others.
- Support events garbage collection: events grow old, quickly or slow, but they do grow old. The engine must be able to identify the events that are no longer needed and dispose them as a way of scaling. It is simply impossible to hold events indefinitely. Ideally, the identification and disposal would be automatically, using the results of rules compile time analytics.
- Support temporal relative constraints: one way of defining the window of interest in a stream of events is do define patterns of events that start and close each window. Such windows would be defined through the use of temporal constraints with operators like “since” and “until”. One example of a rule using such constraints would be: “Since the stock price went under US$ X, watch for any buyers of more than 1% of total stocks, until the day is over“. The above rule will start to watch for any buyers when the stock price drops under a threshold, but once it happens, will continue watching them until the end of the day, even if the stock price goes over the threshold again.
- Support temporal absolute constraints: the rules engine must implement a pseudo-clock, that may or may not (depending on the use case) be synchronized with the real clock and allow for constraints to be defined over the sliding time window relative to the pseudo-clock. Events outside the time window are old and should be discarded.
- Support reasoning over absence of events: the engine must support the definition of rules that will fire over the absence of events in the expected time-frame (relative or absolute). Example: “If the temperature goes over the threshold and no contention measure is taken within 10 seconds, then sound the alarm“.
- Support to common mathematical accumulations of the events in the streams, like average, max, min, etc.
The whole event matching and correlation requirements are already provided by the rules engine. The main question though is:
Is it possible to implement all the above functional requirements as an extension to a Rete engine and still comply to the non-functional requirements, specially performance?
That is what I will try to understand over the next few weeks. Stay tunned and please, I do appreciate feedback, suggestions and critics.