Process Mining - Initial Techniques

In my previous article, I described an interesting approach to business process analysis called process mining. The topic is still not mainstream, but it has the potential to be a valuable addition to the business analyst's toolset. The core idea of process mining is to apply data mining in order to derive process maps. More specifically, it starts from a log file that records a sequence of events, where each event type corresponds to an activity in the as-yet unknown process flow and the events are grouped by instance of the process. From this log file, it then tries to derive the process flow that generated the events.
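To make the starting point concrete, here is a minimal sketch of turning a flat event log into one ordered list of activities per process instance. The log layout (case id, timestamp, activity) and the loan activities are assumptions for illustration; real logs will look different.

```python
from collections import defaultdict

# Hypothetical event log: (case_id, timestamp, activity), in arbitrary order.
events = [
    ("case-2", 1, "Loan application created"),
    ("case-1", 1, "Loan application created"),
    ("case-1", 2, "Credit check performed"),
    ("case-2", 3, "Loan rejected"),
    ("case-1", 3, "Loan approved"),
    ("case-2", 2, "Credit check performed"),
]


def group_into_traces(events):
    """Group events by process instance and order each group by timestamp."""
    by_case = defaultdict(list)
    for case_id, timestamp, activity in events:
        by_case[case_id].append((timestamp, activity))
    return {
        case_id: [activity for _, activity in sorted(rows)]
        for case_id, rows in by_case.items()
    }


traces = group_into_traces(events)
for case_id in sorted(traces):
    print(case_id, "->", traces[case_id])
```

Each resulting list is usually called a trace, and a collection of traces like this is the input that the miners described in this post work from.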

An enticing idea, but the devil is in the details. So how exactly can we derive a process map from a set of log entries? It turns out that there are a number of different methods: specifically, a number of different algorithms that the process miner can apply to extract a process map from the log file. In the jargon of process mining, these algorithms are called miners. Before we look at four of them, it's worth talking a little about the type of process map that three of them produce: a Petri net.

A Petri net is a directed graph with two kinds of nodes, places and transitions, connected by arcs that only ever run from a place to a transition or from a transition to a place. Although Petri nets have a graphical notation, they're not generally used for business process mapping because they look very abstract and they have limitations; for example, a Petri net doesn't consider roles. However, Petri nets have the advantage of an exact mathematical definition, which opens the door to mathematical analysis; hence their prominence in process mining. The most important point for a business process modeler is that algorithms exist to automatically convert a Petri net to Business Process Model and Notation (BPMN), the de facto standard for business process maps.
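As a rough illustration of why the exact definition matters, a Petri net can be represented and executed in a few lines. This is a minimal sketch, assuming every arc has weight 1; the class, the place names and the two-step loan net are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """A minimal Petri net: places hold tokens, transitions move them."""
    # transition name -> (input places, output places); arcs are implicit
    transitions: dict
    # current marking: place name -> number of tokens
    marking: dict = field(default_factory=dict)

    def enabled(self, name):
        """A transition is enabled when every input place holds a token."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) > 0 for place in inputs)

    def fire(self, name):
        """Consume one token per input place, produce one per output place."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


def make_loan_net():
    # Hypothetical two-step process with a single token in "start".
    return PetriNet(
        transitions={
            "Loan application created": (["start"], ["awaiting check"]),
            "Credit check performed": (["awaiting check"], ["end"]),
        },
        marking={"start": 1},
    )


net = make_loan_net()
net.fire("Loan application created")
net.fire("Credit check performed")
print(net.marking)
```

After both transitions fire, the only token sits in the "end" place. Trying to fire "Credit check performed" on a fresh net raises an error, because its input place is still empty; that is the sense in which the net constrains the order of activities.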

The first algorithm, both in this post and in order of creation, is the alpha miner. This looks at the different types of event that the log contains and maps the relation between each pair of event types. For example, if the “Loan application created” event always comes right before the “Credit check performed” event, the algorithm decides that there is a sequence relation between them. If the order were sometimes reversed, it would derive a parallel relation instead. A total of four relation types (directly follows, sequence, parallel and no relation) are built up, and the algorithm then uses them to derive a Petri net. However, this Petri net is not guaranteed to be valid.
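The relation-building step can be sketched as follows. This covers only the first half of the alpha miner (the table of relations, not the construction of the Petri net from it), and the traces and activity names are made up for illustration.

```python
from itertools import product

# Hypothetical traces: the two checks happen in either order.
traces = [
    ["create", "credit check", "collateral check", "decide"],
    ["create", "collateral check", "credit check", "decide"],
]


def directly_follows(traces):
    """All pairs (a, b) where b comes right after a in at least one trace."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


def footprint(traces):
    """Classify every ordered pair of activities using directly-follows."""
    df = directly_follows(traces)
    activities = sorted({a for trace in traces for a in trace})
    relations = {}
    for a, b in product(activities, repeat=2):
        if (a, b) in df and (b, a) in df:
            relations[a, b] = "parallel"
        elif (a, b) in df:
            relations[a, b] = "sequence"
        elif (b, a) in df:
            relations[a, b] = "reverse sequence"
        else:
            relations[a, b] = "no relation"
    return relations


relations = footprint(traces)
print(relations["create", "credit check"])
print(relations["credit check", "collateral check"])
print(relations["create", "decide"])
```

Here "create" and "credit check" come out as a sequence, the two checks come out as parallel because each is seen directly before the other, and "create" and "decide" have no relation because neither ever directly follows the other.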

The second algorithm is the heuristics miner. It builds on the alpha miner, replacing the simple relations with a frequency analysis so that rare or noisy orderings carry less weight and the result better reflects the process. However, its Petri net is also not guaranteed to be valid.
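One way to see the difference is the dependency measure commonly associated with the heuristics miner, which compares how often a is directly followed by b with how often the reverse happens. The sketch below assumes that measure for two distinct activities; the traces and the 0.5 threshold are illustrative choices, not values from any particular tool.

```python
from collections import Counter

# Hypothetical log: the usual order nine times, a reversed order once.
traces = [["create", "check", "decide"]] * 9 + [["check", "create", "decide"]]


def directly_follows_counts(traces):
    """Count how often each activity is directly followed by another."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def dependency(counts, a, b):
    """Dependency of b on a, between -1 and 1, for two distinct activities."""
    ab, ba = counts[a, b], counts[b, a]
    return (ab - ba) / (ab + ba + 1)


counts = directly_follows_counts(traces)
score = dependency(counts, "create", "check")
threshold = 0.5  # assumed cut-off; in practice this is a tunable setting
print(round(score, 3), score >= threshold)
```

With these counts the score is 8/11, so the dependency from "create" to "check" is kept. The alpha miner's relations would treat the same two activities as parallel on the strength of the single reversed trace.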

The third algorithm is the inductive miner. This breaks the logs into groups. For example, if the “Loan application created” event is always the first one in a log, this forms a group. If three event types always form the third, fourth and fifth events in a log, this forms a group. The inductive miner then looks at the groupings it can identify, in order to find choices and parallel execution, and relates the groups to define sequences. It has the advantage that it does generate valid Petri nets.
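As one concrete example of such a grouping, a choice can be detected by checking whether the activities fall into groups that are never connected by a directly-follows step. The sketch below shows only this one kind of split, under the assumption that any shared first activity has already been separated off; a full inductive miner also looks for sequence, parallel and loop splits and applies them recursively. The traces are hypothetical.

```python
# Hypothetical traces for the part of the process after the decision point.
traces = [
    ["approve", "send contract"],
    ["reject", "send rejection letter"],
    ["approve", "send contract"],
]


def exclusive_choice_groups(traces):
    """Split activities into groups with no directly-follows link between them."""
    neighbours = {}
    for trace in traces:
        for activity in trace:
            neighbours.setdefault(activity, set())
        for a, b in zip(trace, trace[1:]):
            # Direction is ignored: any link keeps two activities together.
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


groups = exclusive_choice_groups(traces)
print(groups)
```

More than one group means a choice between them: a case either goes through the approval group or the rejection group, never both. A single group means this kind of split does not apply and another one would have to be tried.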

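The fourth algorithm is the fuzzy miner, which measures how significant each event type is and how strongly correlated pairs of event types are, and then keeps only the parts of the process that pass a threshold. Out of the four algorithms, this is the only one that does not output a Petri net. Instead it outputs a “process graph”, and the precise graph produced depends on the settings for what level of correlation to accept.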
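The effect of that setting can be illustrated with a much simplified stand-in: build a graph of directly-follows steps, score each edge by its frequency relative to the most frequent edge, and drop edges below a cut-off. This is not the fuzzy miner's actual set of measures, only an assumption-laden sketch of threshold-based filtering; the traces and cut-off values are made up.

```python
from collections import Counter

# Hypothetical log: most cases are checked, a few skip straight to a decision.
traces = [["create", "check", "decide"]] * 8 + [["create", "decide"]] * 2


def process_graph(traces, cutoff):
    """Keep directly-follows edges whose relative frequency reaches the cutoff."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    if not counts:
        return {}
    strongest = max(counts.values())
    return {
        edge: count / strongest
        for edge, count in counts.items()
        if count / strongest >= cutoff
    }


strict = process_graph(traces, cutoff=0.5)
lenient = process_graph(traces, cutoff=0.1)
print(sorted(strict))
print(sorted(lenient))
```

With the stricter cut-off the rare shortcut from "create" to "decide" disappears from the graph, while the more lenient setting keeps it. The same log therefore yields different process graphs depending on the setting.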

There is a running theme through all of these algorithms – there is no guarantee that what they generate is the correct process, or in three cases that it is even a well-formed process. But that's not the aim of process mining; process mining does not aim to automate process analysis, merely to aid it.

In the final post, I'll talk a little about some of the more complex techniques that can be applied aside from extracting a process map from a log file, after which I'll talk about ways to look deeper into this topic and my overall impressions.