An Introduction to Process Mining


Accurately defining business processes is an ongoing problem for large organizations, and a variety of standards and techniques have evolved to help them confront the task. I've recently come across an interesting and relatively new technique for this called 'process mining', so I'll be devoting my next three articles to examining the idea.

The basic idea of process mining is to extract a process flow from a set of event logs. If every activity in the process writes a log entry, it becomes possible to identify patterns in the order in which activities occur and, from those patterns, to infer a process flow. In many ways, it's the opposite of process simulation. In process simulation you take a defined process flow, 'execute' it numerous times with randomly generated values, and look at the resulting statistics to gain greater insight into the process design. In process mining, by contrast, you look at the actual values recorded for a series of instances of the process in order to identify the underlying structure of the process flow. Effectively, it's a case of data mining an event log to generate a process map.
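To make the idea of 'identifying patterns in the order in which activities occur' a little more concrete, here is a minimal sketch in Python. It is not the algorithm any particular tool uses; it simply groups log entries by process instance and counts how often one activity is immediately followed by another. The log layout (case id, activity, timestamp), the sample data and the function names are all assumptions made up for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event log: one (case_id, activity, timestamp) row per log entry.
# The activity names and values are invented for illustration.
EVENT_LOG = [
    ("order-1", "Receive order", 1),
    ("order-1", "Check stock", 2),
    ("order-1", "Ship goods", 3),
    ("order-2", "Receive order", 1),
    ("order-2", "Check stock", 2),
    ("order-2", "Reject order", 3),
    ("order-3", "Receive order", 2),
    ("order-3", "Check stock", 5),
    ("order-3", "Ship goods", 9),
]


def build_traces(event_log):
    """Group events by case and order each case's activities by timestamp."""
    traces = defaultdict(list)
    for case_id, activity, _timestamp in sorted(
        event_log, key=lambda e: (e[0], e[2])
    ):
        traces[case_id].append(activity)
    return dict(traces)


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    dfg = directly_follows(build_traces(EVENT_LOG))
    for (a, b), n in sorted(dfg.items()):
        print(f"{a} -> {b}: {n}")
```

With this sample log, 'Receive order' is followed by 'Check stock' in all three cases, after which the flow branches to 'Ship goods' twice and 'Reject order' once. Those counted pairs are the raw material from which a process map with a decision point could be drawn.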

The idea of process mining comes out of work performed at Eindhoven University of Technology in the Netherlands, and the majority of the research in the area still comes from the original core team. At the same time, the idea has gained enough traction for the IEEE to establish a Task Force on Process Mining. This group has adopted a standard XML-based format for event logs called XES (eXtensible Event Stream), although it's not necessary for event logs to start out in this format; conversion from CSV, for example, is possible.
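As a rough illustration of what a CSV conversion involves, the sketch below turns a small CSV log into an XES-style XML document using only the Python standard library. The CSV column names (case_id, activity, timestamp) and the function name are assumptions for this example. The output uses the conventional 'concept:name' and 'time:timestamp' attribute keys but omits the extension and classifier declarations that a complete XES file would normally carry, so treat it as a sketch of the structure rather than a file guaranteed to load in every tool.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical CSV export; the column names are assumptions for this sketch.
CSV_TEXT = """case_id,activity,timestamp
order-1,Receive order,2024-01-05T09:00:00+00:00
order-1,Ship goods,2024-01-06T14:30:00+00:00
order-2,Receive order,2024-01-05T10:15:00+00:00
"""


def csv_to_xes(csv_text):
    """Convert CSV rows into a minimal XES-style XML string."""
    rows_by_case = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows_by_case[row["case_id"]].append(row)

    log = ET.Element("log", {"xes.version": "1.0"})
    for case_id, rows in rows_by_case.items():
        # One <trace> per process instance, named after its case id.
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        # Sorting the raw strings is only safe because every timestamp here
        # is ISO 8601 in the same format and with the same UTC offset.
        for row in sorted(rows, key=lambda r: r["timestamp"]):
            event = ET.SubElement(trace, "event")
            ET.SubElement(
                event, "string", {"key": "concept:name", "value": row["activity"]}
            )
            ET.SubElement(
                event, "date", {"key": "time:timestamp", "value": row["timestamp"]}
            )
    return ET.tostring(log, encoding="unicode")


if __name__ == "__main__":
    print(csv_to_xes(CSV_TEXT))
```

The essential mapping is that each process instance becomes a trace and each CSV row becomes an event inside it, carrying an activity name and a timestamp.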

In terms of tool support, there are one or two commercial tools that offer some support for it, but there is also an open-source, plug-in-based tool called 'ProM', including a 'ProM-Lite' option that comes loaded with a standard set of plugins, so that a newcomer to process mining can experiment without first needing to work out which plugins to install.

Now, like everything at the enterprise level, it's not a panacea that solves all of the problems associated with process discovery. While a powerful concept, process mining does not eliminate the role of the process analyst; it's not a question of turning the handle and distributing the resulting process map to any and all interested parties. For one thing, the process map that is extracted is only as good as the combination of the algorithm and the data fed into that algorithm. It also requires that activities generate event log entries in the first place. Last of all, event logs may record processes that have been incorrectly executed, with steps skipped or unnecessary steps performed, so the mined map shows what actually happened rather than what was supposed to happen.

Nevertheless, it does offer an interesting new approach to drawing order out of the chaos. In the next post, I'll be describing some of the techniques that basic process mining uses to extract processes from event logs.