Process mining helps with the work of identifying unknown processes by inspecting logs of executed processes. If each activity generates a log entry, and the log entries are grouped by instance of the process, it becomes possible to derive a process map from the raw event logs. This is the origin of the name 'process mining': effectively, process mining is data mining applied to event logs in order to identify process maps in an automated way. Last week we covered initial techniques, so let’s get more advanced.
Having previously described the background of process mining, and then four example algorithms that exist for implementing it, I'm going to finish by outlining a couple of more advanced techniques, providing some pointers on where to get more information, and last of all giving my impressions of the subject.
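As a reminder of what "deriving a process map from raw event logs" means in practice, here is a minimal sketch. It assumes a made-up log of (case id, activity) pairs, groups the entries by process instance, and counts which activity directly follows which. The log contents and the function names are illustrative assumptions, not the output or API of any particular tool.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity) pairs in the order they were logged.
event_log = [
    ("case-1", "receive order"),
    ("case-2", "receive order"),
    ("case-1", "check stock"),
    ("case-1", "ship order"),
    ("case-2", "check stock"),
    ("case-2", "reject order"),
]

def group_traces(log):
    """Group log entries by process instance, preserving log order."""
    traces = defaultdict(list)
    for case_id, activity in log:
        traces[case_id].append(activity)
    return dict(traces)

def directly_follows(traces):
    """Count how often activity b immediately follows activity a in a trace."""
    counts = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            counts[(a, b)] += 1
    return counts

traces = group_traces(event_log)
dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts are the edges of a very simple process map. The discovery algorithms covered previously do considerably more than this, but they start from the same grouped traces.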
The first advanced technique that supplements the standard techniques of process mining is known as 'conformance checking'. In keeping with the algorithmic nature of process mining, conformance checking calculates a score for how well a given Petri net aligns with a given log. In this way you can obtain a score for how well the net generated by one of the algorithms matches the logs, which helps you decide how far to trust the process model that has been generated.
The essence of conformance checking is to take each process execution instance (trace) in the log, compare it against each possible path through the process map, and count the number of times the trace does not match the path; from this count a replay fitness score is derived for each path. The highest replay fitness score (the closest match between the process instance in the log and a path through the process map) is then taken as the score for that instance. Finally, the scores are aggregated across all process instances in the log.
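The steps above can be sketched roughly as follows. This is a simplification under stated assumptions: the model is represented as an explicit list of allowed paths rather than a Petri net, mismatches are counted with an edit distance, and the per-trace scores are aggregated with a plain average. Real conformance checkers replay traces on the net itself, so treat the function names and the scoring formula here as illustrative only.

```python
def edit_distance(trace, path):
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(path) + 1))
    for i, a in enumerate(trace, start=1):
        curr = [i]
        for j, b in enumerate(path, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # activity in trace but not in path
                            curr[j - 1] + 1,      # activity in path but not in trace
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def path_fitness(trace, path):
    """Assumed scoring: 1.0 for a perfect match, lower as mismatches grow."""
    longest = max(len(trace), len(path))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(trace, path) / longest

def trace_fitness(trace, model_paths):
    """Best score over all paths, i.e. the closest-matching path wins."""
    return max(path_fitness(trace, p) for p in model_paths)

def log_fitness(traces, model_paths):
    """Aggregate across process instances (here: a simple mean)."""
    return sum(trace_fitness(t, model_paths) for t in traces) / len(traces)

# Hypothetical model paths and log traces.
model_paths = [
    ["receive", "check", "ship"],
    ["receive", "check", "reject"],
]
traces = [
    ["receive", "check", "ship"],  # follows the first path exactly
    ["receive", "ship"],           # skips the "check" activity
]
print(log_fitness(traces, model_paths))
```

With this toy data the first trace scores 1.0 and the second about 0.67, giving an overall fitness of roughly 0.83. Enumerating every path is only feasible for small models without loops, which is one reason practical tools work on the net directly.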
The second advanced technique is performance analysis. Performance analysis depends on the log traces including a timestamp for each event, with the assumption that the log entry is written at the completion of the associated activity. Given that we have derived a process map from the log, when we apply the timestamps from the log as well, we can estimate the time that each activity took: under that assumption, it is the gap between an activity's timestamp and the timestamp of the activity that preceded it in the same instance. Now, this is interesting, because in some ways it addresses a key problem with simulation: estimating how long each activity will take, which is a necessary parameter for simulating a process.
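To make the timing idea concrete, here is a minimal sketch. It assumes each log entry carries a completion timestamp, that activities within an instance run one after another, and that any waiting time between activities is counted as part of the later activity. The log data and the function name are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical log: (case_id, activity, completion timestamp).
event_log = [
    ("case-1", "receive order", "2024-01-01T09:00:00"),
    ("case-1", "check stock", "2024-01-01T09:30:00"),
    ("case-1", "ship order", "2024-01-01T11:00:00"),
    ("case-2", "receive order", "2024-01-01T10:00:00"),
    ("case-2", "check stock", "2024-01-01T10:10:00"),
    ("case-2", "ship order", "2024-01-01T12:10:00"),
]

def activity_durations(log):
    """Average minutes per activity, measured from the previous event in the same case."""
    cases = defaultdict(list)
    for case_id, activity, stamp in log:
        cases[case_id].append((datetime.fromisoformat(stamp), activity))

    durations = defaultdict(list)
    for events in cases.values():
        events.sort(key=lambda e: e[0])  # order each case by timestamp
        for (prev_time, _), (time, activity) in zip(events, events[1:]):
            durations[activity].append((time - prev_time).total_seconds() / 60)
    return {activity: mean(values) for activity, values in durations.items()}

averages = activity_durations(event_log)
for activity, minutes in averages.items():
    print(f"{activity}: {minutes:.0f} min on average")
```

Note that the first activity in each instance gets no duration, because there is no earlier event to measure from. Averages like these are the kind of per-activity estimate a simulation would need as input.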
At this point, it's possible that those reading this and the previous two posts may decide that they want to learn more about process mining, so I'll outline some ways to do this. The community interested in the subject of process mining is still small, but resources do exist. There is a website, www.processmining.com, and an open source tool called ProM. Last of all, one of the researchers in the area has created a free course on the subject of process mining, which they run regularly on the FutureLearn website, and which gives an excellent introductory overview of the subject.
So, to conclude, what are my impressions of process mining? The concept itself is exciting. A classic problem in process discovery is getting SMEs to remember every aspect of the process, never more so than when they are truly experts in the process, because so much has become instinctive and assumed. Process mining can help with this, because it works from what was actually executed rather than from what people remember. The biggest obstacle is having the log files in the first place. Of course, there are some products that already produce logs, but the two challenges are recording the correct events and ensuring that they are readable by the process mining engine. Nevertheless, there are organizations that have successfully used process mining, so it's an area that could well have potential.