Core Values of Logging
Logging has been an integral part of software development for a long time. However, logging is a rather general term and overused at times. On a practical level, all of us know and use so-called logging frameworks such as log4j and log4net. What they are designed for is software tracing, that is, making an execution flow persistent. Software tracing records low-level technical events and is aimed at highly technical users. As a cross-cutting concern it is designed to help find and cure defects in a program or its runtime environment.
Most logging libraries allow their users to categorize the severity of the event that is being logged by specifying its log level, such as DEBUG, INFO or ERROR. Users can then specify the minimum severity level of events that should be persisted at runtime. When events are logged from a very low level – such as DEBUG – upwards, log files can grow large very quickly.
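The minimum severity threshold can be illustrated with a short sketch. It uses Python's standard logging module rather than log4j or log4net, and the logger name and messages are made up for illustration; the output is captured in memory only to keep the example self-contained.

```python
import io
import logging

# Capture output in memory so the example is self-contained;
# a real application would typically log to a file or to stderr.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("shop.orders")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep this example's records away from the root logger

# Minimum severity: events below INFO are discarded.
logger.setLevel(logging.INFO)

logger.debug("cache lookup for order 4711")  # below the threshold, dropped
logger.info("order 4711 accepted")           # persisted
logger.error("payment gateway timed out")    # persisted

print(stream.getvalue(), end="")
```

Lowering the threshold to logging.DEBUG would also persist the first message, which is exactly how log volume grows once detailed tracing is switched on.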
Challenge #1: Reproducing a Bug based on Gigabytes of Log Data
As mentioned above, detailed log files can carry huge amounts of data about the runtime flow of a program. When a bug is reported by a user, log files are often the sole source of information to diagnose it. But when there are gigabytes of historical log data to analyze, this can become a challenge. Sifting through such a huge amount of textual data manually is tedious. When log files from different subsystems have to be correlated, the task becomes even more time-consuming. Apart from that, issues reported by non-technical end users generally do not include sufficient detail to reproduce an issue. In such cases, developers have to map high-level events expressed in the language of the user's business to technical events. These are some of the main reasons why diagnosing bugs using log files is one of the most challenging tasks for software developers.
Challenge #2: Extracting Business Metrics from Log Entries
Log files can be used to collect more than just diagnostic information. Logging is also commonly used for recording business metrics. As an example, consider the owner of a SaaS product who is interested in improving her signup workflow. To that end, she wants to determine the ratio of signups that are aborted by users to signups that actually go through to completion. Finding completed signups in the logs is easy. It is more challenging, however, to quantify aborted signup attempts, since those can have various technical reasons or can simply be abandoned by users.
Extracting all signups that have not gone through from log data means that the developer has to anticipate every log entry that implies an abort. Since logging, in the sense of stored execution traces, is first and foremost done to help with the creation and maintenance of software, it is likely that the log files are incomplete with regard to business metrics. The resulting metrics are inaccurate, and collecting them is a time-consuming and therefore costly manual process.
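To make the difficulty concrete, here is a minimal sketch of such an extraction. The log lines and message patterns are entirely hypothetical; the point is only that every abort pattern has to be guessed in advance by the developer.

```python
import re

# Hypothetical log lines; the message formats are invented for illustration.
LOG_LINES = [
    "2024-05-01 10:00:01 INFO  signup completed user=17",
    "2024-05-01 10:00:07 ERROR signup failed: mail server unreachable",
    "2024-05-01 10:01:12 WARN  session expired during signup step 2",
    "2024-05-01 10:02:30 INFO  signup completed user=18",
    "2024-05-01 10:03:44 ERROR validation service returned 500",
]

COMPLETED = re.compile(r"signup completed")

# Each pattern is the developer's guess about what an aborted signup looks like.
ABORT_PATTERNS = [
    re.compile(r"signup failed"),
    re.compile(r"session expired during signup"),
]

completed = sum(1 for line in LOG_LINES if COMPLETED.search(line))
aborted = sum(
    1 for line in LOG_LINES if any(p.search(line) for p in ABORT_PATTERNS)
)

print(f"completed={completed} aborted={aborted} ratio={aborted / completed:.2f}")
```

The last log line may well belong to an aborted signup too, but it is not counted because no pattern anticipates it. Signups that users simply abandon leave no matching entry at all.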
Start Monitoring High-Level Events
Recently, technologies complementary to logging that can help to perform tasks and reach goals such as the ones described above have become available. One of those technologies is high-level event monitoring. Its core idea is to raise the abstraction of logged data from low-level technical events to business events. In contrast to normal log messages, which typically consist of little but a timestamped message, event monitoring allows for including structured data in event messages. An event may consist of a category, a subject, a body, tags and properties. This built-in structure makes processing, filtering and analysis of business information easier and faster, for people as well as business intelligence software.
Lower Granularity, More Context Data
High-level event monitoring also differs from software tracing with regard to granularity. High-level events occur at a much lower rate than log messages since they represent business events rather than mere technical details – for instance, a delivery being fulfilled rather than a file not being found.
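A business event such as a fulfilled delivery, carrying the fields named earlier (category, subject, body, tags and properties), might be modelled roughly as follows. This is a sketch under assumptions: the class name, the field types, the added timestamp and the JSON serialization are illustrative choices, not the format of any particular monitoring product.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class BusinessEvent:
    """Hypothetical structured event; the field names follow the article."""

    category: str
    subject: str
    body: str = ""
    tags: list[str] = field(default_factory=list)
    properties: dict[str, str] = field(default_factory=dict)
    # Assumption: events are timestamped in UTC when they are created.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


event = BusinessEvent(
    category="delivery",
    subject="Delivery fulfilled",
    body="Parcel handed over to the customer.",
    tags=["logistics"],
    properties={"order_number": "4711", "user_id": "17"},
)

# Serialize to JSON so that other tools can filter and aggregate on the fields.
print(json.dumps(asdict(event), indent=2))
```

Because the order number and user ID travel as named properties rather than as free text, downstream tools can select on them without parsing message strings.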
Ideally, a stream of business events is understandable to people unfamiliar with the implementation details of a system. A key aspect in this regard is that a business event includes context information, like an order number or a user ID. This context information comes in handy when, for example, a customer reports an issue with an order: the support staff can reconstruct what has happened by simply filtering events by the order number of this particular order.
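The support scenario can be sketched in a few lines. The events below are plain dictionaries with made-up subjects and order numbers, and the helper function name is hypothetical; a real monitoring server would offer its own query interface.

```python
from typing import Iterable

# Hypothetical event stream, already ordered by time of occurrence.
events = [
    {"subject": "Order placed", "properties": {"order_number": "4711", "user_id": "17"}},
    {"subject": "Order placed", "properties": {"order_number": "4712", "user_id": "23"}},
    {"subject": "Payment received", "properties": {"order_number": "4711"}},
    {"subject": "Delivery fulfilled", "properties": {"order_number": "4711"}},
]


def events_for_order(events: Iterable[dict], order_number: str) -> list[dict]:
    """Return all events whose properties reference the given order number."""
    return [
        e for e in events
        if e.get("properties", {}).get("order_number") == order_number
    ]


history = events_for_order(events, "4711")
for e in history:
    print(e["subject"])
```

The filtered list reads as the history of one order in business terms, which is what a support employee needs when talking to the customer.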
Challenges in a World of Distributed Systems
Another core concept of high-level monitoring stems from the fact that more and more software systems are distributed. As a result, correlating multiple logging sources, as mentioned above, is becoming increasingly important. This challenge is addressed by introducing a central server component which collects, processes and visualizes high-level events.
At Xaidat, we are building Caduceus, a technology-agnostic and lightweight application monitoring platform. If you want to see high-level event monitoring in action, try Caduceus.