Logging

There’s no software without bugs. And if it is, it is still dependent on environment where it run which can cause unexpected behaviour.

Fast and proper identification of software misbehaviour needs appropriate information what happened. The most powerful, most used and also a simplest technique to report what is software doing is logging. Moreover, logs will be the main source of information (if not the only one) when something wrong happen. In those times you will be grateful for each bit of a meaningful information which can help you to get your application back on the road and prevent more damages or at least solve the problem before SLA time expires.

Usually content of log is not covered by analytic documents. It’s up to programmer which writes the code to feel the need to write available information out. This sense for the right place and content usually comes with experience from real, production operation. It is a good habit to take a look what time was spent by solving issues and analyse reasons if average time is too high. Common problem is unclear or even missing information from application run leading to solution.

Logging is often underestimated. Especially in the early stages of development, its importance is not fully appreciated. It is even more of a surprise when application is deployed in a production environment. At that time it is often due to the scope of work difficult to check and modify the content of logged records.

The way how application stores and presents logs has no impact on a content of written data. Whether you write the logs to disk or send them directly to ELK or other system, what matters is what was output from the application. However, a user-friendly presentation can make navigating through large amounts of messages much faster.

Correct and clear logging is one of the most important tools which can improve quality markers.

Think like a…

Programmer think like a programmer	Definitely! It can be you or your colleague who will solve issues in production in a year or two. Even now you know and understand the code you write, after longer time, writing other thousands of lines of code, memories become a blur. There’s a rule of month: After one months the own code is like someone else’s. So don’t be selfish and think like a programmer from the future. When you write code ask yourself simple questions: Will I be able to find out what happened in the app? Can I find the data that drove the processing flow? Is it clear from the log what the written data means? If your answer is ‘no’, try to re-think what to write to log. Ask these questions also when doing code review of your colleague’s merge request.
Programmer think like a system administrator	Don’t write too much. Hardware resources have their limits. Instead of listing the values of variables in separate records, write them in one record. Use correct level of severity. System administrator usually doesn’t know your application inside-out and cannot assess severity. Proper levels are also required for automated alerting. Especially in levels with higher severity write self explaining, human readable messages. Even system administrators are technical people, they do not have to know internals of your application. And they are also humans which make mistakes. E.g., if application finds out that some configuration is wrong, it’s nice to say which config file to check or which value to fix.
Programmer think like a security specialist	Don’t write sensitive information. Check the logs twice if there’s no sensitive information from your code or from a 3-rd party libraries you use. Keep security in mind while checking merge requests of your colleagues.

Give feedback

Programmer	Did you have troubles to understand what application do? Did you find a meaningless message, unclear formulation or a value without a name? Fix it or at least create a task to fix it later. Talk about logging with colleagues. Use a code review or retrospective meetings. It is an important part of code even it’s not the reason why you write your application.
System administrator	Application doesn’t report its failures? Does application log too much or use wrong severity levels? Report your findings to development team. Good developer appreciates every feedback. Don’t let your finds sink in the dust. Doing nothing makes problems bigger.
User	Depends what kind of a user you are, probably you will never see logs and you don’t have many ways to give feedback. But anyway, of you can report long times of issue investigation. You are a user if you integrate a 3-rd party library. A little bit of a programmer and a little bit of a system administrator looking at logs without knowing a code inside. Don’t hesitate to report wrong behaviour to library creator or better – create a merge request with suggested fix in case of an open-source software.
Manager	In most cases you don’t care about logs. But log is a tool which has a huge impact on effectivity of a team when it comes to new installations or issue investigation. Try to talk about logging on a retrospective meetings. Do retrospective meetings oriented on an issue solving effectivity before go live. It’s hard to fix it later.

Be aware of common mistakes

Whenever you write logs to files or collect them to some log management system, there are some simple rules worth to follow:

Use correct severity level.
Log only data important for current activity or drive process flow.
Write messages which are clear without the need to dig into the code.
Messages in higher severity levels are intended for system administrators solving issues – use more descriptive language and add some hint if something gone wrong.

Let’s make a review of some buggy code.

Severity levels

Most of logger libraries uses the same severity levels as defined by Apache Log4J. Even if your library uses different naming, meaning is usually the same.

TRACE	Very detailed information, intended only for development. Because of its verbosity this level should be off on pre-production and production environments. It could be turned on only in special cases and just for a short period of time. The distinction between DEBUG and TRACE is the most difficult, but if you put logging statement and remove it after the feature has been developed and tested, it should probably be on TRACE level.
DEBUG	Detailed information on the flow through the system. Debug should contain as much information to be able to solve later bugs. Expect these to be written to logs only
INFO	Important technical information. Administrator or advanced user should be able to understand INFO messages and quickly find out what the application is doing. E.g., application started, shutdown process started, some settings changed… Expect these to be written to logs only.
WARN	Something went wrong but the process might be continued just take extra caution. The application can tolerate warning messages, but they should always be justified and examined. E.g., “Application running in development mode” or “Administration console is not secured with a password”, use of deprecated APIs, poor use of API, near errors, other runtime situations that are undesirable or unexpected, but not necessarily “wrong”. Expect these to be immediately visible on a status console. Message should be clear for technical people. Non-programmers should be able to recognize if such message requires action or not.
ERROR	Severe errors that cause premature termination and other runtime errors or unexpected conditions. Something terribly wrong had happened, that must be investigated immediately. No system can tolerate items logged on this level. E.g., unavailable database or a mission critical use case cannot be continued. Message should be clear for technical people not necessarily programmers. In case you know what should be done to fix this error, add hint or link to documentation providing more information. There’s a common set of configuration errors where such hint will be very appreciated. Messages in error level are often watched by monitoring system and used as a trigger for automatic incident reporting. Don’t be afraid to use it but use it clever.

Common mistakes

Let’s dive into example below.

    public Document loadDocumentWrong(File file) throws IOException {
        logger.debug("loadDocument {}", file);    // 1.

        if (file == null || !file.exists() || !file.canRead()) {
            logger.error("Unable to read file {}", file);    // 2.
            throw new IOException("File does not exists");
        }

        logger.debug("Start reading");     // 3.

        try (FileInputStream fin = new FileInputStream(file)) {
            Document document = parseDocument(fin);

            logger.debug("Parsed document {}", document);  // 4.

            logger.debug("Parsed document no. {}", getDocumentIdentifiers(document));  // 5.

            String version = getVersion(document);
            logger.debug("Detected {}", version);    // 6. 
        } catch (IOException ex) {
            // 7.
            doSomethingElse();
        }

        return null;
    }

There are few common mistakes:

Severity level is too high. Logging enter/leave method can be helpful only for development what makes it an ideal candidate for trace level.
Severity level too high. Consider error level before throwing exception – calling code can/should be aware of such behaviour and can proceed in alternate way according to expected behaviour.
Lack of information will make this message useless. Write out variables important at this place of code – in this case file.
Don’t write objects which can be huge. It’s usually a useless junk that no one read and it has also a performance impact.
Do not use processing inside log statements. It can cause unexpected behaviour and hard to investigate errors. But if you need to calculate some values especially for log, wrap these operations and variables into isXXXLevelEnabled condition block.
Identify information you write. Example message doesn’t say what was detected. To make it clear require programmer to take a look into a code. It’s a waste of time.
Missing log. Write a debug message if something went wrong and flow of code differ from common way.

Now take a look to fixed example.

    public Document loadDocumentCorrected(File file) throws IOException {
        logger.trace("loadDocument {}", file);    // 1.

        if (file == null || !file.exists() || !file.canRead()) {
            logger.debug("Unable to read file {}", file);    // 2.
            throw new IOException("File does not exists");
        }

        logger.debug("Start reading {}", file);     // 3.

        try (FileInputStream fin = new FileInputStream(file)) {
            Document document = parseDocument(fin);

            if (logger.isDebugEnabled()) {     // 5.
                List<String> identifiers = getDocumentIdentifiers(document);
                logger.debug("Parsed document no. {}", identifiers); // 5.
            }

            String version = getVersion(document);
            logger.debug("Detected version {}", version);     // 6.
        } catch (IOException ex) {
            logger.debug("Something went wrong but we can deal with it. Exception just for a case {}", ex);   // 7.
            doSomethingElse();
        }

        return null;
    }

Security

Logs reveal three basic types of information:

internal application structure and logic,
details about network and external applications with which application communicates,
user data.

Even logs are not publicly available, information written in message shouldn’t contain sensitive data. Never write passwords to log. If there’s need to write data which can be misused for authentication or authorization, use data obfuscation (e.g., replace part of a data with asterixis). For example: as a common practise is to log access token if some kind of OAuth is used. Such tokens are quite long. Attacker needs it full but for identification in log only a few characters are sufficient.

Sensitive data are quite often encapsulated in written objects. Such leaks are hard to find during code review. To keep risk low, stick with two rules:

Do not include or at least obfuscate sensitive data in toString() methods.
Log complex objects only in lowest severity levels.

Restrict access to logs. It sounds simple and if you use ELK stack or similar system it is already easy to setup. If you write logs into files, restrict directory access to an appropriate group. According to your way of cooperation in team consider whether it is not a good idea to set up a protected directory for sharing logs between programmers and/or system administrators. It is better to give people a managed tool than let them use their imagination. It usually ends up with sending huge logs by mail or maybe worth – sharing on internet drives.

Think like a…

Give feedback

Be aware of common mistakes

Severity levels

Common mistakes

Security

Further reading