Server logging – a view from down below

rainforest bird's viewIf you ever meditated upon the ancient “sound of the tree falling in the jungle” koan, you will surely feel an empathy with the intended readers of this post. There is a colorful and lively rainforest being uncaringly grown daily on most of the servers out there in the never sleeping on-line world.

Round-the-clock running servers write out a lot of potentially interesting data and each day produce a humongous harvest of logs. Some are needed for better understanding of User’s habits in general, but the ones this article focuses on are those cryptic lines sprouting (hopefully) some codes, (mostly) wrongly spelled description and anything else that developer thought could be useful to put out.

The Matrix

Another day a colleague catching a glimpse of my terminal screen made a witty comment referring to the Matrix. Come to think of it, it sometimes does resemble the matrix but (IMHO) in fact it’s actually the other way around.
A lot of (in)sane data thrown amidst timestamps, abbreviations and weird jargon. They might even look unorderly to an untrained eye. This is where our lucky engineer comes in and restores order.

The log

What do you think makes a good log? First, we should differentiate a log from a trace (in simplified form):
log: information about a specific subject its author wanted to store.
trace: information about the path leading to the event the author wanted to point out.

As usual, the common conception of a log/trace is somewhere in between of those two theoretically distinct idioms. As software developers we use logs mainly to track down strange artefacts appearing on unreachable (read: production) machines. The reason is self-explanatory. No manager want’s his team of developers being to close to the product Consumer. That would be too uncomfortable for both sides :).

To help the developer in his struggle, a lot of logging frameworks are already in very stable releases, and one just needs to pick his flavour. By incorporating the chosen framework, the developer has already set sail to a shinier world accompanied by his logger.

The one thing in common to many loggers out there is an easy way to control the amount and type of data being sent towards storage system. In its simplest form this control is done via log levels. This mechanism should help the production ecosystem’s storage and (depending on the platform, programming language and logger type) performance load. But, at the same time this presents the first choice a developer must take when defining a log. Let’s see what other important questions should a developer have in mind when writing a log:

  1. The motive? To record an activity, to debug or to warn?
  2. How much information is necessary for putting the log reader in the context?

Those two questions are very intuitive, yet most often taken very lightly.

Here’s an example of a terribly written log:

LOGGER.error("Exception occurred");

The author completely forgot about his fellow developer which will read the logs maybe a year later and find dozens of similar outputs. Pulling (what’s left of) his hair out trying to find the root of the problem reported like this is the first give away of a code maintainer. Was it so hard to at least include the exception itself?
An overzealous “tracer” thinks he will help more than his lazy friend by writing:

// this will also output the stack trace
LOGGER.error("Exception occurred", exception);

Now the logs are filled with thousands of lines of (in large part useless) stack traces. That’s like giving exact details about the route to Miserable Island but forgetting to say why we would ever want to know about it in the first place.

Besides messing up the content, the log authors often can’t decide about the importance (i.e. level) of the log. They are torn appart between the administrator’s request to keep the log files short, the code maintainer’s desire to easily spot the problem and fix it, and an eagerness to get to the meat of the feature currently being implemented.

If we borrow the 80:20 rule we could say: spend 80% of the time defining and 20% writing the logs. This will make your system flourish in bug-fixing efficiency.

Having that in mind, here’s a common paradox arising when comparing logs from alpha and production environments. The former is always generous in the amount of storage available for creating a positive fix on the problem. But, on the other hand, most of the illusive bugs are only reproduced on production systems which are set on ERROR level. It’s quite natural to daydream: “If only I could get a few of the surrounding DEBUG level logs. Then I could see what my home looks in the daylight.”

Guess what? That’s possible.
Your logger can be persuaded to do just that. The solution: a simple fixed size queue based logger which doesn’t truncate the logs immediately after processing.

The new ‘smart’ logger could push a variable length of predecessor lower level log lines out to the open and thus bury the war hatchet between the amount and usefulness of the data (both meaningful only in their perspective contexts).

A thing to remember

A simple rule of thumb would be:

  • Always put yourself in the position of the reader of the log.
  • What is interesting and to whom?
  • What would be of help to get to the cause of the potential problem? Everything else is just a waste of byte

If you have that in mind, the world will be a better place for you as the log/code author, maintainer, server administrator and eventually an employee serving your boss and/or your clients.