Do not log

blackbrokkoli · on March 14, 2020

Weird article.

As greatly represented in this comment section, this article touches on maybe 1/10 of the usecases of logging...and than pretends that the opinion is applicable everywhere.

> And yes, you got it correctly: it sounds like a job for Sentry, not logging.

okay, I guess I will not write a log file for my Raspi bash script then and instead setup a third party service?

> We don’t really care about exceptions and how to handle them. We care about the business value that we provide with our app.

That's how you get shitty software which can be fixed relatively quickly regarding large scale black swan events (everything broken) and retains their annoying edge case bugs forever, slowly driving users away...

All the while, the arguments against logging seem to be the same against implementing, well anything. More things = more complexity. New things = new problems. By all means be aware of these pitfalls in the context of logging and anything else.

But as a whole, this article awfully smells like an unreasonably strong opinion wrapped in pseudo-intellectualism.

ekidd · on March 14, 2020

> this article touches on maybe 1/10 of the usecases of logging...and than pretends that the opinion is applicable everywhere.

It's definitely a weird article.

I maintain an open source tool called dbcrossbar that automates data transfer between several kinds of databases. It uses multiple async streams to copy data, and it has a "backpressure" system that keeps every step of the transfer in sync. Usually it's very reliable, but if it goes wrong, then I need debugging tools.

We use the "slog" library for structured logging. This allows us to add extensive structured data to our logs, and to keep track of what's coming from where. We can increase or decrease the log level for each component individually.

And these logs are invaluable for debugging. Like, almost every time we have any problem at all, we all sit around saying, "Wow, we're so glad we have such good logging."

hinkley · on March 14, 2020

> We don’t really care about exceptions and how to handle them. We care about the business value that we provide with our app.

This is famously not true. When the system is working, “we”, as in the business, only care about the business value. The moment things break or get slow it’s all recriminations about “how could you let this happen?”

Then later it’s right back to, “why can’t you go faster?” until it happens again.

masklinn · on March 14, 2020

>> And yes, you got it correctly: it sounds like a job for Sentry, not logging.

> okay, I guess I will not write a log file for my Raspi bash script then and instead setup a third party service?

They're not even exclusive. You can and should feed your logs into sentry. It can use low-sev logs as breadcrumbs and the high-sev logs as events: https://docs.sentry.io/platforms/python/logging/

Having extensive logging and sentry makes your sentry reports significantly better, it's not an either/or situation.

eqvinox · on March 14, 2020

I don't think the author has ever debugged a complex semantic/logic state bug, maybe even across multiple components or systems. Relatedly, all they look at is error logging.

Admittedly, there's a blurry line here between logging and tracing, but the latter word generally evokes externally attaching additional tools. But that "far end" of tracing is frequently not what's needed... there's a reason logging systems have "info" and "debug" severities.

Either way good (i.e. configurable, low overhead) logging is worth its weight in gold.

hinkley · on March 14, 2020

When I noticed that some of the people who complained loudly about certain practices also tended to disappear when a problem was too difficult for them to solve, I stopped letting what they think bother me.

Bosses notice exceptional events. As the team or cluster get bigger, they start happening closer together. Depending on context, any event that happens once a week or once a day will be characterized as happening “all the time” and that creates a new set of problems.

You need tools to head these off. And yes they are a pain but so is telling your boss, “we don’t know.”

notduncansmith · on March 14, 2020

> You need tools to head these off. And yes they are a pain but so is telling your boss, “we don’t know.”

That is one of the feelings I dislike the most as someone who takes pride in building quality software. It’s important to be honest when you don’t know why something is wrong, but even more important to minimize the number of times it happens. Logging is incredibly beneficial for the latter, and complements metrics + exception-tracking.

hodgesrm · on March 14, 2020

> I don't think the author has ever debugged a complex semantic/logic state bug, maybe even across multiple components or systems.

That includes rare conditions where you have to backtrack from error to root cause based on information already in the log. I worked for years on data replication and got into the habit of logging anything that might affect data integrity at INFO level to ensure it would be available to guide debugging.

Simple examples: where did you read config values from on startup? Log the file name including the full path. Rebooting? Log all steps in the boot sequence in detail. Retries of calls to remote components? Log that too--put in a counter to log every N failures so you don't flood the log. Transaction aborts? Log them as well using the same counter method. Extra credit: Use ring buffers to dump current transaction state on fatal errors.

If things are OK the log is fairly small.

This kind of logging is essential in transaction processing systems. You can't log every transaction because (at least in my case) there were billions of them. But you should log everything that gives insight into whether the system is meeting preconditions for safety and liveness. If you need more info you can turn on traces but this will at least give you an idea where [not] to look.

xvector · on March 14, 2020

I don't even want to count how many nightmares distributed tracing has saved me from.

j88439h84 · on March 14, 2020

> there's a reason logging systems have "info"

What is it? Performance?

timeattack · on March 14, 2020

> Are you joining my side?

No.

As a user of many programs I always find it very frustrating when some program does not work and produce no log information whatsoever.

As an engineer, I always write detailed and human-readable log entries with as much context as possible which is split into trace, debug, warning and info levels.

With trace level enabled it's typically possible to reconstruct program execution step-to-step, which is _invaluable_ when you need to debug some rare problem.

Even if logs are ephemeral and not persisted it's better to have logs for a run than not having logs at all.

Logging as a side effect is irrelevant if you look at your code from the pragmatic standpoint, from which program execution transparency matters more than some abstract elegancy.

pletnes · on March 14, 2020

Interesting. How do you enable «trace-level» logging without writing «log.trace(...)» absolutely everywhere in your code? And what about performance?

For context, I mostly program in python. I could imagine compile-time loglevel settings could address the performance, though.

timeattack · on March 14, 2020

I code mostly in Go. And yes, I'm writing `log.Tracef` everywhere.

Might seem tedious, but in fact it's easily automated using snippets in the editor.

Also when I'm inserting trace statements during development it provides a great help since it's much easier to debug logic.

Regarding performance: I don't think that we need to worry about log performance. Most applications we have this days are not performance-bound to the logging. And all it takes to disable particular log level is single `if` in the code. Or a build flag in more extreme cases.

stickfigure · on March 14, 2020

At least in Javaland, the "should I log" condition test for most logging libraries takes nanoseconds. Unless you're in an extremely tight loop, performance is fine.

benschulz · on March 14, 2020

If it was absolutely necessary you could even have the JIT remove the code and then de-optimize when you enable tracing[1]. That would solve the tight-loop issue..

[1]: https://community.oracle.com/blogs/forax/2011/12/17/jsr-292-...

pkolaczk · on March 14, 2020

Correct, in simple cases it can do it, but you need to be very careful with computing the arguments for the logging call. Java has no lazy evaluation of args nor macros, so if you accidentally concat strings there or do something expensive, it is likely JVM won't optimize these out. That's why typically you want to guard your logging calls on the hot path with an if condition.

lmm · on March 14, 2020

I think the point here is to not do unstructured, string-based logging. If logs are valuable, then you should treat them the same way as any other valuable data: rather than blatting string soup into some datastore and hoping you can somehow index it in the future, write a proper data schema and represent the trace of your program execution in a structured, machine-readable way.

This ends up overlapping a lot with event sourcing: if you're doing event sourcing right, with every substantial business decision recorded as an event and your logic decomposed into lots of fine-grained transformations between event streams, then the event streams can be your logs: each successive event transformation either succeeds or fails, and if it fails then you have the input event and the small piece of code in which the failure occurred, so you shouldn't need any more. If a given transformation gets too complicated to debug, split it into two smaller transformations!

You still need alerting on failures, but again, that's something that's better done in a structured way, with proper stacktraces and business-relevant information.

devchix · on March 14, 2020

I could never get structured logging to work, syslog receiver for structured logs is a nightmare to set up. Also, when I start writing logs, I just want to quickly set up something to check my error state and move on. The onus of setting up structured logging output in the code is too much work in the immediate moment. Additionally, structured logging does not solve the problem we think it solves, what is the log trying to say. What do I use as my key-value naming convention? fault_type = major, component_name = X, subcomponent_name = X.aaa desc1 = "some text" desc2 = "fooblat" -- I just made all those things up, the log consumer is still going to need to parse out the KV pairs and decipher what that means and what to do. Meanwhile, we have RFC5424 which does a fair job of defining and standardizing those made-up things.

lmm · on March 14, 2020

I meant store logs the same way as any other data - be that an SQL database, Cassandra, or whatever you're favouring for this system. I'd expect your fields to be somewhat business-specific, and reading them works the same as reading any other business data - you write the data schema in one place and generate read and write code from it. key-value pairs aren't perfect and you certainly can (and should) define a more detailed schema than that, but even just a bag of key-value pairs is a lot more structure (and therefore a lot easier to query) than a raw string.

jujodi · on March 14, 2020

Maybe, sure seems like the point was "do not log ever"

lmm · on March 14, 2020

Well yeah, because when we're storing structured data we don't think of that as "logging". But the point isn't to throw that information away, it's to store it properly.

mrkeen · on March 14, 2020

* Use [...] error and business monitorings with alerts

* make logging an explicit part of your contracts

* Logging should be done right

jmull · on March 14, 2020

This article is pretty confused. For example, the concluding section is titled "What to do instead?" but doesn't actually address that question. (If this were the only problem, I'd think it was an accidental leftover from an edit, but it's all only partially coherent.)

There are some good parts. It's not all wrong or anything. But I don't think it's worth the read.

Maybe a good takeaway is: consider which monitoring systems might accomplish your business purpose better.

Also, I guess: a full-featured logging subsystem in a distributed system can be substantial. True but not too interesting because that's pretty obvious, often not needed, wouldn't normally be implemented from scratch any more than a monitoring system might be, etc.

nsfyn55 · on March 14, 2020

>This article is pretty confused.

Articles like this are often written by someone that manages a trivial or non-critical path system. They have strong feelings about logging because they lack other things to have strong feelings about.

This person seems confused in general. They say "don't log, send it to sentry". Sentry is logging. Sentry is usually configured as a logging destination. It just does some preprocessing on those logs to aggregate and enrich them.

I don't know anyone that fishes through text logs over ssh anymore. We use simple automations to roll everything up for convenient access.

Furthermore he overlooks other use cases for logging. Including analytics, fixing concurrency/heisenbugs, and a myriad of other problems that logging addresses.

jmull · on March 14, 2020

> I don't know anyone that fishes through text logs over ssh anymore.

Good points, though it's funny: In my day job I only ever fish through logs (not by ssh -- I don't have that much access to the systems).

But that's only because by the time it gets to me multiple levels of people (who really know what they are doing, BTW, so it doesn't happen often) have done everything else.

devchix · on March 14, 2020

No! Terrible advice. Good luck trying to figure out why component X has stopped working without logs of any kind, not even a timestamp of when it stopped working. Logging does not make any sense? That's just polemical. His examples are unrealistic and can be taken apart quickly. Can I make bad states unreachable? No, I can never know all paths to failure, and perhaps I don't care. I can't complete a write, I don't care if storage is full or I don't have perm but I want to know that my write did not complete.

Further along he goes on to advocate for logging systems. And "business monitoring". What does he think drives these "business monitoring" thingy? Quick notification -- what's that? Like, an output of some kind, denoting a state? I think we call that logging.

Log everything, make sure it goes somewhere you can look at. Logging but not collecting is as good as writing to /dev/null.

cecilpl2 · on March 14, 2020

I'm in game dev, and logs are one of the most, if not the most useful debugging tools we have. It's not uncommon to see rare bugs that we can't reproduce yet have to fix, and the only thing we have to go on is a log and a video. It's usual to add speculative logging to help figure out what the problem is the next time we happen to see the bug.

It's also very helpful to identify performance issues when you have a flamegraph with interspersed log lines on a timeline.

perlgeek · on March 14, 2020

> There are a lot of cases when we don’t care. Because of the retries, queues, or it might be an optional step that can be skipped. If this failure is not important - then just forget about it.

No! Log it and still retry.

There are lots of reasons why might want to know later on if something failed the first time and then had to retry, for example when debugging possible timing issues.

> And yes, you got it correctly: it sounds like a job for Sentry, not logging.

Sentry is a form of logging. Also, sometimes you have potentially private data in local variables that you absolutely cannot risk sending to a service over the network; a log message allows you to be very careful with what information you reveal and what you don't reveal.

Which brings me to another use case for logging: sometimes you simply cannot report exact errors to customers (because they might leak the existence or nature of other customer's data), but if the customer contacts support, you absolutely want that information in the logs (even if just to decide if the vague error message was intentional or a bug).

Don't get me wrong, I'd love to live in a world where every action straight out succeeds or results in a bug ticket, but the real world is much more complicated, and logs do help making sense of that.

Spivak · on March 14, 2020

> No! Log it and still retry.

You will likely say this is a form of logging, and you’re right, but I think this is a better fit for a metric rather than a log.

“Something something retried at…” isn’t nearly as useful as the “the number of retries peaked at…”

perlgeek · on March 14, 2020

Both logging and metrics can be useful in such a case.

Metrics are useful to get a general picture, and logging is useful when you try to find out in one specific instance what happened.

If you have a billion requests per day, you likely don't want logging for retries; if you have a small number of requests, many of them important enough to investigate individual errors, you absolutely want logs on what happened to each request.

hinkley · on March 14, 2020

I’m having trouble defining the line between logging and statistics, and things like retries are firmly in the middle here.

Record retries: yes. Log them? Mmm, maybe not.

alkonaut · on March 14, 2020

> are you joining my side?

No I’m probably going to do what he had in the introduction and just dump major events or errors to a text file because it’s cheap, easy, and it’s usually all that’s needed to diagnose end user issues (I do desktop apps and can’t do any real time monitoring but on the other hand logs are perfect chronological records of the actions of one user on one document - not of just one system as is common for web apps).

It’s not that dumb logs are great, but they provide 50% of what you need at almost no cost.

If I were able to dream and could have the amount of thought and design that he describes in any part of my code - logging would be pretty far down the list of subsystems I’d refactor.

Rainymood · on March 14, 2020

It is my first time working on production grade code which has SLAs with other parties and well, for me, logging is crucial.

1. Logging the whole end-to-end chain allows us to view "whose fault" it is basically. We work in a microservices landscape and any request coming in the system goes through several teams. Pinpointing the team responsible for fixing their shit is important to get things up again.

2. Sometimes it happens that we (my team) drops the ball, we have some custom logging that traces the whole request through our application and we can quickly pinpoint what function failed or again whether it is a third party API call that failed or something.

For me, logging is very important, but perhaps there are better solutions available that I'm unaware of, always eager to learn and improve.

BoorishBears · on March 14, 2020

“Do not log specific things, just pay for a service that logs everything”

The false dichotomy between monitoring and logging here is so bizarre, they’re not mutually exclusive. When an alert wakes you up on a weekend, having a log statement with useful context is not going to kill you...

Also:

> Isn’t it a bit too hard?! Look, I just want to write strings into stdout. And then store it somewhere.

So do that? Why is the author pretending you need to use ELK stack?

And what about their example is “too hard”, if you’re deploying your own application that’s the least of your complications, and if you aren’t and already paying a premium for it, there’s plenty of managed ELK and ELK-like products (and I’d also be surprised if your managed deploy platform doesn’t already support you just logging to STDOUT)

londons_explore · on March 14, 2020

This article doesn't look at the usecases for logging.

In my mind, there are audit logs, to know what happened during an incident. You don't know what the incident will be, so you should log as much maybe-useful stuff as possible, and you only look at the logs if some incident requires attention.

There also logs to debug an issue that occurs in production but can't be diagnosed any other way. When a customer says "I clicked the log in button, but just saw a blank screen", and you have no other records, the logs will be handy.

praptak · on March 14, 2020

I agree maybe on one point. Most of the logging messages are useless.

Part of this is the old adage "Half of my X is wasted effort but I don't know which half" but there is still a class of log messages which should not exist.

What I'm talking about is when I'm too lazy to think how to handle a condition. What I sometimes do is half assed effort to handle and, "obviously", log it. This gives the false feeling that the condition is handled.

In reality now I have two problems: a poorly handled condition and possibly log spam.

That said, logging in general is great. I found a tricky resource leak by investigating logging rate anomalies.

benmmurphy · on March 14, 2020

I was introduced to the real benefits of logging on an Erlang project. They used a pseudo-csv log format and it was possible to do a lot with these logs. For example you could process them with cut/uniq/grep/etc in order to produce statistics for stuff that you weren't recording statistics for. If there was some kind of dataloss because of a bug it was sometimes possible to reconstruct the corrupted/missing data from the logs. If you had a problem with a particular transaction you could just grep for the transaction_id and see how it moved through the system to narrow down what happened to it.

I'm now working on an Elixir project but the way Elixir does logging is kind of dangerous. So you have a dynamically typed language that relies on unit testing in order ensure stuff works, then you have logging methods where the compiler will remove them depending on what log-level you have configured for your environment. Now run different log-levels for production and testing. What could go wrong? It seems insane that the framework would let you do this and not have a more sensible default like run the log methods but drop the output to the floor and you have to do a bit more work in order to make the log methods be completely removed.

cr3ative · on March 14, 2020

Add more logging.

I don't want to have to guess why your service is crashing. I'd much rather be able to jump in to the logs and find the exact exception information so I can fix it quickly.

cl0rkster · on March 14, 2020

>If so, let’s refactor our code to be type-safe. And eliminate all possible exceptions that can happen here with mypy.

All possible exceptions eliminated when calling doSomethingComplex(). Perhaps I'm missing something here and this overly broad language is more targeted than i've realized while reading... But that just sounds silly to my ears. And saying that making something type safe solves "all possible errors"... What am I missing?

jaimex2 · on March 14, 2020

Better titles to the article:

How to take the information out of information technology.

or

How to have no clue about why an edge case crashed your systems.

wickedOne · on March 14, 2020

this is ridiculous advice.

sure, use third party services / applications to ease insights and additional steps like notifications, but not logging at all does not make any sense.

"And eliminate all possible exceptions that can happen here" that's including runtime exceptions?

"There are a lot of cases when we don’t care. Because of the retries, queues, or it might be an optional step that can be skipped" using different log levels perfectly indicates the importance of the exception.

"it sounds like a job for Sentry, not logging" afaik sentry is logging which makes me wonder whether the author is advocating a principle or a product...

oftenwrong · on March 14, 2020

It's odd to me that software developers so often develop these sorts of questionable dogmas. Also, I wonder how many of my beliefs about software fall into that category.

I recently worked with a person who was anti-logging. His argument was that logging is only necessary when the code path is not easy to assume solely based on the input. This is true, of course, but his mistake was assuming it is possible to guarantee that level of predictability in non-trivial operations. It wasn't long before he was adding logging to his code to figure out an issue that occurred rarely and only in production.

p2t2p · on March 14, 2020

I wonder if that person ever tried to debug an obscure support case for a product that is installed on some customer on-prem hardware with ridiculous settings.

olsgaarddk · on March 14, 2020

Kinda off-topic, since the article is about not making errors, instead of logging them, but sometimes you are not in control of input, and you expect some exceptions that you want to log.

In Python, instead of the example in the blog:

    try:
        do_something_complex(*args, **kwargs)
    except MyException as ex:
        logger.log(ex)

Use the almost undocumented `exc_info=True`, like this

    try:
        do_something_complex(*args, **kwargs)
    except MyException as ex:
        logger.error(ex, exc_info=True)

This will log the entire exception.

Even cooler, every un-caught exception calls `sys.excepthook()`. The function is supposed to be monkey patched by anyone who wants to do something with uncaught exceptions, so you can do the following if you want to log all uncaught exceptions:

    def exception_logger_hook(exc_type, exc_value, exc_traceback):
        logger.error("Uncaught exception", exc_info=(exc_type, exc_value, exc_traceback))
        sys.__excepthook__(exc_type, exc_value, exc_traceback)
    
    sys.excephook = exception_logger_hook

This will send all uncaught exceptions to the logger and then continue with raising the exception normally.

teddyh · on March 14, 2020

> Use the almost undocumented `exc_info=True`

Why not simply use logger.exception()?

  try:
      do_thing()
  except Exception:
      logger.exception("It broke.")

Called from within an exception handler, logger.exception() automatically includes exception information.

olsgaarddk · on March 14, 2020

Mostly because I didn't know of it. But also if you want to log caught exceptions as debug, because you were expecting the exception. E.g., a StopIteration exception you for some reason might want information about, or you have an object you need to check if it has a certain key, but for some reason doesn't have a get-method.

I'm sure there are also good reasons :)

sten · on March 15, 2020

TIL, thank you sir. Gonna try this out tomorrow.

Trias11 · on March 14, 2020

Wonder why log analyzing companies worth billions? And the ones who done it right 10x above amount?

Because real world is hard to change and everyone and their cat will continue to do logging in their own bad ways and won’t change that and logging is important for everyone.

DO LOG.

Deal with it.

robofanatic · on March 14, 2020

Imagine the do_something_complex function is executed as an automated nightly job. No humans involved. And let's say it needs to access a remote API or a database. One fine day some network changes happen in the external system without your knowledge and your function simply can't connect to the external API. Next morning how are you gonna know why your job failed without logging the exception?

nkozyra · on March 14, 2020

> I either want a quick notification about some critical error with everything inside, or I want nothing: a peaceful morning with tea and youtube videos. There’s nothing in between for logging.

Until you do. Then, instead of having access to it, you need to hope you can recreate it.

Meanwhile you can filter your logging such that the critical stuff still comes to you and the other stuff is sent into the ether and eventually GC'd

nicodjimenez · on March 14, 2020

The opposite is true in 2020: log everything. We now have incredible open source tooling for time series data, like ElasticSearch and InfluxDB. Disk space is almost free. There is very little downside to essentially tracking EVERYTHING!

Log everything but focus on logs and metrics that matter most to customers (eg reliability and percentile latencies).

Fiahil · on March 14, 2020

I run a functional programming workshop at work, and we talk a lot about side effects.

I'm fighting unseeded 'random.next', and 'file.read' in algorithms, so they can remain pure. But I won't ever forbid someone to use logs in a pure function. Sure, it doesn't make sense from a functional programming point of view, but let's be practical and realistic for five minutes : logs are the only thing you can count on to debug a crashing script quickly. The occasional "bug" thrown by your logging back end is not a big enough deal to forgo our hability to trace what's going on in a running program. Everything is running inside kubernetes anyway.

supermatt · on March 14, 2020

Do not log, instead log?

jujodi · on March 14, 2020

Think this mostly boils down to the opinion that you should handle errors properly or ignore them instead of just logging them. This article might read better if you defined what "logging" means up front and make sure it's a well scoped definition. I can also think of a bunch of cases where logging is super important that aren't really covered by any of your arguments.

For example, if you have a micro services architecture in production in which many services might be accessed (or not) from a request of a particular origin, then you're going to want to "log" a unique identifier for that request and propagate it across all the systems so that you can tie them back to each other if needed.

Another example, you sell hardware to consumers. The hardware is composed of modules and components that all talk to each other (yep, it's a car), and how long those components and modules last is effected by environment variables (how you drive). You've got a couple of million of these things on the road. If you want to learn how to improve quality of your hardware, you need to "log" how your hardware performs in the environment that you don't control, so you can correlate failure with environment variables. Can't do that without "logging".

martin_a · on March 14, 2020

Just flew over the article, but it made me wonder: When I'm developing scripts for our workflow systems I'm often also "logging" successful events. It helps with looking how a process works but in the end I never remove that logging output.

I wonder how you people handle that. Do you also log "good events", for example when nothing happened just to be able to see everything went well?

benmmurphy · on March 14, 2020

If you are not using something to aggregate statistics using something like prometheus or graphite then logging good events can be really useful in order to produce statistics by doing text processing on the logs. It can also be a good first step towards introducing something like prometheus or graphite. Once people can see the kind of insights into an app you can get from dumb logging and manual text processing it becomes easy to sell some kind of metrics recorder to the team.

yowlingcat · on March 14, 2020

It's hard to do logging well, but some logs are better than no logs for the purposes of visibility. When something goes wrong and you need to audit the event trail, you reach for logs. If you don't have visibility when things go wrong, the first thing you do is add them so you get visibility.

I was originally inclined to say this article was written by an amateur at best and at worst, an edgy contrarian with questionable intentions. It looks like the rather is somewhat the truth. The author is the author of the `returns` [1] python library, which aims to "make your functions return something meaningful, typed and safe" -- is this the end result? No logging?

Are we to live in some kind of statically typed utopia where users never do unexpected things, the domain model never changes on a dime, algorithms never have edge cases and software is all formally provable, easy to reason about, purely functional and stateless? I can almost assure you that management at any customer facing firm will disagree on whether logging is necessary or not, for reasons that are obvious for anyone who has ever talked to a customer in their life.

Not to wax philosophical and get overly personal for a second, but why does FP have this effect on some of its practitioners? Is what happens when you spend too much time in the cathedral of academia that you forget how eldritch monstrosities can be built and maintained in any style of code, with any sophistication of typing system or automated theorem prover or formal verification? That the problems often come in at the requirements level, and that much of the job of the engineer is honing, refining and disambiguating the requirements deeper than the business otherwise can? There is no escape at the code level, only at the organization level.

[1] https://github.com/dry-python/returns

sleepydog · on March 14, 2020

I work in tech support, and this advice directly makes my job harder. Making error states impossible is good, but you can't always do this. Please continue to log.

For problems that are difficult or impossible to reproduce, logs are all I have. Tracing and metrics and structured error reporting only get set up in places where errors are expected.

inimino · on March 14, 2020

Clickbait title. The conclusion is:

> I am fighting an “overlogging” culture. When logs are used just for no good reason. Because developers just do it without analyzing costs and tradeoffs.

The article would have been better if it started with the conclusion, and could have done with a good bit of editing, but outrageous claims do stimulate discussion.

markbnj · on March 14, 2020

Why does the author think Sentry isn't logging, and can't have side effects? Very strange article. Starts with principles I agree with: don't log unimportant events and when you do log send everything you need to understand the problem... and then goes well off the rails.

JdeBP · on March 15, 2020

One thing not addressed in other comments: Please do not write logs to standard output, as implied by this article. Write them to standard error. In C++, one even has an explicit std::clog stream.

Of the service management frameworks that handle service logging for you, some do not by default collect what is written to standard output, but they will all collect what is written to standard error. This is also the Unix convention of long-standing for diagnostic messages, going back to the very reason that standard error was invented in the first place.

* http://jdebp.uk./FGA/daemontools-family.html#Logging

csours · on March 14, 2020

I agree with a lot of the points in this but not the conclusion.

One thing I would like to emphasize: You don't write software for software's sake. You write it to solve a problem, generally a business problem. If your software has a problem, that may also be a business problem. Don't make IT support solve business problems. Make the business solve them, and give the business tools to solve them.

Obviously you won't get there with the first release of your software, but it is the goal. If you find a problem in the logs, ask yourself: Am I masking a problem that the business needs to solve themselves? Does the software need to surface this?

euske · on March 14, 2020

I do think that logging is indeed a serious business and should be the subject of more serious study, including the formatting and timing, what to write, how to analyze, etc. Sadly, the article haven't even scratched its surface.

kthielen · on March 14, 2020

Another voice in favor of logging. At Morgan Stanley we wind up recording about 1TB/day of structured log data, which also needs an efficient query engine to process. We carved a basic type system out for structured data, maps to C++ data well, and built a Haskell-like PL (and JIT compiler built on LLVM) to process it:

https://github.com/Morgan-Stanley/hobbes/

The logging part is just a single self-contained header-only lib, so very easy to incorporate in existing systems.

andreareina · on March 14, 2020

Related, The log/event processing pipeline you can't have: https://apenwarr.ca/log/20190216

unnouinceput · on March 14, 2020

I kinda agree and disagree with article's author. You see, do log, but do so in a compiler switch so that your production code has no logging in final form and is easy to generate.

Get 2 variants, one with logging and debugging and whatnot, which is used for alpha, beta, gamma / whatever version and the clean one that you release officially.

And yes, I'm aware that sometime a bug will creep only in the release version while in the debug version will not happen. What can I say, those are where your experience really is showing, so you can understand the root of the problem.

lazyant · on March 14, 2020

Good points but "I either want a quick notification about some critical error with everything inside, or I want nothing", so this is hoping a) you will never need as much information as you can about a hard issue you are trying to debug/RC and b) you will never have to go through any audit/retro touching this path.

Storage is cheap, if there's no valid side effect (false alerts, performance hit, slow log searches) I want verbose logs, they can be rotated/archived later thanks.

PaywallBuster · on March 14, 2020

Its weird you're only looking into logging exceptions.

At previous companies we setup logging to record all kinds of actions from end users, including possible errors/exceptions.

But as you figured, metrics are the end goal.

I've previously setup mtail to create metrics from log messages and that gives me 90% of what I may need from logs at a fraction of the cost, speed and time.

But logs are still useful for exploratory ad-hoc analysis, or the less common patterns which you're not monitoring or creating metrics for.

_5vzs · on March 14, 2020

Out of similar frustration, I created Cabin @ https://cabinjs.com.

Cabin is a BYOL (bring your own logger) solution that normalizes logs by adding metadata and automatically strips/masks/purges passwords, credit cards, and sensitive fields from logs automatically. It works out of the box with any client-side or server-side JS environment, and has support for Koa and Express.

scott_meyer · on March 14, 2020

So, imagine how things are going to work out when you have to explain to a VP that you have absolutely no idea why the site was down for 4 hours, but you have now added logging such that when it happens again you might be able to debug the problem...

Log as if your life depended on being able to debug a single occurrence. If this causes a lot of output, you probably need to add more assertions to the code.

romanovcode · on March 14, 2020

> it sounds like a job for Sentry, not logging.

Sentry IS logging.

ashwoods · on March 14, 2020

Even sentry itself says sentry is NOT logging: https://sentry.io/vs/logging/

xolubi · on March 14, 2020

Because it is logging with extra steps.

See: https://news.ycombinator.com/item?id=22574712

ashwoods · on March 14, 2020

There is a lot of overlap between logging, monitoring and crash/error reporting. In Sentry with some systems it does hook up to your logging framework to gather extra context through breadcrumbs or adds a hook to send logged error reports (in addition to the error reports from exceptions and crashes). They also follow logging good practices and integrate with them, but if you need logging, you can't use sentry for that. It only keeps the relevant information to a specific crash, and it aggregates data from crashes of a specific type so you won't even get historical data from crashes in some instances. Quoting from one of their blogs:

>>> Sentry is built on top of best practices like logging. However, this can cause confusion and signal-to-noise issues if you don’t understand the the difference between logging and error reporting.

Throughout your application you are likely logging many kinds of events. While these are commonly errors, you are probably logging a lot of debug information as well. This is the fundamental reason why logging has levels (e.g., debug, warning, error). When we abstract our crash reporting through logs, the challenge is to differentiate between actual errors and debug noise. >>> https://blog.sentry.io/2016/02/09/what-is-crash-reporting

kevindong · on March 14, 2020

I'm not sure which is worse (or if they're equal): too many logs or too few logs.

Too many logs makes trying to find the relevant logs when debugging near-impossible. Too few makes trying to find the root cause near-impossible. They're both terrible. But finding the happy middle ground is also near-impossible.

magwa101 · on March 14, 2020

Asserts during development and testing are the best. Instant cadaver for inspection. Initial testing is very slow, crashes all over the place, stopped applications, etc. But then quality sky rockets as obvious things are fixed. Then eventually non obvious things get found. Logs hide problems in test phase.

hernantz · on March 14, 2020

I was able to debug many race conditions in production thanks to logging. None of the steps that happened concurrently resulted in an error. Logging is helps you audit how your app is being used. And that is a different thing from monitoring or error reporting.

djohnston · on March 14, 2020

I'm not sure I buy it. I had never seen monads in Python though, that's pretty cool.

tjholowaychuk · on March 14, 2020

I'm good, I like to actually have data when a customer reports an issue... or you know, figure out why a system fails haha.

Also, Sentry is logging, it just groups by a fingerprint.

fastball · on March 14, 2020

Seems like the author is just a big fan of handling problems with strong, static typing ala Rust. Good for him.

That's not a reason to get rid of all logging though.

Also, Sentry is logging, so...?

dpc_pw · on March 14, 2020

Bad take. The advise on how to do better logging at the end are OK, but the arguments before that are quite weak.

7532yahoogmail · on March 14, 2020

Some of the comments here are too wrong in the other direction. I'll come back to that. What I would have liked to see from the begining was something like:

If you must log then pipe the output into <????> which will store your log into a search system such that if you have a coordination id will bring logs from the various microservices together.

So what is ????

Log rotation, compression, disk space, getting FS space and making directories on large corporate networks is a pain. I also have to ssh around and compose logs. We also deal with servers in multiple timezones and most programmers are too dumb to put the TZ. Figuring out what TZ was ultimately used and sequencing it is stupid work. Yuck.

At my office the gdpr of logging is not well adopted. Logs are left in code really for debugging, to monitor a PR change in prod then forgotten later. You can see where that goes: waste

Also for a serious system with wide customer usage, the log should contain a errno like error (defined universe of errors with a set format), the use case, plus the details. Not whatever sprintf the programmer dreamed up.

The OP is also right here: logging is used as a bailout just in case thing because the programmer did not make a narrow contract in the method, service. Logging is used to find edge cases. That's not right.

Finally logs can be slow and some esp. c++ streams stuff may use mutexes.

brown9-2 · on March 14, 2020

“Do not log” and “Logging should be done right” in the same article, not sure which advice to take.

zug_zug · on March 14, 2020

Nonsense/linkbait