The most crucial thing that I've seen over the years is that most developers are simply afraid of bringing the application down on bugs.
They conflate error handling with writing code for bugs, and this leads to a proliferation of issues, including second-, third-, etc.-degree issues where the code fails later because it already encountered a BUG but execution was allowed to continue.
What do I mean in practice? Practical example:
I program mostly in C and C++ and I often see code like this
if (some_pointer) { ... }
and the context of the code is such that some_pointer being a NULL pointer is in fact not allowed and is thus a BUG. The right thing to do would be to ABORT the process execution immediately, but instead the programmer turned it into a logical condition (probably because they were taught to always check their pointers).
This has the side effect that:
- The pre-condition that some_pointer may not be null is now lost. Reading the code it looks like this condition IS allowed.
- The code is allowed to continue after it has logically bugged out. Your 1+1 = 2 premise no longer holds. This will lead to second-order bugs later on, because the BUG let the program continue executing in a buggy state. Misleading error reports will likely follow.
The better way to write this code is:
ASSERT(some_pointer);
Where ASSERT is an unconditional check that will always (regardless of your build config) abort your process gracefully and produce a) a stack trace and b) a core dump file.
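For what it's worth, the same idea ports to garbage-collected languages too. A rough Go analogue, sketched here purely for illustration (the helper is made up, not a standard library function), is a check that panics in every build configuration; running with GOTRACEBACK=crash also gets you the core dump:

package assert

import "fmt"

// Assert fires in every build configuration; there is no "release" build
// that compiles it away. An unrecovered panic prints a stack trace, and
// with GOTRACEBACK=crash the runtime aborts so the OS can write a core dump.
func Assert(cond bool, msg string) {
    if !cond {
        panic(fmt.Sprintf("ASSERT failed: %s", msg))
    }
}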
My advice is:
If your environment is such that you can immediately abort your process when you hit a BUG, do so. In the long run this will help with post-mortem diagnosis and fixing of bugs, and will result in a more robust, higher-quality code base.
I’m a big fan of assertions and rigorous preconditions, but there are times when a failure of some invariant in a minor subsystem should not be allowed to crash the entire process, especially if the context makes it easy to return an error.
In our project (the language server for Go) we have gotten tremendous value from telemetry: return an error, but report home the 1-bit fact that the assertion has failed. Often that fact is enough to figure out why; other times it is necessary to refine the assertion into two or more (in a later release) to get another bit or two of information about the nature of the failure.
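Schematically it looks something like this (a simplified sketch, not the actual gopls code; reportCounter stands in for whatever opt-in telemetry hook is available):

func checkInvariant(ok bool, name string) error {
    if ok {
        return nil
    }
    reportCounter("assert-failed/" + name) // phone home the 1-bit fact
    return fmt.Errorf("internal error: invariant %q violated", name)
}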
I worked with a 3rd party library that had this mentality. "A bug is a bug so the assert fails and thus the code is now in an unknown state thus The right thing to do would be to ABORT the process execution immediately". Oh my.
Just do "if (pointer)" and when that fails, error out from the smallest context possible that applies to that pointer, and nothing more than that. I.e. the real BEST thing to do is to abort the current connection. To skip the current file with an error. To fail writing that piece of memory. Whatever. But never abort (unless maybe in debug builds).
The end result of this library was that we had a WebRTC server handling 100s of simultaneous video calls, and then when a single new user tripped up during connection and went through a bogus code path, the library would decide "oh something is not as I expected so I'll abort, of course!" and the whole production server was brought down with it.
That kind of behavior does not help achieving high production quality and providing robust and reliable services.
We ended up removing the library's runtime assertions, which meant that connections that would trip the library's bugs just ended up failing with an error somewhere else, which could be used to discard the attempt and try again. All in all, the numbers showed it was a huge positive for the stability of the service.
If you're validating parameters that originate from your program (messages, user input, events, etc), ASSERT and ASSERT often. If you're handling parameters that originate from somewhere else (response from server, request from client, loading a file, etc) - you model every possible version of the data and handle all valid and invalid states.
Why? When you or your coworkers are adding code, the stricter you make your code, the fewer permutations you have to test, the fewer bugs you will have. But, you can't enforce an invariant on a data source that you don't control.
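A small Go sketch of the split (all names invented): inputs our own code produced get asserted, inputs from outside get modelled exhaustively.

import (
    "encoding/json"
    "errors"
    "fmt"
)

type Event struct{ UserID string }   // produced by our own code
type Request struct{ UserID string } // arrives over the wire

// Internal input: an empty UserID here is a BUG, so assert instead of branching.
func applyEvent(ev Event) {
    if ev.UserID == "" {
        panic("applyEvent: event must carry a user id")
    }
    // ... apply the event ...
}

// External input: every invalid shape is an expected state that gets handled.
func parseRequest(body []byte) (Request, error) {
    var req Request
    if err := json.Unmarshal(body, &req); err != nil {
        return Request{}, fmt.Errorf("malformed request: %w", err)
    }
    if req.UserID == "" {
        return Request{}, errors.New("missing user id")
    }
    return req, nil
}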
Yes of course the key here is to understand the difference between BUGS and logical (error) conditions.
If I write an image processing application, failing to process a .png image when:
- the user doesn't have permission to the file
- file is actually not a file
- file is actually not an image
- file contains a corrupt image
etc.
are all logical conditions that the application needs to be able to handle.
The difference is that from the software correctness perspective none of these are errors. In the software they're just logical conditions and they are only errors to the USER.
BUGS are errors in the software.
(People often get confused because the term "error" without more context doesn't adequately distinguish between an error condition experienced by the user when using the software and errors in the program itself.)
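In Go terms, the image example above might look like this (a sketch; the function and paths are invented for illustration):

import (
    "fmt"
    "image"
    "image/png"
    "os"
)

// LoadImage: every way the user's file can be bad is an ordinary condition
// we report back; none of them are errors *in the software*.
func LoadImage(path string) (image.Image, error) {
    if path == "" {
        // A caller passing an empty path violates our precondition: that is a BUG.
        panic("LoadImage: empty path")
    }
    f, err := os.Open(path) // no permission, not a file, doesn't exist...
    if err != nil {
        return nil, fmt.Errorf("opening %s: %w", path, err)
    }
    defer f.Close()
    img, err := png.Decode(f) // not an image, corrupt image...
    if err != nil {
        return nil, fmt.Errorf("decoding %s: %w", path, err)
    }
    return img, nil
}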
> The code is allowed to continue after it has logically bugged out.
I'm a big fan of asserting preconditions and making it clear that we are getting into a bad place. I would rather dig through Sentry for an AssertionError than propagate a bad state and having to fix mangled data after the fact. If the AssertionError means that we mishandled valid user input, no problem, we'll go fix it.
A few times in my career I've had to ask, "okay, how long has this bug been quietly mangling user data?" and it's not a fun place to be.
Side note: I've never understood the convention of removing asserts in production builds. It seems like removing the seatbelts from the car before the race just to save a few pounds.
> Side note: I've never understood the convention of removing asserts in production builds. It seems like removing the seatbelts from the car before the race just to save a few pounds.
Once upon a time computers were slow and every cycle mattered. Assertions were compiled out of the build by necessity. Better a crash once in a while than the program hardly running because it was so slow.
Case in point: When I was stuck inside a "big ball of perl" codebase that heavily used assertions for method input validation, I generated a flame graph of where time was spent in the codebase and it turned out it was assertions all the way down. Since only a small percentage of inputs came from external/unvalidated sources (user input etc) it was fine to remove the vast majority of them outside of the development environment. So we turned them into no-ops in prod and had a significant performance improvement.
If this is some inconsequential part of the codebase it might be better to limp on than to completely stop anyone, user or fellow dev, from running the app at all.
Said another way, graceful degradation is a thing.
I think this is precisely why exceptions model, well, exceptional situations particularly well.
They let you install barriers, and you can safely fail up until that point, disallowing the program from entering cursed states, all the while a user can be returned a readable error message.
In fact, I would be interested in more research into transactions/transactional memory.
How do you gracefully degrade when your program is in a buggy state and you no longer know what data is valid, what is garbage and what conditions hold ?
If I told you to write a function that takes a chunk of customer JSON data, but also told you that the data was produced/processed by some code that is buggy and might have corrupted it, and that your job is to write a function that works on that data, how would you do it?
Now your answer is likely to be "just checksum it", but what if I told you that the functions that compute the checksums sometimes go off the rails in buggy branches and produce incorrect checksums?
Then what?
In a sane world your software is always in a well-defined state. This means buggy conditions cannot be allowed to keep executing. If you don't honor this you have no chance of a correct program.
Contrary to people's dislike of OOP, I think it pretty well solves the problem.
You have objects, and calling a method on one may fail with an exception. If the method throws an exception, the object itself is responsible for leaving behind a sane state, but thanks to encapsulation that is a feasible task.
(Of course global state may still end up in illegal states, but if the program architecture is carefully designed and implemented it can be largely mitigated)
Why not bring down the entire server if you detect an error condition in your application? You build things in a way where a job or request has isolated resources, and if you detect an error, you abort the job and free those resources, but continue processing other jobs. Operating systems do this through processes with different memory maps. Applications can do it through things like arenas or garbage collection.
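In a Go HTTP server, for instance, that isolation can be as small as a per-request recover (a sketch; process is a placeholder for the real work, and net/http already installs a similar per-connection recovery):

import (
    "log"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    defer func() {
        if p := recover(); p != nil {
            // The bug aborts only this request; its resources are released by
            // the deferred cleanups and every other request keeps running.
            log.Printf("panic handling %s: %v", r.URL.Path, p)
            http.Error(w, "internal error", http.StatusInternalServerError)
        }
    }()
    process(w, r) // placeholder for the per-request work
}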
It may be okay in a server, but (for example) not in a desktop application. The issue, then, is that most code lives (or should live) in library-like modules that are agnostic of which kind of application context they are running in. In other words, you can’t just abort in library code, because the library might be used in application contexts for which this is not acceptable. And arguably almost all important code should be a library.
Exception mechanisms let the calling context control how to proceed. Deferring to that control and doing some cleanup during stack unwinding virtually never causes serious issues in practice.
What I meant was that if you follow the logic of "computer is in an unknown state. Stop processing everything", then why not continue that to the entire server (operating system, hypervisor, etc.)? Obviously it's not okay in almost any context. Instead, assuming you have something more complicated than a CLI script that's going to immediately exit anyway, you should be handling those sorts of conditions and allowing your event loop/main thread to continue.
I think the issue is that bringing the application down might mean cutting short concurrent ongoing requests, especially requests that will result in data mutation of some sort.
Otherwise, some situations simply don't warrant a full shutdown, and it might be okay to run the application in degraded mode.
"I think the issue is that bringing the application down might mean cutting short concurrent ongoing requests, especially requests that will result in data mutation of some sort."
Yes, but what is worse is silently corrupting the data or the state because the program kept running in a buggy state.
> If you don't know what the state of your app is, how do you prevent data corruption or logical errors in further execution?
There are a lot of patterns for this. It's perfectly fine and often desirable to scope the blast radius of an error short of crashing everything.
OSes shouldn't crash because a process had an error. Servers shouldn't crash because a request had an error. Missing textures shouldn't crash your game. Cars shouldn't crash because the infotainment system had an error.
If you can actually isolate state well enough, and code every isolated component in a way that assumes that all state external to it is untrusted, sure.
This is basically all code I've worked on. You have a parsing/validation layer that passes data to your logic layer. I could imagine it working less well for something like a game where your state lives longer than 2 ms and an external database is too slow, but for application servers that manipulate database entries or whatever it's completely normal.
In most real-world application programming languages (i.e. not C and C++), you don't really have the ability to access arbitrary memory, so if you know you never gave task B a reference to task A or its resources, then you know task B couldn't possibly interfere with task A. It's not dissimilar to two processes being unable to interfere with each other when they have different logical address spaces. If B does something odd, you just abort it and continue with A. In something like an application server, it is completely normal for requests to have minimal shared state internal to the application (e.g. a connection pool might be the only shared object, and has a relatively small boundary that doesn't allow its clients to directly manipulate its own internals).
Sure. You wouldn't want a webserver to crash if someone sends a malformed request.
I'd have to think long and hard about each individual case of running in degraded mode though. Sometimes that's appropriate: an OS kernel should keep going if someone unplugs a keyboard. Other times it's not: it may be better for a database to fail than to return the wrong set of rows because of a storage error.
Most bugs aren't going to create any risk for data exfiltration. In most real application servers (which are very rarely written in C or C++ these days), requests are almost completely isolated from each other except to the extent that they interact with a database. If you detect a bug in one request, you just abort the one request, and there's likely no way it could affect others.
This is part of why something like Rust is usable at all; in the real world a lot of logic has straightforward, linear lifecycles. To the extent that it doesn't, you can push the long-lived state into something like an external database, and now your application has straightforward lifecycles again where the goal of a task is to produce commands to manipulate the database and then exit.
Except you usually can because the state isn't completely unknown. You might not expect some field in a structure to be null, but you still know for example that there's no way for one request to have a reference to another, so you just abort the one request and continue.
I largely agree. If it came to pass that the precondition fails, there's a bug somewhere and this code just hides it. At the very least, that should go to an error log that someone actually sees.
I'm writing a Rust project right now where I deliberately put almost no error handling in the core of the code apart from the bits accepting user input. In Rust speak, I use .unwrap() all over the place when fetching a mandatory row from the DB or loading config files or opening a network connection to listen on or writing to stdout. If any of those things fail, there's not a thing I can plausibly do to recover from it in this context. I suppose I could write code like
if let Ok(cfg) = load_config() {
    println!("Loaded the config without failing!");
    Ok(cfg)
} else {
    eprintln!("Oh no! Couldn't load the config file!");
    Err("Couldn't load the config file")
}
and make the program exit if it returns an error, but that's just adding noise around:
return load_config().unwrap();
The only advantage is that the error message is more gentle, at the expense of adding a bunch of code and potentially hiding the underlying error message that would have let the user fix it.
I think Python also gets that right, where it's common to raise exceptions when exceptional things happen, and only ever handle the exceptions you can actually do something about. In 99.999% of projects, what are you actually going to do at the application level to properly deal with an OOM or disk full error? Nothing. It's almost always better to just crash and let the OS / daemon manager / top level event loop log that something bad happened and schedule a retry.
Broadly, there are three categories of conditions here:
- errors in the software itself, aka BUGS
- logical conditions that are expected part of the program execution flow and expected state. some of these might be error conditions but only for the *user*. In other words they're not errors in the software itself.
- unexpected failures where none of the above applies: typically when some OS resource allocation fails (memory, a socket, a mutex, etc.) and the reason is not that the programmer called the API wrong.
In the first category we're dealing with BUGS, and when I advocate asserting and terminating the process that only really applies to BUG conditions. If you let an application continue in a buggy state then you cannot logically reason about it anymore.
The logical conditions are the typical cases, for example "file not found" or whatever. The user tries to use the software but there's a problem. The application needs to deal with these, but from the software correctness perspective there's no error. The error is only what the user perceives. When your browser prints "404" or "no internet connection" the software works correctly. The error exists only from the user's perspective.
Finally, the last category is those unexpected situations where something that should not fail does fail. It is quite tricky to get these right. Is straight up exiting the right choice? Maybe the system will have more resources later if you just back off and try again. Personally, in C++ projects my strategy is to employ exceptions and let the call stack unwind to the UI level, inform the user, and then just return to the event loop. Of course the real trick is to keep the program state well defined and not in a BUGGY state ;-)
When a process is used to serve multiple requests, I don't think you need to let the whole process terminate just because there is a bug dealing with a single request.
Just because we cannot reason about the current request does not mean the only way to get to a clean state for the other requests is to terminate the whole process.
First of all, this results in unintelligible errors. Linux is famous for abysmal error reporting, where no matter what the problem really is, you get something like ENOENT, no context, no explanation. Errors need to propagate upwards and allow the handling code to reinterpret them in the context of the work it was doing. Otherwise, for the user they are either meaningless or dangerous.
Secondly, any particular function that encounters an unexpected condition doesn't have a "moral right" to terminate the entire program (who knows how many layers there are on top of what this particular function does?) Perhaps the fact that a function cannot handle a particular condition is entirely expected, and the level above this function is completely prepared to deal with the problem: insufficient permissions to access the file -- ask user to elevate permission; configuration file is missing in ~/.config? -- perhaps it's in /etc/? cannot navigate to URL? -- perhaps the user needs to connect to Wi-Fi network? And so on.
What I do see in practice is that programmers are usually incapable of describing errors in a useful way, and are very reluctant to write code that automates error recovery, even when it's entirely within reach. I think the reason for this is that the acceptance criteria for code usually emphasize the "good path", and because multiple bad things can usually happen down the "bad path", it becomes cumbersome and tiresome to describe and respond to them, and so it's seldom done.
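To make "recovery is entirely within reach" concrete, the config-file fallback described above is only a few lines in Go (paths invented for the sketch):

import (
    "errors"
    "fmt"
    "os"
    "path/filepath"
)

func loadAppConfig(home string) ([]byte, error) {
    b, err := os.ReadFile(filepath.Join(home, ".config", "app", "app.conf"))
    if errors.Is(err, os.ErrNotExist) {
        // Entirely expected at this level: fall back to the system-wide file.
        b, err = os.ReadFile("/etc/app/app.conf")
    }
    if err != nil {
        return nil, fmt.Errorf("loading configuration: %w", err)
    }
    return b, nil
}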
There are a lot of comments here that seem overly critical. The author came up with solutions to extend Go's errors to meet their needs and shared that with the world. Thank you.
I have been solving all the same problems and providing libraries that allow for more flexibility so that users can come up with approaches that best meet their needs. I am finally polishing the libraries and starting to write about them:
The concepts aren't wrong (structured logs from structured errors), but I find this code to be very un-Go-like and there are obvious signs of trying to write Java in Go (iFace, structs with one property "because everything needs to be contained in an object", and others).
Return "error" and not a custom type "mypkg.Error" - you run into more nil interface pointer problems and you are breaking an idiom.
Let me provide a counter-example for helping create structured logs from structured errors that I wrote up, which is much more idiomatic, if more narrowly focused:
As in the article, if you want to attach "username: foo", this package lets you return kverr.New(err, "username", foo, ...), and then extract a slice or map later for logging like logger.WithArgs(YoinkArgs(err)...).
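For readers who haven't seen the pattern, the core of such a key/value error wrapper is small. This is a sketch of the idea only, not the actual kverr source:

import "errors"

// kvError carries alternating key/value pairs alongside the wrapped error.
type kvError struct {
    err  error
    args []any
}

func New(err error, kv ...any) error { return &kvError{err: err, args: kv} }

func (e *kvError) Error() string { return e.err.Error() }
func (e *kvError) Unwrap() error { return e.err }

// YoinkArgs walks the chain and collects every key/value pair for logging.
func YoinkArgs(err error) []any {
    var args []any
    for e := err; e != nil; e = errors.Unwrap(e) {
        if kv, ok := e.(*kvError); ok {
            args = append(args, kv.args...)
        }
    }
    return args
}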
Feels like OP is basically implementing exceptions and exception handling at the application level. If this is what you want, then why not just switch to one of the many other languages that has exceptions built in at the language level?
I think they use too many sentinel errors [0]. I have been doing Java for two decades, and I thought you needed to handle individual errors by type. Using Go, I've learned from the code I write that 90%+ of errors I don't need to handle individually, or I can't do anything except bubble the error up. There is the rare case (10%) when a file does not exist and I try to read an alternative one instead of bubbling up an error.
For customer support I also found it much easier to print a UUID, instead of an error number, that customers can give to support; that UUID (Request ID) can then be found in the logs by developers to find out what happened.
Exceptions are easier for the programmer. The programmer has to write less and they clutter the code less. But exceptions require stack traces. An exception without a stack trace is useless. The problem with stack traces is: they are hard to read for non-programmers.
On the other side, Go's errors are more work for the programmer and they clutter the code. But if you consistently wrap errors in Go, you do not need stack traces any more. And the advantage of wrapped errors with descriptive error messages is: they are much easier to read for non-programmers.
If you want to please the dev-team: use exceptions and stack traces.
If you want to please the op-team: use wrapped errors with descriptive messages.
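Concretely, if every layer wraps with the context it knows about (names and paths invented for the sketch), the final message reads top-down like a sentence rather than a stack trace:

import (
    "fmt"
    "os"
)

func readUserFile(id int) ([]byte, error) {
    b, err := os.ReadFile(fmt.Sprintf("/data/users/%d.json", id))
    if err != nil {
        return nil, fmt.Errorf("reading user file: %w", err)
    }
    return b, nil
}

func refreshProfile(id int) error {
    if _, err := readUserFile(id); err != nil {
        return fmt.Errorf("refreshing profile for user %d: %w", id, err)
    }
    return nil
}

// Typical message an operator reads, no stack trace needed:
//   refreshing profile for user 42: reading user file: open /data/users/42.json: no such file or directory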
Messages and stack traces in the error are orthogonal to errors-as-values vs. exceptions for control flow. You could have `throw Exception("error fooing the bar", ctx)`. You could also `return error("error fooing the bar", ctx, stacktrace())`. Stack traces are also occasionally useful but not really necessary most of the time IME.
Go's error handling is annoying because it requires boilerplate to make structured errors and gives you string formatting as the default path for easy-to-create error values. And the whole using a product instead of a sum thing of course. And no good story for exception-like behavior across goroutines. And you still need to deal with panics anyway for things like nil pointers or invalid array offsets.
Go messages are harder for both devs and users to read. Grepping for an error message in a codebase is a special hell.
Besides, it's quite trivial to simply return the exception's getMessage in a popup for an okay-ish error message (and writing a stack-trace prettifier that also writes out the caused-by exceptions' messages is trivial, and you can install exception handlers at the appropriate level, unlike with the inextensibility of error values).
I tend to use "catch and re-raise with context" in Python so that unexpected errors can be wrapped with a context message for debugging and for users, then passed to higher levels to generate a stack trace with context.
For situations where an unexpected error is retried, eg, accessing some network service, unexpected errors have a compressed stack trace string included with the context error message. The compressed stack trace has the program commit id, Python source file names (not pathnames) and line numbers strung together, and a context error message, like:
[#3271 a 25 b 75 c 14] Error accessing server xyz; http status 525
Then the user gets an idea of what went wrong, doesn't get overwhelmed with a lot of irrelevant (to them) debugging info, and if the error is reported, it's easy to tell what version of the program is running and exactly where and usually why the error occurred.
One of the big reasons I haven't switched from Python to Go for HashBackup (I'm the author) is that while I'd love to have a code speed-up, I can't stomach the work involved to add 'if err return err("blah")' after most lines of existing code. It would hugely (IMO) bloat the existing codebase.
When there's an exceptional case, it's better to handle that explicitly. I think Rust does that best with its single-character ? operator, but I don't want exceptions invisibly breaking out of control flow unless I give them permission to. `if err != nil` is a fair enough way of doing that.
Yeah there are linters that force you not to implicitly discard errors, but that should really be a compiler error. Still, that's not a problem inherent to the Go's error-handling model.
Better is subjective, but I prefer errors as return values because then the function signature states whether an error has to be handled or not. Exceptions can be forgotten about, but returned errors have to be explicitly ignored.
People bitch about checked exceptions in Java but this is precisely why I think they're a great idea. You can't forget to catch the right type of exception.
The biggest issue with checked exceptions in modern Java is that even the Java makers themselves have abandoned them. They don't work well with any of the fancy features, like Streams.
Checked Exceptions are nothing but errors as return values plus some syntactic sugar to support the most common response to errors, bubbling.
All it would require is more support for sum types and variadic type parameters, and maybe fix some hiccups in the existing type inference. You can already write a Stream-like API that supports up to a fixed number of exception types (it’s just a bit annoying to write). The main issue at present is that you can’t do it for an open-ended number of exception types and abstract over the concrete set of types.
The throws clause would require union types, not sum types though (you can observe this in the catch part of a try-catch, e.g. `catch ExceptionA | ExceptionB`). But Java can't support unions elsewhere, so it has to be replaced by the two exceptions' common supertype.
That would be true if not for Java making the critical mistake of excluding RuntimeException from method definitions, so in-practice people just extend RuntimeException to keep their methods looking "clean".
The problem is that there's no way to specify an exception specification like "I propagate everything that this lambda throws" (or, for generics, "that method M of class C throws").
interface Func<P, R, X extends Exception>
{
    R func(P param) throws X;
}
or the same with more than one exception type, and convert your lambda to that. This works. The only problem is that you can’t abstract over an arbitrary number of exception types.
In principle, one could imagine a syntax for variadic type parameters like
That was part of the idea behind them, yes. As with many things in the WG21 design process, reality worked out differently, and they are no longer part of ISO C++ as of C++17.
Although some want to reuse the syntax for value type exceptions, if that proposal ever moves forward, which seems unlikely.
No, but you can easily end up missing some because somebody wrapped them in some sub-type of RuntimeException because they were forced(!) to. This happens all the time because the variance on throws clauses is at odds with the variance of method signatures (well, implementations, really -- see below).
A new implementation of a ThingDoer usually needs to do something more/different from a StandardThingDoer... and so may need to throw more types of exceptions. So you end up having to wrap exceptions ... but now they don't get caught by, say, catch(IOException exc). If you're lucky you own the ThingDoer interface, but now you have a different problem: It's only JDBCThingDoer which can throw SQLException, so why does code which only uses a StandardThingDoer (via the ThingDoer interface) need to concern itself with SQLException?
Checked exceptions in Java are worse than useless -- they actively make things worse than if there were only unchecked exceptions. (Because they sometimes force the unavoidable wrapping -- which every place where exceptions are caught needs to deal with somehow... with no help from the standard "catch" syntax.)
One thing you can do in Java is parameterise your interface on the exception type. That way, if the implementation finds it needs to handle some random exception, you can expose that through the interface -- e.g. "class JDBCThingDoer implements ThingDoer<SQLException>". Helper classes and functions can work with the generic type, e.g. "<E> ThingDoer<E> thingDoerLoggingWrapper(ThingDoer<E> impl)".
I think this works really well to keep a codebase with checked exceptions tractable. I've always been surprised that I never saw it used very often. Anyone have any experience using that style?
I guess it's not very relevant any more because checked exceptions are sadly out of fashion everywhere. I haven't done any serious Java for a while so I'm not on top of current trends there.
Back when Java didn't have lambdas, one of the more advanced lambda proposals (http://www.javac.info/closures-v06a.html) had this exact thing for this exact reason.
Unfortunately, this take on lambdas was deemed too complicated, and so we got the present system which doesn't really try to deal with this use case at all.
Well, Scala has union types, but it doesn't do checked exceptions per se (though it does have a very advanced type system, so a similar structure can be easily encoded). I think checked exceptions are pretty rare, so I don't know... probably some research language (but they often go the extra mile towards effect types).
My main gripe with checked exceptions is they create a whole other possible code path on each `catch` clause. I tend to keep checked exceptions to the absolute minimum where they actually make sense, all the rest are RuntimeExceptions that should bubble up the stack.
But so would every single other method to react to different types of errors, no?
In something like go, you're even required to create the separate code path for EVERY SINGLE erroring line, even if your intention is simply to bubble it up.
After a decade of Scala and Rust I no longer believe in monads and prefer the way Go does error handling.
for {
  a <- a()
  b <- b()
} yield a + b
looks nice, but only by hiding the error handling. Today I like looking at code and seeing the error handling. With monads you end up with monad stacks and transformers which introduce their own failure states.
It's certainly better than Go's (Go's is barely better than C's and that's quite a low bar), but I don't think that sum types are the global optimum.
Exceptions are arguably better in certain respects, e.g. defaulting to bubbling up, covering as small or as wide a range as needed (via try-catch blocks), and auto-unwrapping without extra syntax. So when languages with proper effect types come into the mainstream we might reach a higher optimum.
Maybe I'm too pessimistic, but Rust style error handling feels like the global optimum under the constraint that the average developer understand it.
Go is a language that exists purely because people saw Monads in the horizon and, in their panic, went back to monke, programming wise. Rust error handling is something that even many Go fans have said is a good abstraction.
No, sum types are certainly not a global optimum. But they remain the best error-handling mechanism that I've used professionally so far.
Effect types (and effect handlers) are very nice, but they come with their own complexities. We'll see if some mainstream language manages to make them popular.
Which Go doesn't fix either, because its errors are all just "error", aka you can also forget to catch the right type of error.
If only there was a way to combine optimizing the default path (bubbling), and still provide information on what errors exactly could happen. Something like a "?" operator and a Result monad...
You may be thinking a bit too much about what happens in _Go_ when you forget to check for an error response from a function -- the current function continues on with (probably) incorrect/nil values being fed to subsequent code. In Java when an uncaught exception is thrown, the exception makes its way back up the call stack until it's finally caught, meaning subsequent code is _not_ executed. It's actually a very orderly termination. In any Java web framework (Spring et al) there's always a centralized point at which exceptions are caught and either built-in or user-specified code is used to translate the error to an HTTP response.
This makes for much more pleasant code that is mostly only concerned with the happy path, e.g., my REST endpoint doesn't have to care if an exception is thrown from the DAO layer as the REST endpoint will simply terminate right then and there and the framework will map the exception to a 500 error. Why anyone would prefer Go's `if err != nil {}` error handling that must be added All. Over. The. Place. at every single level of the application is beyond me.
My slightly snarky take is that liking Go is simply a defensive reaction to one too many AbstractFactoryBeanFactory. Too many abstractions overloaded their "abstraction-insulin", so now they can only handle minute amounts of abstraction.
No, TFA is mostly about making errors consistent in a large application, while exceptions (vs errors as standard return values) are largely about easier bubbling, which is one thing TFA hardly talked about (maybe I missed it, I only skimmed the article). In fact it spends a lot of energy on wrapping, which is the opposite of the automatic bubbling provided by exceptions by default. Throwing random, inconsistent module/package/whatever-specific exceptions from everywhere causes most of the same problems described in TFA.
I feel like all the canned comments saying TFA is about implementing exceptions / an ADT result type are from people who didn't read the article and just want to repeat all the clichés on the topic (for easy karma? No idea what the point is).
That's not how I read it. It's more about having a consistent approach to managing error types in large code bases. This is a common problem with exception-based languages too.
How so? This is about how errors are defined, not how they're propagated through the application. Feels like you didn't actually read what was being done by the OP.
Exceptions have a hierarchical nature to them in most languages, or at least have some sort of identity to them. You're correct that the author doesn't try to change the way errors are propagated, but you can see similarities between what the author is creating themselves and what already exists in languages with exceptions.
Go's error handling is still cumbersome and lacking. I love writing Go but I don't ever want to adopt anything like this. It's bending over backwards to achieve something sum types provide, and this pattern is a mess.
I thought so too, after a decade with Scala and Rust. Now I think (X, error) is fine; indeed I think it is great for its simplicity. I might want to have a safe assignment:
// x() (X,error)
x != x()
// x is X
// return on error
The problem is indeed composition. How do I chain 3 calls that short-circuit on the first error? In Go that's verbose in the extreme. With exceptions it's easy to miss an error. Sum type errors have neither problem.
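For comparison, the three-call chain spelled out in Go (getA/getB/getC are placeholders); every step repeats the same short-circuit boilerplate:

a, err := getA()
if err != nil {
    return nil, fmt.Errorf("getting a: %w", err)
}
b, err := getB(a)
if err != nil {
    return nil, fmt.Errorf("getting b: %w", err)
}
c, err := getC(b)
if err != nil {
    return nil, fmt.Errorf("getting c: %w", err)
}
return c, nil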
If you want to chain 3 calls and short circuit on the first error, don't use Go. I like explicit code without magic that I can't see.
for {
  a <- a()
  b <- b()
} yield ...
I don't know what is happening there. Is it summing up errors? Is it short-circuiting? Does it have error handling? Is it async? Is it a monad stack with transformers? That code could mean anything. Good luck coming back to it after six months. I think the sum type solution focuses on the happy path; the Go solution assumes you need to focus on the things that go wrong.
Additionally you have that nasty dependency of the Result type of a() on b() to make it work. And I've spent hours creating the right transformer stack to compose more than two monad types like Result, Future and IO.
Checked Exceptions are nothing but errors as values with some syntactic sugar for the most common use case (bubbling up the error).
Go's version of value errors is just micrometers ahead of C-style error codes. In both cases you get told "there could be an error", the error is a value of one single type (error/int), and you have to manually find out which different errors this value could represent.
If you want to know what you're missing, check out Rust's error handling.
I’m not talking about the underlying model, I’m talking about the control flow. What I mean is errors are explicit values belonging to the signature of the functions.
Go has insanely good tooling and very fast single binary compiling.
While all these languages (afaik) can reach similar levels of functionality (GraalVM e.g.), it's more work. As much as I hate the language Go, I can't deny how braindead simple it is to just make a tool with it. I don't need to choose a build tool, or a runtime version, there's a library for everything and most developers with more than a room temp IQ can immediately start working on it.
The only other language that currently comes close is Rust. If only they had stuck to using a GC, I'd be in heaven.
Trying to shoehorn code errors into HTTP errors is a prime example of conflating two very different things because sometimes they look similar. Let different things be different, I like to say. You either let your HTTP handlers do their own error-to-http-code management or you end up with a massive switch statement trying to map them all, or whatever monstrosity this approach is.
Also, the entire problem of the OP would go away if they just added OpenTelemetry tracing to their logs.
The sane thing to do is to let lower layer functions return
Either<Error, Foo>
then, in the HTTP layer, your endpoint code just looks like
return fooService
    .getFoo(fooId) // Either<Error, Foo>
    .fold({ // left (error) case
        when (it) {
            is GetFooErrors.NotAuthenticated -> { Response.status(401).build() }
            is GetFooErrors.InvalidFooId -> { Response.status(400).build() }
            is GetFooErrors.Other -> { Response.status(500).build() }
        }
    }, { // right case
        Response.ok(it).build()
    })
the benefits of this are hard to overstate.
* Errors are clearly enumerated in a single place.
* Errors are clearly separated from but connected to HTTP, in the appropriate layer. (=the HTTP layer) Developers can tell from a glance at the resource method what the endpoint will return for each outcome, in context.
* Errors are guaranteed to be exhaustively mapped because Kotlin enforces that sealed classes are exhaustively mapped at compile time. So a 500 resulting from forgetting to catch a ReusedPasswordException is impossible, and if new errors are added without being mapped to HTTP responses, the compiler will let us know.
It beats exceptions, it beats Go's shitty error handling.
My favorite example of this was renaming a 500 error due to an unhandled exception into a 400 error to make it look like it was the caller's fault. Management was possibly tracking 500 errors too, so the 400 could also have been gaming the system.
In some mental models, though, it did make sense. Particularly the one that went, "Well, we never would have errored, if you never called us!"
It's somewhat fair though. If there's a case that would cause errors for the system and it's a case that you're not supposed to handle, then a 400 error sounds perfect for that case. For example, if you have a service and it panics/returns 500 when you pass in an empty user id, then you could instead return a 400 before you hit the panic and all is good.
Normally you should attempt to find all the corner cases and present the errors to the user -- before processing the request. If you can't do this, it's time to rethink how your api works. A good api is simple to use and simple to write.
It also simplifies your business logic in that all the possible user defined idiocies are caught before your business logic actually processes the request.
Some frameworks do this better than others. And rather than documentation, I tend to prefer comprehensive error messages.
One example of a 500 error is a null pointer error. Was it a bad request or a logic error? One is your problem the other is not. Just returning a 400 hides that issue. Validating the payload before processing it simplifies the issue for everyone involved.
A 500 error should be your problem with a stack trace in the log. A 400 error should provide enough description to tell the user it's theirs and how to fix it.
Just recoding a 500 to a 400 because of a null pointer error would get noticed and marked up in code review.
Think about what the client code looks like to handle this and the alternative, particularly if you’re implementing an sdk and the api is an implementation detail. I’m not saying I would choose this path, but it certainly reduces the amount of code on both sides that you have to write.
If HTTP is your API's transport layer, then HTTP errors should be related to problems with the transport layer and not to API itself. Is the internal server error caused by a bad HTTP request or a bad API request?
Honestly, my controversial take is that for APIs, it would be cleaner to not use any HTTP status codes other than 200 and have all of the semantics in the body of the response. I'm sure someone smarter than me will jump in and explain why this wouldn't work in practice, but it just feels like application semantics are leaking from a much more natural location in the body of the response. I feel similarly about HTTP request methods other than POST in APIs; between the endpoint route and the body, there should be more than enough room to express the difference between POST, PATCH, and DELETE without needing them to be encoded as separate HTTP methods.
I'm sympathetic, but this can have issues if you want your API to be used by anything other than your own client, including stuff like logging middleware. A lot of tools inherently support/understand HTTP status codes, so building on top of that can make integration a lot easier.
We, very roughly, do it like this:
- 200: all good
- 401: we don't know who you are
- 403: you're not allowed to do that
- 400: something's wrong and you can fix it
- 500: something's wrong and you can't fix it
Each response (other than 401) includes a json blob with details that our UI can do something with, but any other consumer of the API or HTTP traffic still knows roughly what's going on.
I've worked in places where we really sweated on getting the perfect HTTP status codes, and I'm not sure it added much benefit.
On POST - I find myself doing logical GETs with POST a lot, because the endpoint requires more information than can be conveyed in URL params. It makes me feel dirty, and it's obviously not RESTful but you know - sometimes you just have to get things done.
You've just described basically everything a dev needs to know to implement HTTP APIs that report status codes properly, yet some people still seem to think it's oh so complicated. What has gone wrong?
I can understand how people might look at all the full list status codes and think it's all too hard, but yes, once you realize that there are only a handful you need most of the time it all becomes pretty simple.
Sure, but the problem in my opinion is that while the handful that you pick is totally reasonable, someone else might pick a slightly different handful that's just as reasonable. If I want to use a new API and delete a user, how do I know if it uses DELETE or POST, and if it will return 401 or 403? At best, you'll be able to skim through the documentation more quickly due to having encountered similar conventions before, but nothing stops that from happening in terms of request and response bodies either.
The fact that existing tooling relies on some of these conventions is probably a good enough reason to do things this way, but it's not obvious to me that this is because it's actually better rather than inertia. Conventions could be developed around the body of requests as well, and at least to me, it doesn't seem obvious that the amount of information conveyed at the HTTP method/response status layer was necessary to try to separate from the semantics of the request and response bodies. I'm sure that a part of that was due to HTTP supporting different content types for payloads, but nowadays it seems like quite a lot of the common alternatives to JSON APIs were designed not to even use HTTP (GraphQL, gRPC, etc.), which I'd argue is evidence that HTTP isn't necessarily being used as well for APIs as some people would like.
To make something explicit that I've been alluding to, everything I've said is about using APIs in HTTP, not HTTP in the context of viewing webpages in a browser. It really seems like a lot of the complications in HTTP are due to it trying to be sufficient for both browsers and APIs, and in my opinion this comes mostly at the expense of the latter.
It's quite unclear what your point is. HTTP APIs should have a minimal status code set. The parent described it perfectly. It's simple, practical (especially from a monitoring perspective) and doesn't interfere with the service domain.
It seems you have some alternative in mind but it wasn't presented.
I don't consider what the parent comment listed as "minimal". The alternative I described is literally in my initial comment; using only 200 for APIs is "minimal".
Only 200 is detrimental for monitoring. You have to parse the response body to classify response types. HTTP status codes are a cheap and already existing way to get insight into service behavior.
Go ahead try to implement something like cross-origin requests or multipart encoded form uploads just using the body semantics you described. I’ll wait.
Also that is not a controversial take. It is at best a naive or inexperienced take.
Both of those happen in the context of web browsing rather than existing in APIs in a vacuum; I'd argue that there's absolutely no reason why the mechanism used to request a webpage from a browser needed to be identical to the mechanism used for the webpage to perform those actions dynamically, which is pretty much my whole point: it doesn't seem obvious to me that it's useful to encode all of that information in an API that isn't also being used to serve webpages. If you are serving webpages, then it makes sense to use semantics that help with that, but I can't imagine I'm the only one who's had to deal with bikeshedding around this sort of thing in APIs that are literally only used for backends.
There are a lot of useful network monitoring tools that can analyze HTTP response codes out of the box. They can't do this for your custom application error format. You don't have to go crazy with it, but supporting at least 200/400/500 makes it so much easier to monitor the health of your services.
I use http status codes to encode how the _request_ was handled, not necessarily the data within the request.
A 400 if you send mangled JSON, but a 200 if the request was valid but does not pass business validation rules.
Inside the 200 response is structured JSON that also has a status that is relevant at the application level.
Otherwise how, for example, can you tell if a 404 response is because the endpoint doesn't exist, or because the item requested at the endpoint doesn't exist?
I believe it's important to have a separation between what is happening at the API level vs Application, and this approach caters for both.
As it's not to do with the HTTP request itself and the body was able to be parsed, in my book that'd be classified as being at the application level, so it results in a 200 status with a JSON response detailing the issue:
200 OK
{"status": "failed", "errors": ["field X is required"]}
How you deal with this on the application side, what JSON statuses you have etc is up to you.
That depends on how you set up and do your monitoring. Not every failure needs to be indicated by an HTTP status code.
For example, on a server I'm working on there are helper functions that generate different types of responses. Responding in certain ways will produce a 200, but will also log a warning or error.
On the client side, you can create request helpers that all requests go through and that can resolve requests appropriately, rendering error messages to the user etc.
The main thing is to have a well defined, consistent approach.
One reason for using HTTP verbs is to distinguish between queries and updates, and for the latter, between idempotent and non-idempotent updates. This in turn makes it possible to do things like automatically retry queries on network errors or cache responses where it is safe to do so.
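For example, a client can key retry behaviour off nothing but the verb. A simplified sketch (only bodyless requests are resent here, so nothing needs replaying):

import "net/http"

// retryIdempotent retries once on a transport error, but only for verbs that
// HTTP defines as safe or idempotent.
func retryIdempotent(c *http.Client, req *http.Request) (*http.Response, error) {
    resp, err := c.Do(req)
    if err == nil {
        return resp, nil
    }
    switch req.Method {
    case http.MethodGet, http.MethodHead, http.MethodPut, http.MethodDelete:
        if req.Body == nil { // nothing to replay, safe to resend as-is
            return c.Do(req)
        }
    }
    return resp, err
}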
Anecdotally the color codes make life much easier when debugging a new API. You instantly see that's something is wrong. If everything is green you don't realize that something is wrong until you carefully read a uniquely structured custom response. Saves a lot of effort.
> Honestly, my controversial take is that for APIs, it would be cleaner to not use any HTTP status codes other than 200 and have all of the semantics in the body of the response.
We've been doing that for 20 years with json-rpc 1.0
Yeah, that's usually the pragmatic thing to do. Facebook does that with their API, for example.
4xx or 5xx gets you the default HTTP handling for that kind of error. Occasionally - especially in small examples - that default handling does what you want and saves you duplicating a lot of work. More often it gets in your way.
I'd compare it to browser default styling - in small examples it sounds useful, but in a decent-sized site you just end up having to do a "CSS reset" to get it out of the way before you do your styling.
Possibly. I'm not sure why it should require switching to an entirely different protocol though; my point is that making an API that only uses POST and always returns 200 is something that already works in HTTP, and I have trouble understanding why that isn't enough for pretty much everything.
You lose some benefits of features already implemented by existing HTTP clients (caching, redirection, authorization and authentication, cross-origin protections, understanding the nature of the error to know that this request has failed and you need to try another one...).
It is certainly not comprehensive, but it's right there and it works.
Moving to your own solution means that you have to reimplement all of this in every client.
> understanding the nature of the error to know that this request has failed and you need to try another one...
Please elaborate. In my experience, most HTTP client libraries do not automatically retry any requests, and thank goodness for that, since they don't, and can't, know whether such retries are safe or even needed.
> redirection
An example of a service where, at the higher business-logic level, it makes sense to force the underlying HTTP transport level to emit a 301/302 response would be appreciated. In my experience, this stuff is usually handled in the load-balancing proxy before the actual service, so it's similar to QoS and network management stuff: the application does not care about it, it just uses TCP.
They don't retry on errors but they know it is an error. Eg. imagine a shell script using curl or wget and trying multiple URLs as a health check (eg. on different round-robin IPs). Without these "generic" HTTP tools knowing that this is a "failure", you would need to implement custom parsing for any case like this instead of relying on the defined "error" and "success" behaviour.
The same holds true if you are using any programming library: there is a plethora of handlers for HTTP errors.
As for redirection, a common example is offering downloads through S3 using pre-signed URLs (you share a URL with your own domain, but after auth redirect to a pre-signed S3 URL for direct download or upload).
You are thinking like a developer, but there is a world of networking as well. Between your client and server will be various bits of hardware that cannot speak the language you invent. 200, 401, 500 — these are not for the use of the application developer — but rather the infrastructure engineer.
Something being "enough" doesn't mean it's optimal. There's a huge stack of tools that speak HTTP semantics out of the box; including the user agent, i.e. the browser (and others), but also stuff like monitoring tools, proxies, CORS, automation tools, web scrapers...
You don't need to reinvent HTTP semantics when HTTP is already there, standard, doing the right thing, compatible with millions of programs all across the stack, out of the box.
HTTP is so well designed it almost makes me angry when people try to sidestep it and inevitably end up causing pain in the future due to some subtle semantic detail that HTTP does right and they didn't even think to reimplement.
And the only solution to such issues (as they arise, and they will) is to slowly reimplement HTTP across the whole stack: oh, you need to monitor your internal server errors? Now you have to configure your monitoring tool (or create your own) to inspect all your response bodies (no matter how huge) and parse their JSON (no matter how irrelevant) instead of just monitoring the status code in the response header and easily ignore the expensive body parsing.
Even worse when people go all the way. If we don't need status codes, why do we need URLs at all? Just POST everything to /api/rpc with an `operation` payload. Congrats, none of your monitoring tools can easily calculate request rates by operation without some application-specific configuration (I wish this was a made up scenario).
Just use HTTP ffs. You'd need a very good reason not to use it.
You need some kind of structured way to describe the action to take, what the result is, or what the error is, so the client and server can actually parse the data. That's the protocol, whether it's something formal like RPC libraries, or "REST"-ish, or whatever.
json-rpc is probably what you're describing over HTTP; maybe, if you squint enough, GraphQL too.
This is the way to go; it pretty much solves the "404: resource not found or route not found?" problem. But you will get laughed at by so-called architectural dogmatists. Remember, we aren't really doing REST, it's just RPC, so let's call it that.
Shoehorning the HTTP protocol's error codes into application error codes, drinking the Kool-Aid and calling it best practice, is beyond bizarre.
Mapping the error to a code in the HTTP handler is the true path. It's the only place that has the context and knowledge about the semantics. In one endpoint, if something is not found it can be a proper 404, if its existence is truly optional. In another endpoint the absence might very well qualify as a 500.
Deciding on end-user error handling at a low level means making assumptions that cannot be known at that low level. The caller decides how something is going to be handled and presented, not the callee, or you inevitably miss errors in important places and silently miscategorize stuff. Far better to have that scenario lead to a 500 (unmapped error, unknown problem) so it can be fixed.
I don't think this will scale. Errors are part of the API (especially with the Go mantra that errors are values, https://go.dev/blog/errors-are-values, this is ever more prominent), and each API is the responsibility of a service.
So unless you are dealing with the infrastructure or standards/protocols layer (say, defining what HTTP 500 means or a common pattern for URL paths in your API), it's better not to couple all services together. Those standards are very minimal and primitive so that they work for everything, which is the opposite of what is being done here: aggregating all the specifics into a single place.
I agree Go error handling is suboptimal, but this is simply not the right approach. This essentially turns error handling into a whole other language, almost like how Ginkgo is a separate language for handling tests.
And most languages are lacking this useful error language. You can’t speak if you have no language, so having it must be a good thing.
The only questionable thing here is that this framework is not a part of the main language still, which means near zero adoption. But that train has sailed.
I think that's overkill; most of the time I just bubble errors up, and I have very few cases where the error handling depends on the type of error. I guess it's because I don't use errors for things that are recoverable and instead try to fix them inside the given function. An example given here in the thread is reading from a file and, if that doesn't work, trying a backup. Rather than having a function that reads from a file and returns a bunch of different errors, I'd just make one with a list argument, handle the I/O errors inside, and return an "unrecoverable" error otherwise.
For adding context, %w is good enough I find, though as I said I only very sparingly use errors.Is(...). Go isn't a language that's designed around rich error or exception types, and I don't think you should use it like that.
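A sketch of that "one function with a list argument" idea from the previous comment (the file names are made up; errors.Join needs Go 1.20+): the per-file I/O errors are handled inside, and the caller only sees a single "unrecoverable" error when every candidate fails.

package main

import (
    "errors"
    "fmt"
    "os"
)

// readFirstAvailable tries each candidate path in order and returns the first
// one that can be read; only if all of them fail does the caller get an error.
func readFirstAvailable(paths []string) ([]byte, error) {
    var errs []error
    for _, p := range paths {
        data, err := os.ReadFile(p)
        if err == nil {
            return data, nil
        }
        errs = append(errs, fmt.Errorf("%s: %w", p, err))
    }
    return nil, fmt.Errorf("no readable candidate: %w", errors.Join(errs...))
}

func main() {
    data, err := readFirstAvailable([]string{"config.json", "config.backup.json"})
    if err != nil {
        fmt.Println("unrecoverable:", err)
        return
    }
    fmt.Printf("read %d bytes\n", len(data))
}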
Well, yes, if you're just using errors as error messages, you only need strings and %w. That's usually good enough if you're writing an application.
However, if you're writing a library, chances are that your users want to catch the errors, find out whether the call failed because, say, the remote API is down or because the password is wrong.
Or if you're writing an API, you probably want to return different error codes. If your errors are bubbling, you'll need to somehow `errors.Is`/`errors.As` somewhere.
Yea, but like, when making an HTTP request, a timeout is significantly different from a failure to open a socket from a failure to resolve the hostname from a 429 error. And often it is up to the caller to decide how to handle those situations.
Is this just someone's proposal, or a formal addition to Go, or what?
"All errors must implement the Error interface." That's a step forward.
Rust really has the same error handling as Go - return an error status. But the syntax is cleaner. Rust thrashed around with errors at first. Then things sort of settled down.
At this point, everybody uses Result<UsefulValue, Error>, but "Error" is just a trait that doesn't require much information. And "?" for propagating errors upwards is a huge convenience.
It's probably too late to retrofit "Result" and "?" into Go libraries, although they'd fit the language.
Not at all. Rust has proper sum types, which it can return just like anything else in the language, while Go has a special-cased error return slot (one may be tempted to call it an ugly hack), and it can return a value in both slots at once, which it does in some standard library calls.
Not at all. Go has an error type, and Go functions have the ability to return zero, one, two, or more items, ordered however the developer likes. An error may be among those, as desired, and populated as desired.
Some software also writes to both STDOUT and STDERR.
I know, special-cased may have been better worded as "just a convention". My point is, this is not much different than using a thread-local variable, like errno, and it adds useless confusion: your return values represent n*m values, while there are only n+m cases with proper error semantics.
Re STDERR: but shells don't decide whether a program execution failed by whether it wrote to STDERR, but by the single returned exit code.
One of the issues in Go is that if all you ever do is "if err != nil { return err }", you will quickly run into problems, because you will have errors like "open foo: no such file or directory" or "sql: no rows in result set" without a clue where that error came from. Sometimes that's obvious, often it's not.
I'm not sure how Rust handles that? But it's more than just "propagate errors", but more like "propagate errors with the appropriate context for this specific error".
Rust uses the `?` operator to convert between error types, which allows users and libraries to hook into the error before it's returned.
There are a number of helper libraries that provide an extended, type-erased error type to attach a real stack trace to the error, such as `anyhow`. These helper libraries also provide ways to attach extra metadata to the error so you can do things like `returns_a_result().context("couldn't do it")?` to quickly annotate the error. The standard library has support for this through a `context.Value`-like API on the Error trait. The std lib `Error` trait also has functions for finding the cause of the error and traversing a collected chain of errors, very similar to Go's `errors.Cause` API.
Rust also has a number of libraries for making specific error types like `thiserror` which can help generate error enums with the implementations required to carry backtraces, context and causes.
Yep, if you want wrapped errors in Rust, you use the anyhow crate. It leans heavily into dyn so has some performance tradeoffs, but it's roughly the same performance-wise as Go's error interface (which also uses a vtable under the hood).
I have been seeing this pattern repeated over and over since I started using Go in 2014, where people think they should be “building my favorite missing feature” — whether that’s futures, generics, structural processes, OTP, version managers, package managers, or now apparently exceptions. I always get the sense that the authors think they’ve done something cool and helpful when, if they had simply put more effort into comprehending the simple “Go way” in the first place, it wouldn’t have been necessary at all, and the needed functionality would have fallen out of the design.
You realize that half of the features you are counting are now in Go, despite being missing in the beginning, exactly because people were missing them and Go simply did not offer a sane way to work around their absence?
I'm also quite sure that Go will provide a more sane way to handle errors in the not so far future, since it's continuously at the top of people's complaints
your comment exemplifies the mentality, yes, and unfortunately it has now been adopted by project leadership, so I’m sure you are quite right that more “missing features” will get baked into the language soon :)
It's far better to have those features well designed and baked into the language once than to have them constantly poorly redesigned and baked into every other Go app.
nobody would ever use this argument for the design of C. It’s good for C to stay lean and simple while communities using C (please let’s not with this imaginary monolithic “The Community”) are free to try things and offer competing solutions that others are free to ignore.
kitchen sink languages are bad. Justifying them with “well the community is bad, so we need the bad thing to be mainlined” is maybe worse
By Go's standard, all other languages are "kitchen sink". Conversely, I would argue that basics like decent error handling are not in any meaningful sense a "kitchen sink" thing.
I arrived at a similar conclusion. I come from Java, and in Java you have exceptions with try/catch clauses and declaring them in function signatures. It works fairly well, but it is very difficult and not idiomatic in Go.
Therefore, I created a simple rule. If you do not know yet what this error means to the user, let it stay a fmt.Errorf("xx: %w", err). If you do, wrap it in your own custom ServerError struct and return that type from then on. Do not change the meaning of the ServerError even if you wrap the error with another ServerError.
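A sketch of that rule (the ServerError fields and the example functions are made up): keep adding fmt.Errorf context while the user-facing meaning is unknown, classify exactly once when it becomes known, and let errors.As pick it up at the top.

package main

import (
    "errors"
    "fmt"
    "os"
)

// ServerError fixes what an error means to the user; its meaning is never changed again.
type ServerError struct {
    Code int    // e.g. an HTTP status
    Msg  string // safe to show to the user
    Err  error  // underlying cause, for the logs
}

func (e *ServerError) Error() string { return fmt.Sprintf("%s: %v", e.Msg, e.Err) }
func (e *ServerError) Unwrap() error { return e.Err }

// readConfig does not know yet what the failure means to the user: just add context.
func readConfig(path string) ([]byte, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, fmt.Errorf("read config: %w", err)
    }
    return data, nil
}

// loadConfig knows the meaning now: classify once and keep it.
func loadConfig(path string) ([]byte, error) {
    data, err := readConfig(path)
    if err != nil {
        return nil, &ServerError{Code: 500, Msg: "configuration unavailable", Err: err}
    }
    return data, nil
}

func main() {
    _, err := loadConfig("app.json")
    var se *ServerError
    if errors.As(err, &se) {
        fmt.Println(se.Code, se.Msg)   // what the user sees
        fmt.Println("detail:", se.Err) // what goes into the logs
    }
}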
When I thought about errors/exceptions, I basically came to the same conclusion. To reiterate or add to tfa: standard formulations, expected vs. happened, reasonable context visible in logs, error trees, automatic http/etc codes, tidy client messages in prod, reasonable distinction between: unexpected, semi-normal, programming error, likely fatal.
Not sure why most (all?) programming languages have such poor support for errors. Coding may feel like 2024, but error handling like 1980. Anyone with 2-5 years of any programming experience (in where errors do happen and they choose to handle them) will come to similar ideas.
Also, the fact that try {} and catch/finally {} are always three different scopes is just idiotic. It should be try { catch {} finally {} }. What cargo cult is that {}{}{} from? Everyone copies it blindly from grammar to grammar.
Posts like these remind me how Go really has nothing going for it apart from goroutines and channels. It's an awkward mix of low level and high level with a C-like influence, which is weird considering it's a GC language.
This approach is so bad, I don't even know where to start. But it's all symptoms of their, sorry, incompetence. Take the loadCredentials example on top. If os.ReadFile cannot find the file, it returns an error with the string representation "open cred.json: no such file or directory". This comes straight from the std lib as it is, a great error. What does the errors.Is(err, os.ErrNotExist) branch do? It prepends "file not found" to it, rendering: "file not found: open cred.json: no such file or directory". So this adds exactly nothing. The next if will prepend "failed to read file" to it, again adding nothing as well. The two error checks should be replaced by one if statement, optionally wrapping the error with a context string, but I cannot think of any use.
Then the next step, error handling of verifyCredentials. I can only guess what it does, but assume that it returns a "username 'foo bar' cannot contain spaces" error. Does prepending "invalid credentials" help anything? Nope, so the whole if can be removed as well. No surprise your errors get clunky if you make them clunky.
I have more pressing things to do than dissect this article line by line, but suffice it to say that I feel sorry for newcomers to the language that an article like this is so high on HN. Back in the day there was just Dave Cheney's material to read [1], and it was excellent. It's unfortunately outdated in certain regards (e.g. with the new Is/As functionality in the errors package for inspection and the %w formatting directive in fmt.Errorf), but it's still an excellent article.
>it returns an error with string representation: "open cred.json: no such file or directory". This comes straight from the std lib as it is, a great error.
It’s a terrible error. It’s not structured, so you can’t aggregate it effectively in logs; on top of that, it leaks a potential secret, so you can’t return it from an RPC handler.
The string representation is obviously not structured, because it's a string representation and strings are scalars. The typed representation is structured, which you can put into your structured logs as you'd like, omitting sensitive information where needed.
> New Go users: most of the time returning an error without checking its value or adding extra context is the right thing to do
Thank you.
Feels like Go is having its Java moment: lots of people started using it, so questions of practice arise despite the language aiming at simplicity, leading to the proliferation of questionable advice by people who can't recognize it as such. The next phase of this is the belief that the std library is somehow inadequate even for tiny prototypes, because people have had it beaten into their heads that "everybody" uses SuperUltraLogger now, so it becomes orthodox to pull that dependency in without questioning it.
After a bunch of iterations of this cycle, you're now far away from the simplicity the language was meant to create. And the users created this situation.
Go is having a Go moment: lots of people using it are realizing that other programming languages have all that complexity for a reason, and that "aiming at simplicity" by aggressively removing or ignoring well-established language features often results in more complicated code that's easier to get wrong and harder to reason about.
From my experience this is not the case. If you error out 7 functions deep and only return the original error there's no chance you're figuring out where it happened. Adding context on several levels is basically a simplified stack trace which lets you quickly find the source of the error.
I inherited a codebase with the same problem. After a few debugging sessions where it wasn't clear where the error was coming from, I decided the root problem was that we didn't have stack traces.
Fortunately, the code was already using zap and it had a method for doing exactly that:
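The snippet itself isn't reproduced above; as an assumption about what it looked like, here is one way zap can do this, using its AddStacktrace option so every Error-level entry carries the stack of the call site.

package main

import (
    "errors"

    "go.uber.org/zap"
    "go.uber.org/zap/zapcore"
)

func main() {
    // Attach a stack trace to every log entry at Error level or above.
    logger, err := zap.NewProduction(zap.AddStacktrace(zapcore.ErrorLevel))
    if err != nil {
        panic(err)
    }
    defer func() { _ = logger.Sync() }()

    failure := errors.New("sql: no rows in result set")
    // The emitted entry contains the error and the stack trace of this call site.
    logger.Error("query failed", zap.Error(failure))
}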
Because most of the time if there's an error, you'd likely want to log it out. Much of the code was doing this already, so it made sense to ensure we had good stack traces.
There's overhead to this, but in our codebase there was a dearth of logging so it didn't matter much. Now when things are captured we know exactly where it happened without having to do what the post is doing manually... adding stack info.
We actually went through the same realization when we started writing Rust a few years ago. The `thiserror` crate makes it easy to just wrap and return an error from some third-party library.
But if that's happening somewhere deep in your application and you call that function from more than one place, good luck figuring out what it is! You wind up with an error log like `third_party thing failed` and that's it.
Generally, we now use structured error types with context fields, which adds some verbosity as specifying a context becomes required, but it's a lot more useful in error logs. Our approach was significantly inspired by this post from Sabrina Jewson: https://sabrinajewson.org/blog/errors
It's not a binary decision though. Just because the article arrives at overkill for most things in my opinion doesn't mean sentinel errors or wrapping errors in custom types should be avoided at all costs in all situations.
In my experience, it's good and healthy to introduce this additional context on the boundaries of more complex systems (like a database, or something accessing an external API and such), especially if other code wants to behave differently based on the errors returned (using errors.Is/errors.As).
But it's completely unnecessary for every single plumbing function to start inspecting and wrapping all the errors it encounters, especially if it cannot make a decision on those errors or provide better context.
Do you maybe have constructive advice for people who need to return errors that demand different behaviour from the calling code?
I gave an example higher in the thread: if searching for the entity that owns the creds.json file fails, we want to return a 404 HTTP error, but if creds.json itself is missing, we want a 401 HTTP error. What would be the idiomatic way of achieving this, in your opinion?
With some of these examples, I'd change the API of the lower-level methods. Instead of a (Credentials, err) and the err is a NotFound sometimes, I'd rather make it a (*Credentials, bool, err) so you can have a (creds, found, err), and err would be used for actual errors like "File not found"/"File unreadable"/...
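A sketch of that (*Credentials, bool, error) shape (the Credentials type and the file layout are made up): absence becomes an ordinary return value, and the error slot is reserved for real failures.

package main

import (
    "encoding/json"
    "errors"
    "fmt"
    "io/fs"
    "os"
)

type Credentials struct {
    Token string `json:"token"`
}

// loadCredentials reports "not found" through the bool, not through the error.
func loadCredentials(path string) (*Credentials, bool, error) {
    data, err := os.ReadFile(path)
    if errors.Is(err, fs.ErrNotExist) {
        return nil, false, nil // expected outcome, not an error
    }
    if err != nil {
        return nil, false, fmt.Errorf("read %s: %w", path, err) // unreadable, etc.
    }
    var c Credentials
    if err := json.Unmarshal(data, &c); err != nil {
        return nil, false, fmt.Errorf("parse %s: %w", path, err)
    }
    return &c, true, nil
}

func main() {
    creds, found, err := loadCredentials("creds.json")
    fmt.Println(creds, found, err)
}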
But other than that, there is nothing wrong with having sentinel errors or custom error types on your subsystem / module boundaries, like ErrCredentialsNotFetched, ErrUserNotFound, ErrFileInvalid and such. That's just good abstraction.
The main worry is: How many errors do you actually need, and how many functions need to mess about with the errors going around? More error types mean harder maintenance in the future because code will rely on those. Many plumbing or workflow functions probably should just hand the errors upwards because they can't do much about it anyways.
A lot of the details in the errors of the article very much feel like business logic and API design is getting conflated with the error framework.
Is "Cannot edit a whatsapp message template more than 24 hours" or "the users account is locked" really an error like "cannot open creds.json: permission denied" or "cannot query database: connection refused"? You can create working code like that, but I can also use exceptions for control flow. I'd expect these things to come from some OpenAPI spec and some controller-code make this decision in an if statement.
Use errors.Is to compare the returned err to mypkg.ErrOwnerNotExists and mypkg.ErrMissingConfig, and the handler decides which status code is appropriate.
Cool, but errors.Is what? In my case both would come back as os.ErrNotExist errors, because both are files on the disk.
I think that the original dismissal I replied to might not have taken into account some of the complexities that OP most likely has given thought to and made decisions about accordingly. Among those there's the need to extract or append the additional information OP seems to require (request id, tracking information, etc). Maybe it can all be done at the top level, but maybe not; maybe some of it comes from deeper in the stack and needs to be passed upwards.
No no no; do not return os.ErrNotExist in both cases. The function needs to handle os.ErrNotExist and then return mypkg.ErrOwnerNotExists or mypkg.ErrMissingConfig (or whatever names) depending on the state in the function.
The os.ErrNotExist error is an implementation detail that is not important to callers. Callers shouldn't care about files on disk, as that is leaking abstraction info. What if the function decides to move those configs to S3? Then callers have to update to handle S3 errors? No way. Return errors specific to your function that abstract over the underlying implementation.
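A sketch of that translation (the sentinel names and the 401/404 mapping follow the example discussed above): the loader converts fs.ErrNotExist into its own sentinels, and only the HTTP layer decides which status code each one deserves.

package main

import (
    "errors"
    "fmt"
    "io/fs"
    "net/http"
    "os"
)

// Package-level sentinels: callers never see fs.ErrNotExist directly.
var (
    ErrOwnerNotExists = errors.New("owner does not exist")
    ErrMissingConfig  = errors.New("credentials config missing")
)

func loadOwnerCredentials(ownerDir string) ([]byte, error) {
    if _, err := os.Stat(ownerDir); errors.Is(err, fs.ErrNotExist) {
        return nil, ErrOwnerNotExists // the owning entity itself is gone
    }
    data, err := os.ReadFile(ownerDir + "/creds.json")
    if errors.Is(err, fs.ErrNotExist) {
        return nil, ErrMissingConfig // owner exists but has no creds.json
    }
    if err != nil {
        return nil, fmt.Errorf("load credentials: %w", err)
    }
    return data, nil
}

// The HTTP layer decides what each sentinel means for the client.
func statusFor(err error) int {
    switch {
    case err == nil:
        return http.StatusOK
    case errors.Is(err, ErrOwnerNotExists):
        return http.StatusNotFound // 404
    case errors.Is(err, ErrMissingConfig):
        return http.StatusUnauthorized // 401
    default:
        return http.StatusInternalServerError
    }
}

func main() {
    _, err := loadOwnerCredentials("/tmp/owner-42")
    fmt.Println(statusFor(err), err)
}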
Second edit: same code, but leveraging my other comment's kverr package to propagate context like kv pairs up the stack for logging:
https://go.dev/play/p/pSk3s0Roysm
Exactly, and that's what OP argues for, albeit in a very complex manner.
Distilling their implementation to the basics, that's what we get: typed errors that wrap the Go standard library's ones with custom logic. Frankly, I doubt that the API your library exposes (kv maps) is better than OP's typed structs. Maybe their main issue is relying on stuffing all error types in the same module, instead of having each independent app come up with their own, but probably that's because they need the behaviour for handling those errors at the top of the calling stack to be uniform and have only one implementation.
A quick back of the napkin list for what an error needs to contain to be useful in a post execution debugging context would be:
* calling stack
* traceability info like (request id, trace id, etc)
* data for the handling code to make meaningful distinction about how to handle the error
I think your library could be used for the last two, but I don't know how you store calling stack in kv pairs without some serious handwaving. Also kv is unreliable because it's not compile time checked to match at both ends.
I'm not saying use kverr for explicit error handling (like, you could, but that is non ideal), use kverr as a context bag of data you want to capture in a log. If you programmatically are routing with untyped string data, I agree, unreliable
> No surprise your errors get clunky if you make them clunky.
From a user perspective, good errors in Go make me think of Perl's croak/carp. Croak and carp gave you a stack trace of your error, but cut out all the module-internal calls and left you with the function calls across module boundaries. Very useful - enough so that Java discovered it again later on.
Personally, I wouldn't wrap the errors in loadCredentials at all. I'd just wrap the result of this method in an fmt.Errorf("failed to load credentials: %w", err). This way the user knows the context the error happened in, and then we cross our fingers that the error returned by this is good enough.
But something like "application startup failed: failed to load credentials: open cred.json: no such file or directory" is a very nice error message from an application. Just enough context to know what's going on, but no 1200 line stacktrace to sift through.
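Roughly, that message falls out of two call-site wraps; a sketch (assuming loadCredentials just passes the os.ReadFile error through):

package main

import (
    "fmt"
    "os"
)

func loadCredentials(path string) ([]byte, error) {
    return os.ReadFile(path) // "open cred.json: no such file or directory"
}

func startApp() error {
    if _, err := loadCredentials("cred.json"); err != nil {
        return fmt.Errorf("failed to load credentials: %w", err)
    }
    return nil
}

func main() {
    if err := startApp(); err != nil {
        // application startup failed: failed to load credentials: open cred.json: no such file or directory
        fmt.Println("application startup failed:", err)
    }
}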
As someone who ended up implementing something very similar to TFA, I'd like to ask: in which way can you pass errors from 3 layers deep in your stack to the top layer and maintain context?
Ie, when I can't find cred.json I want to return a 401 error, but when I can't find the entity cred.json is supposed to be owned by I want to return 404. How can one "not incompetent" Go developer solve this and distinguish between the two errors?
Adding error checks everywhere when you don't care about them is one of the ugliest things about Go.
What I do is have a utility package that lets me panic on most errors, so I can recover in a generalized handler.
x, err := doathing()
Catch(err, "didn't do the thing")
The majority of error handling is "the operation failed, so cancel the request." Sure there are places where the error matters and you can divert course, but that is far from the majority of cases.
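A sketch of that pattern (Catch and the recovering wrapper are hypothetical names based on the comment): Catch panics with a wrapped error, and a single recover point turns it into a failed request instead of a crashed process.

package main

import (
    "errors"
    "fmt"
    "net/http"
)

// Catch panics if err is non-nil, carrying the wrapped error up to the recover point.
func Catch(err error, msg string) {
    if err != nil {
        panic(fmt.Errorf("%s: %w", msg, err))
    }
}

// handle wraps an endpoint and converts such panics back into an HTTP error.
func handle(h http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        defer func() {
            if rec := recover(); rec != nil {
                if err, ok := rec.(error); ok {
                    http.Error(w, err.Error(), http.StatusInternalServerError)
                    return
                }
                panic(rec) // not one of ours; keep panicking
            }
        }()
        h(w, r)
    }
}

func doathing() (string, error) { return "", errors.New("boom") }

func endpoint(w http.ResponseWriter, r *http.Request) {
    x, err := doathing()
    Catch(err, "didn't do the thing") // cancels this request on failure
    fmt.Fprintln(w, x)
}

func main() {
    http.HandleFunc("/", handle(endpoint))
    _ = http.ListenAndServe(":8080", nil)
}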
I don't agree, but having said that, this feels like an entirely predictable/justifiable perspective to hold, given the terrible design of net/http in the standard library. Of course it feels easier to just panic, it's not like you can return an error from a handler. There is so much compatibility baggage from Go 1.0 in that package, that doing the right thing (contexts, errors, etc.) is so much harder than it should be, and most people end up doing the wrong thing because it's more ergonomic.
I usually use Echo which does have an error to return from handlers, but I don't think it's necessarily the wrong thing unless you're writing a library. I used to avoid panics with the same mindset that they aren't supposed to be used like exceptions, but I've found that panics are a clean way to handle a bulk of error cases that are "log and retreat", centralizing the process with some syntactic sugar to not have to check err != nil everywhere. More of my thoughts here if any are curious: https://blog.mukunda.com/cat/2022/dont-be-afraid-to-panic.tx...
I think one thing that could help if the codebase wants to avoid regular panics is more syntactic sugar to help error bubbling, like Rust has.
The fact that this code also has gorm in it in one of the examples is neither supportive of the proposal’s fit for the language, nor really surprising.
I mean their intentions are good but if I worked at a place that made me use that error package I'd not have a good time
In general with golang, if something is not idiomatic Go then don't try too hard to fit constructs from other languages into it. Even the use of lodash like packages feels awkward in Go
If you're validating parameters that originate from your program (messages, user input, events, etc), ASSERT and ASSERT often. If you're handling parameters that originate from somewhere else (response from server, request from client, loading a file, etc) - you model every possible version of the data and handle all valid and invalid states.
Why? When you or your coworkers are adding code, the stricter you make your code, the fewer permutations you have to test, the fewer bugs you will have. But, you can't enforce an invariant on a data source that you don't control.
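In Go terms, a minimal sketch of that split (the Config and pool types and their limits are made up): data from outside is fully modelled and validated, while violating an internal precondition is treated as a bug.

package main

import (
    "encoding/json"
    "fmt"
)

type Config struct {
    MaxConns int `json:"max_conns"`
}

// parseConfig handles data we don't control: every invalid shape becomes an error.
func parseConfig(raw []byte) (*Config, error) {
    var c Config
    if err := json.Unmarshal(raw, &c); err != nil {
        return nil, fmt.Errorf("parse config: %w", err)
    }
    if c.MaxConns <= 0 {
        return nil, fmt.Errorf("parse config: max_conns must be positive, got %d", c.MaxConns)
    }
    return &c, nil
}

type pool struct{ max int }

// newPool trusts its internal callers: violating the precondition is a bug, so it panics.
func newPool(maxConns int) *pool {
    if maxConns <= 0 {
        panic(fmt.Sprintf("newPool: maxConns must be positive, got %d", maxConns))
    }
    return &pool{max: maxConns}
}

func main() {
    cfg, err := parseConfig([]byte(`{"max_conns": 8}`))
    if err != nil {
        fmt.Println("rejecting external input:", err)
        return
    }
    fmt.Println(newPool(cfg.MaxConns).max)
}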
Yes of course the key here is to understand the difference between BUGS and logical (error) conditions.
If I write an image processing application, then the various ways processing a .png can fail - say, the file is missing, unreadable, or not a valid image - are all logical conditions that the application needs to be able to handle. The difference is that from the software correctness perspective none of these are errors. In the software they're just logical conditions, and they are only errors to the USER.
BUGS are errors in the software.
(People often get confused because the term "error" without more context doesn't adequately distinguish between an error condition experienced by the user when using the software and errors in the program itself.)
> But, you can't enforce an invariant on a data source that you don't control.
This is obvious.
> The code is allowed to continue after it has logically bugged out.
I'm a big fan of asserting preconditions and making it clear that we are getting into a bad place. I would rather dig through Sentry for an AssertionError than propagate a bad state and having to fix mangled data after the fact. If the AssertionError means that we mishandled valid user input, no problem, we'll go fix it.
A few times in my career I've had to ask, "okay, how long has this bug been quietly mangling user data?" and it's not a fun place to be.
Side note: I've never understood the convention of removing asserts in production builds. It seems like removing the seatbelts from the car before the race just to save a few pounds.
> Side note: I've never understood the convention of removing asserts in production builds. It seems like removing the seatbelts from the car before the race just to save a few pounds.
Once upon a time computers were slow and every cycle mattered. Assertions were compiled out of the build by necessity. Better a crash once in a while than the program hardly running because it was so slow.
Case in point: When I was stuck inside a "big ball of perl" codebase that heavily used assertions for method input validation, I generated a flame graph of where time was spent in the codebase and it turned out it was assertions all the way down. Since only a small percentage of inputs came from external/unvalidated sources (user input etc) it was fine to remove the vast majority of them outside of the development environment. So we turned them into no-ops in prod and had a significant performance improvement.
Like everything in life, it depends.
If this is some inconsequential part of the codebase, it might be better to limp on than to completely stop anyone, user or fellow dev, from running the app at all.
Said another way, graceful degradation is a thing.
I think this is precisely why exceptions model particularly well - well - exceptional situations.
They let you install barriers, and you can safely fail up until that point, disallowing the program from entering cursed states, all the while a user can be returned a readable error message.
In fact, I would be interested in more research into transactions/transactional memory.
How do you gracefully degrade when your program is in a buggy state and you no longer know what data is valid, what is garbage, and what conditions hold?
If I told you to write a function that takes a chunk of customer JSON data, but also told you that the data was produced/processed by some buggy code that might have corrupted it, and your job is to write a function that works on that data, how would you do it?
Now your answer is likely to be "just checksum it", but what if I told you that the functions that compute the checksums sometimes go off the rails in buggy branches and produce incorrect checksums?
Then what?
In a sane world your software is always in a well-defined state. This means buggy conditions cannot be allowed to keep executing. If you don't honor this, you have no chance of a correct program.
Contrary to people's dislike of OOP, I think it pretty well solves the problem.
You have objects, and calling a method on it may fail with an exception. If the method throws an exception, it itself is responsible for leaving behind a sane state, but due to encapsulation it is a feasible task.
(Of course global state may still end up in illegal states, but if the program architecture is carefully designed and implemented it can be largely mitigated)
Why not bring down the entire server if you detect an error condition in your application? You build things in a way where a job or request has isolated resources, and if you detect an error, you abort the job and free those resources, but continue processing other jobs. Operating systems do this through processes with different memory maps. Applications can do it through things like arenas or garbage collection.
It may be okay in a server, but (for example) not in a desktop application. The issue, then, is that most code lives (or should live) in library-like modules that are agnostic of which kind of application context they are running in. In other words, you can’t just abort in library code, because the library might be used in application contexts for which this is not acceptable. And arguably almost all important code should be a library.
Exception mechanisms let the calling context control how to proceed. Deferring to that control and doing some cleanup during stack unwinding virtually never causes serious issues in practice.
What I meant was that if you follow the logic of "computer is in an unknown state. Stop processing everything", then why not continue that to the entire server (operating system, hypervisor, etc.)? Obviously it's not okay in almost any context. Instead, assuming you have something more complicated than a CLI script that's going to immediately exit anyway, you should be handling those sorts of conditions and allowing your event loop/main thread to continue.
I think the issue is that bringing the application down might mean cutting short concurrent ongoing requests, especially requests that will result in data mutation of some sort.
Otherwise, some situations simply don't warrant a full shutdown, and it might be okay to run the application in degraded mode.
"I think the issue is that bringing the application down might mean cutting short concurrent ongoing requests, especially requests that will result in data mutation of some sort."
Yes, but what is worse is silently corrupting the data or the state because of running in a buggy state.
This is a false choice.
If you don't know why a thing that's supposed to never be null ended up being null, you don't know what the state of your app is.
If you don't know what the state of your app is, how do you prevent data corruption or logical errors in further execution?
> If you don't know what the state of your app is, how do you prevent data corruption or logical errors in further execution?
There are a lot of patterns for this. It's perfectly fine and often desirable to scope the blast radius of an error short of crashing everything.
OSes shouldn't crash because a process had an error. Servers shouldn't crash because a request had an error. Missing textures shouldn't crash your game. Cars shouldn't crash because the infotainment system had an error.
If you can actually isolate state well enough, and code every isolated component in a way that assumes that all state external to it is untrusted, sure.
How often do you see code written this way?
This is basically all code I've worked on. You have a parsing/validation layer that passes data to your logic layer. I could imagine it working less well for something like a game where your state lives longer than 2 ms and an external database is too slow, but for application servers that manipulate database entries or whatever it's completely normal.
In most real-world application programming languages (i.e. not C and C++), you don't really have the ability to access arbitrary memory, so if you know you never gave task B a reference to task A or its resources, then you know task B couldn't possibly interfere with task A. It's not dissimilar to two processes being unable to interfere with each other when they have different logical address spaces. If B does something odd, you just abort it and continue with A. In something like an application server, it is completely normal for requests to have minimal shared state internal to the application (e.g. a connection pool might be the only shared object, and has a relatively small boundary that doesn't allow its clients to directly manipulate its own internals).
You can "drop" that request which fails instead of crashing the whole app (and dropping all other requests too).
Sure. You wouldn't want a webserver to crash if someone sends a malformed request.
I'd have to think long and hard about each individual case of running in degraded mode though. Sometimes that's appropriate: an OS kernel should keep going if someone unplugs a keyboard. Other times it's not: it may be better for a database to fail than to return the wrong set of rows because of a storage error.
That's exactly what the attacker wants you to do after their exploit runs: ignore the warning signs.
You don't ignore it. You track errors. What you don't do is crash the server for all users, giving an attacker an easy way to DoS you.
A DoS might be the better option vs. say, data exfiltration.
Most bugs aren't going to create any risk for data exfiltration. In most real application servers (which are very rarely written in C or C++ these days), requests are almost completely isolated from each other except to the extent that they interact with a database. If you detect a bug in one request, you just abort the one request, and there's likely no way it could affect others.
This is part of why something like Rust is usable at all; in the real world a lot of logic has straightforward, linear lifecycles. To the extent that it doesn't, you can push the long-lived state into something like an external database, and now your application has straightforward lifecycles again where the goal of a task is to produce commands to manipulate the database and then exit.
Sure, but I was talking about an individual process. If you don't know what state it's in, you simply can't trust it to run anymore. That's all.
Except you usually can because the state isn't completely unknown. You might not expect some field in a structure to be null, but you still know for example that there's no way for one request to have a reference to another, so you just abort the one request and continue.
And what does a DoS attacker want you to do? Crash the whole service, so that everyone else is denied it?
That is a valid tradeoff in many situations, yes.
> If you don't know what the state of your app is, how do you prevent data corruption or logical errors in further execution?
Even worse, you might be in an unknown state because someone is trying to exploit a vulnerability.
If you crash then you've handed them a denial of service vulnerability.
That's an issue handled higher up the stack with process isolation etc. It's still not ok to continue running a process that is in an unknown state.
This is often called offensive programming.
Hot damn, I never heard of this term before but yeah that's exactly what it is.
TIL, thanks.
Paradoxically, it is still a subset of defensive programming.
I largely agree. If it came to pass that the precondition fails, there's a bug somewhere and this code just hides it. At the very least, that should go to an error log that someone actually sees.
I'm writing a Rust project right now where I deliberately put almost no error handling in the core of the code apart from the bits accepting user input. In Rust speak, I use .unwrap() all over the place when fetching a mandatory row from the DB or loading config files or opening a network connection to listen on or writing to stdout. If any of those things fail, there's not a thing I can plausibly do to recover from it in this context. I suppose I could write code that handles each of those errors explicitly
and make the program exit if it returns an error, but that's just adding noise: the only advantage is that the error message is gentler, at the expense of adding a bunch of code and potentially hiding the underlying error message from the user so that they could fix it.
I think Python also gets that right, where it's common to raise exceptions when exceptional things happen, and only ever handle the exceptions you can actually do something about. In 99.999% of projects, what are you actually going to do at the application level to properly deal with an OOM or disk-full error? Nothing. It's almost always better to just crash and let the OS / daemon manager / top-level event loop log that something bad happened and schedule a retry.
The whole story is 3-fold: we have BUG conditions, logical (error) conditions, and unexpected failures of things that should not fail.
In the first category we're dealing with BUGS, and when I advocate asserting and terminating the process that only really applies to BUG conditions. If you let an application continue in a buggy state then you cannot logically reason about it anymore. The logical conditions are the typical cases, for example "file not found" or whatever. The user tries to use the software but there's a problem. The application needs to deal with these, but from the software correctness perspective there's no error. The error is only what the user perceives. When your browser prints "404" or "no internet connection" the software works correctly. The error is only from the user's perspective.
Finally, the last category are those unexpected situations where something that should not fail does fail. It is quite tricky to get these right. Is straight-up exiting the right choice? Maybe the system will have more resources later if you just back off and try again. Personally, in C++ projects my strategy is to employ exceptions and let the call stack unwind to the UI level, inform the user and then just return to the event loop. Of course the real trick is to keep the program state such that it's in some well-defined state and not in a BUGGY state ;-)
When a process is used to serve multiple requests, I don't think you need to let the whole process terminate just because there is a bug dealing with a single request. Just because we can not reason about the current request does not mean the only way to get to the clean state for other requests is to terminate the whole process.
That sounds about right to me. Worry about the things you can fix and don't worry about the things outside your control.
Makes sense. Better to unwrap via .expect("msg"), though.
That's a good callout, but I do that if and when I can add extra meaningful context.
From a user's POV, "I already know what file not found means. You don't have to explain it to me again in your own words."
The thing I wish more error messages did was tell me exactly which file was not found.
Asserts are only available in debug compile mode.
"MIT v. Berkeley - Worse is Better" => https://blog.codinghorror.com/worse-is-better/
"Fail Fast / Let it Crash" => https://erlang.org/pipermail/erlang-questions/2003-March/007...
...you're in good company. :-)
I don't agree with any of this.
First of all, this results in unintelligible errors. Linux is famous for abysmal error reporting, where no matter what the problem really is, you get something like ENOENT, no context, no explanation. Errors need to propagate upwards and allow the handling code to reinterpret them in the context of the work it was doing. Otherwise, for the user they are either meaningless or dangerous.
Secondly, any particular function that encounters an unexpected condition doesn't have a "moral right" to terminate the entire program (who knows how many layers there are on top of what this particular function does?) Perhaps the fact that a function cannot handle a particular condition is entirely expected, and the level above this function is completely prepared to deal with the problem: insufficient permissions to access the file -- ask user to elevate permission; configuration file is missing in ~/.config? -- perhaps it's in /etc/? cannot navigate to URL? -- perhaps the user needs to connect to Wi-Fi network? And so on.
What I do see in practice, is that programmers are usually incapable of describing errors in a useful way, and are very reluctant to write code that automates error recovery, even if it's entirely within reach. I think, the reason for this is that the acceptance criteria for code usually emphasizes the "good path", and because usually multiple bad things can happen down the "bad path", it becomes cumbersome and tiresome to describe and respond to the bad things, and then it's seldom done.
yup. we have definitely all gotten an ENOENT or EIO before with no context.
There's a lot of comments here that seem overly critical. The author came up with solutions to extend Go's errors to meet their needs and shared that with the world - thank you.
I have been solving all the same problems and providing libraries that allow for more flexibility so that users can come up with approaches that best meet their needs. I am finally polishing the libraries and starting to write about them:
https://blog.gregweber.info/blog/go-errors-library/ (errors with stack traces and metadata)
https://github.com/gregwebs/errcode (adding codes to errors - working on improving docs and writing about this now).
The concepts aren't wrong (structured logs from structured errors), but I find this code to be very un-Go-like, and there are obvious signs of trying to write Java in Go (iFace, structs with one property "because everything needs to be contained in an object", and others).
Return "error" and not a custom type "mypkg.Error" - you run into more nil interface pointer problems and you are breaking an idiom.
Let me provide a counter example for helping create structured logs from structured errors that I wrote up that is much more idiomatic if not more narrowly focused:
https://github.com/sethgrid/kverr
As in the article, if you want to attach "username: foo", this package lets you return kverr.New(err, "username", foo, ...), and then extract a slice or map later for logging like logger.WithArgs(YoinkArgs(err)...).
Feels like OP is basically implementing exceptions and exception handling at the application level. If this is what you want, then why not just switch to one of the many other languages that has exceptions built in at the language level?
I think they use too many sentinel errors [0]. I have been doing Java for two decades, and I thought you need to handle individual errors by type. Using Go, I've learned from the code I write that 90%+ of errors I don't need to handle individually, or I can't do anything except bubble an error up. There is the rare case (10%) when a file does not exist, and I try to read an alternative one and don't bubble up an error.
For customer support I also found it much easier, instead of an error number, print a UUID that customers can give to support, and that UUID (Request ID) then can be found in the logs to find out what happened by developers.
[0]:https://dave.cheney.net/2016/04/27/dont-just-check-errors-ha...
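A sketch of that support-reference idea (the random hex ID stands in for the UUID, and the handler shape is made up): the user only sees the reference, while the logs carry the detail.

package main

import (
    "crypto/rand"
    "encoding/hex"
    "fmt"
    "log"
    "net/http"
)

// newRequestID returns a random hex ID; a UUID library would work just as well.
func newRequestID() string {
    b := make([]byte, 16)
    _, _ = rand.Read(b)
    return hex.EncodeToString(b)
}

func doWork() error { return fmt.Errorf("sql: no rows in result set") }

func handler(w http.ResponseWriter, r *http.Request) {
    id := newRequestID()
    if err := doWork(); err != nil {
        log.Printf("request_id=%s error=%v", id, err) // full detail for developers
        http.Error(w, "Something went wrong. Support reference: "+id, http.StatusInternalServerError)
        return
    }
    fmt.Fprintln(w, "ok")
}

func main() {
    http.HandleFunc("/", handler)
    _ = http.ListenAndServe(":8080", nil)
}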
> Using Go, I've learned from the code I write, 90%+ of errors I don't need to handle individually, or I can't do anything except bubble an error up
So... exceptions are better, because they would do the correct thing by default in the majority of cases?
Exceptions are easier for the programmer. The programmer has to write less and they clutter the code less. But exceptions require stack traces. An exception without a stack trace is useless. The problem with stack traces is: they are hard to read for non-programmers.
On the other side, Go's errors are more work for the programmer and they clutter the code. But if you consistently wrap errors in Go, you do not need stack traces any more. And the advantage of wrapped errors with descriptive error messages is: they are much easier to read for non-programmers.
If you want to please the dev-team: use exceptions and stack traces. If you want to please the op-team: use wrapped errors with descriptive messages.
Messages and stack traces in the error are orthogonal to errors-as-values vs. exceptions for control flow. You could have `throw Exception("error fooing the bar", ctx)`. You could also `return error("error fooing the bar", ctx, stacktrace())`. Stack traces are also occasionally useful but not really necessary most of the time IME.
Go's error handling is annoying because it requires boilerplate to make structured errors and gives you string formatting as the default path for easy-to-create error values. And the whole using a product instead of a sum thing of course. And no good story for exception-like behavior across goroutines. And you still need to deal with panics anyway for things like nil pointers or invalid array offsets.
Go messages are harder for both devs and users to read. Grepping for an error message in a codebase is a special hell.
Besides, it's quite trivial to simply return the exception's getMessage in a popup for an okay-ish error message (and writing a stacktrace prettifier that also writes out the caused-by exceptions' messages is trivial), and you can install exception handlers at an appropriate level, which error values don't really let you do.
I tend to use "catch and re-raise with context" in Python so that unexpected errors can be wrapped with a context message for debugging and for users, then passed to higher levels to generate a stack trace with context.
For situations where an unexpected error is retried, eg, accessing some network service, unexpected errors have a compressed stack trace string included with the context error message. The compressed stack trace has the program commit id, Python source file names (not pathnames) and line numbers strung together, and a context error message, like:
[#3271 a 25 b 75 c 14] Error accessing server xyz; http status 525
Then the user gets an idea of what went wrong, doesn't get overwhelmed with a lot of irrelevant (to them) debugging info, and if the error is reported, it's easy to tell what version of the program is running and exactly where and usually why the error occurred.
One of the big reasons I haven't switched from Python to Go for HashBackup (I'm the author) is that while I'd love to have a code speed-up, I can't stomach the work involved to add 'if err return err("blah")' after most lines of existing code. It would hugely (IMO) bloat the existing codebase.
Exceptions solve the 10% better with the tradeoff of their inflexibility for the other 90% of cases.
When there's an exceptional case, it's better to handle that explicitly. I think Rust does that best with its single-character ? operator, but I don't want exceptions invisibly breaking out of control flow unless I give them permission to. `if err != nil` is a fair enough way of doing that.
It's not a good way to do this because it doesn't force you to either handle or propagate.
Yeah there are linters that force you not to implicitly discard errors, but that should really be a compiler error. Still, that's not a problem inherent to the Go's error-handling model.
Better is subjective, but I prefer errors as return values because then the function signature states whether an error has to be handled or not. Exceptions can be forgotten about, but returned errors have to be explicitly ignored.
That's an independent problem - checked exceptions (and the even better effect types) are part of the method signature.
Checked exceptions feel like six of one, half a dozen of the other to me.
Unless you forget to catch the right type of exception. Then all hell breaks loose.
People bitch about checked exceptions in Java but this is precisely why I think they're a great idea. You can't forget to catch the right type of exception.
The biggest issue with checked exceptions in modern Java is that even the Java makers themselves have abandoned them. They don't work well with any of the fancy features, like Streams.
Checked Exceptions are nothing but errors as return values plus some syntactic sugar to support the most common response to errors, bubbling.
Scala's zio library basically gives you checked exceptions that work with things like type inference, streams, async operations, and everything else.
> They don't work well with any of the fancy features, like Streams.
Because that would require effect types, which is quite advanced/at a research level currently.
All it would require is more support for sum types and variadic type parameters, and maybe fix some hiccups in the existing type inference. You can already write a Stream-like API that supports up to a fixed number of exception types (it’s just a bit annoying to write). The main issue at present is that you can’t do it for an open-ended number of exception types and abstract over the concrete set of types.
The throws clause would require union types, not sum types, though (you can observe it in the catch part of a try/catch, e.g. `catch ExceptionA | ExceptionB`). But Java can't support unions elsewhere, so it would have to be replaced by the two exceptions' common supertype.
That would be true if not for Java making the critical mistake of excluding RuntimeException from method definitions, so in-practice people just extend RuntimeException to keep their methods looking "clean".
Or are forced to because they want to use generics or lambdas.
Both work with checked exceptions.
The problem is that there's no way to specify an exception specification like "I propagate everything that this lambda throws" (or, for generics, "that method M of class C throws").
No, but you can have an interface whose single method declares a type parameter in its throws clause (or the same with more than one exception type), and convert your lambda to that. This works. The only problem is that you can't abstract over an arbitrary number of exception types. In principle, one could imagine a syntax for variadic type parameters that would solve that problem.
Will the compiler infer that a lambda or a method ref implements Func with its exception type param, or do you have to rewrite call sites?
Additional info, they predate Java, having made an appearance in CLU, Modula-3 and C++, before Java was invented.
I miss them in other languages every time I need to track down an unhandled exception in a production server.
>> People bitch about checked exceptions
> they predate Java, having made an appearance in CLU, Modula-3 and C++
Checked exceptions in C++? Can you force/require the call chain to catch an exception in C++? At compile time?
That was part of the idea behind them, yes. As with many things in the WG21 design process, reality worked out differently, and they are no longer part of ISO C++ since C++17.
Although some want to reuse the syntax for value type exceptions, if that proposal ever moves forward, which seems unlikely.
The problem is that a checked exception makes sense only at a relatively high level of the app, but they are used extensively at a low level
No, but you can easily end up missing some because somebody wrapped them in some sub-type of RuntimeException because they were forced(!) to. This happens all the time because the variance on throws clauses is at odds with the variance of method signatures (well, implementations, really -- see below).
A new implementation of a ThingDoer usually needs to do something more/different from a StandardThingDoer... and so may need to throw more types of exceptions. So you end up having to wrap exceptions ... but now they don't get caught by, say, catch(IOException exc). If you're lucky you own the ThingDoer interface, but now you have a different problem: It's only JDBCThingDoer which can throw SQLException, so why does code which only uses a StandardThingDoer (via the ThingDoer interface) need to concern itself with SQLException?
Checked exceptions in Java are worse than useless -- they actively make things worse than if there were only unchecked exceptions. (Because they sometimes force the unavoidable wrapping -- which every place where exceptions are caught needs to deal with somehow... with no help from the standard "catch" syntax.)
One thing you can do in Java is parameterise your interface on the exception type. That way, if the implementation finds it needs to handle some random exception, you can expose that through the interface -- e.g. "class JDBCThingDoer implements ThingDoer<SQLException>". Helper classes and functions can work with the generic type, e.g. "<E> ThingDoer<E> thingDoerLoggingWrapper(ThingDoer<E> impl)".
I think this works really well to keep a codebase with checked exceptions tractable. I've always been surprised that I never saw it used very often. Anyone have any experience using that style?
I guess it's not very relevant any more because checked exceptions are sadly out of fashion everywhere. I haven't done any serious Java for a while so I'm not on top of current trends there.
How do you handle the situation where the code might need to throw (pre-existing) exceptions that don't share a useful base class?
I don’t remember! Possibly that’s one of the cases where it doesn’t work out.
Of course, if you had proper sum types, that situation wouldn’t be a problem.
Java has proper sum types, but what one needs here is union types. They are not the same, sum types are labeled and are disjoint.
You want `MyException | ThirdPartyException` here, though.
You're right, I had it backwards! Thanks for the correction.
Now I'm wondering, could that actually be all that's needed to rescue checked exceptions? Is there any language that has that combination of features?
Back when Java didn't have lambdas, one of the more advanced lambda proposals (http://www.javac.info/closures-v06a.html) had this exact thing for this exact reason.
Unfortunately, this take on lambdas was deemed too complicated, and so we got the present system which doesn't really try to deal with this use case at all.
Well, Scala has union types, but it doesn't do checked exceptions per se (though it does have a very advanced type system, so a similar structure can be easily encoded). I think checked exceptions are pretty rare, so I don't know... probably some research language (but they often go the extra mile towards effect types).
In a former life I worked with a codebase that used that style. Let's just say it isn't enough.
Can you remember what sort of problems you were hitting?
My main gripe with checked exceptions is they create a whole other possible code path on each `catch` clause. I tend to keep checked exceptions to the absolute minimum where they actually make sense, all the rest are RuntimeExceptions that should bubble up the stack.
That's kind of how you do it in go. Either:
1. Bubble up the error (as is, wrapped, or as a different error).
2. Handle the error & have a (possibly complex) new code path.
There's also the panic/recover that sometimes is misused to emulate exceptions.
But so would every single other method to react to different types of errors, no?
In something like go, you're even required to create the separate code path for EVERY SINGLE erroring line, even if your intention is simply to bubble it up.
They don’t create any other code paths than RuntimeExceptions.
Is it time to brag about Rust error-handling or should we wait a little?
After a decade of Scala and Rust I no longer believe in monads and prefer the way Go does error handling.
Looks nice, but only by hiding the error handling. Today I like looking at code and seeing the error handling. With monads you end up with monad stacks and transformers, which introduce their own failure states.
It's certainly better than Go's (Go's is barely better than C's, and that's quite a low bar), but I don't think that sum types are the global optimum.
Exceptions are arguably better in certain respects, e.g. defaulting to bubbling up, covering as small or as wide a range as needed (via try-catch blocks), and auto-unwrapping without extra syntax. So when languages with proper effect types come into the mainstream, we might reach a higher optimum.
Maybe I'm too pessimistic, but Rust-style error handling feels like the global optimum under the constraint that the average developer can understand it.
Go is a language that exists purely because people saw monads on the horizon and, in their panic, went back to monke, programming-wise. Rust error handling is something that even many Go fans have said is a good abstraction.
No, sum types are certainly not a global optimum. But they remain the best error-handling mechanism that I've used professionally so far.
Effect types (and effect handlers) are very nice, but they come with their own complexities. We'll see if some mainstream language manages to make them popular.
Which Go doesn't fix either, because its errors are all just "error", i.e. you can still forget to handle the right type of error.
If only there was a way to combine optimizing the default path (bubbling), and still provide information on what errors exactly could happen. Something like a "?" operator and a Result monad...
You may be thinking a bit too much about what happens in _Go_ when you forget to check for an error response from a function -- the current function continues on with (probably) incorrect/nil values being fed to subsequent code. In Java when an uncaught exception is thrown, the exception makes its way back up the call stack until it's finally caught, meaning subsequent code is _not_ executed. It's actually a very orderly termination. In any Java web framework (Spring et al) there's always a centralized point at which exceptions are caught and either built-in or user-specified code is used to translate the error to an HTTP response.
This makes for much more pleasant code that is mostly only concerned with the happy path, e.g., my REST endpoint doesn't have to care if an exception is thrown from the DAO layer as the REST endpoint will simply terminate right then and there and the framework will map the exception to a 500 error. Why anyone would prefer Go's `if err != nil {}` error handling that must be added All. Over. The. Place. at every single level of the application is beyond me.
My slightly snarky take is that liking Go is simply a defensive reaction to one too many AbstractFactoryBeanFactory. Too many abstractions overloaded their "abstraction-insulin", so now they can only handle minute amounts of abstraction.
I liked your other comment's take with the monads better :P
This was exactly my train of thought. I even went looking for Dave's blog post about it before I saw your comment. :D
No, TFA is mostly about making errors consistent in a large application, while exception (vs error as standard return value) is largely about easier bubbling, which is one thing TFA hardly talked about (maybe I missed it, I only skimmed the article). In fact it spends a lot of energy on wrapping which is the opposite of automatic bubbling provided by exceptions by default. Throwing random, inconsistent module/package/whatever-specific exceptions from everywhere causes most of the same problems described in TFA.
I feel like all the canned comments saying TFA is about implementing exceptions / an ADT result type are from people who didn’t read the article and just want to repeat all the clichés on the topic (for easy karma? No idea what the point is).
That's not how I read it. It's more about having a consistent approach to managing error types in large code bases. This is a common problem with exception-based languages too.
How so? This is about how errors are defined, not how they're propagated through the application. Feels like you didn't actually read what was being done by the OP.
Exceptions have a hierarchical nature to them in most languages, or at least some sort of identity. You're correct that the author doesn't try to change the way errors are propagated, but you can see similarities between what the author is creating themselves and what already exists in languages with exceptions.
Go’s error handling is still cumbersome and lacking. I love writing Go, but I don’t ever want to adopt anything like this. It’s bending over backwards to achieve something sum types provide, and this pattern is a mess.
I thought so too, after years with Scala and Rust. Now I think (X, error) is fine; indeed I think it is great for its simplicity. I might want some kind of safe-assignment shorthand, but the urge is not very high.
The problem is indeed composition. How do I chain 3 calls that short-circuit on the first error? In Go that's verbose in the extreme. With exceptions it's easy to miss an error. Sum-type errors have neither problem.
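For illustration, here is roughly what chaining three short-circuiting calls looks like in plain Go, using made-up functions a, b and c:

    package main

    import (
        "errors"
        "fmt"
    )

    // Hypothetical steps, each of which can fail.
    func a() (int, error)      { return 1, nil }
    func b(x int) (int, error) { return x + 1, nil }
    func c(x int) (int, error) { return 0, errors.New("c failed") }

    // Three calls that short-circuit on the first error.
    func pipeline() (int, error) {
        x, err := a()
        if err != nil {
            return 0, fmt.Errorf("a: %w", err)
        }
        y, err := b(x)
        if err != nil {
            return 0, fmt.Errorf("b: %w", err)
        }
        z, err := c(y)
        if err != nil {
            return 0, fmt.Errorf("c: %w", err)
        }
        return z, nil
    }

    func main() {
        if _, err := pipeline(); err != nil {
            fmt.Println("error:", err)
        }
    }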
If you want to chain 3 calls and short circuit on the first error, don't use Go. I like explicit code without magic that I can't see.
I don't know what is happening in that sum-type version. Is it summing up errors? Is it short-circuiting? Does it have error handling? Is it async? Is it a monad stack with transformers? That code could mean anything. Good luck coming back to it after six months. I think the sum-type solution focuses on the happy path, while the Go solution assumes you need to focus on the things that go wrong. Additionally you have that nasty dependency of the Result type of a() on b() to make it work. And I've spent hours creating the right transformer stack to compose more than two monad types like Result, Future and IO.
I would be all over Go with a better type system or exceptions.
If Go ever adds exceptions[1] as an error handling mechanism, I'm out. Value errors are far superior to exceptions, even in their current state in Go.
[1]: assuming panics are not an error handling mechanism but a recovery mechanism
Checked Exceptions are nothing but errors as values with some syntactic sugar for the most common use case (bubbling up the error).
Go's version of value errors is just micrometers ahead of C-style error codes. In both cases you get told "there could be an error", the error is a value of one single type (error/int), and you have to manually find out which different errors this value could represent.
If you want to know what you're missing, check out Rust's error handling.
I love Rust error handling and wish it would have been the default in Go (too late to shoehorn it), but it has nothing to do with exceptions.
Panics can be values, and errors don't have to be values in Go?
I think you are mixing up concepts here.
I’m not talking about the underlying model, I’m talking about the control flow. What I mean is errors are explicit values belonging to the signature of the functions.
Madness.
C#, OCaml, Java, Scala, Kotlin all fulfill these requirements, while targeting the same niche.
Go has insanely good tooling and very fast single binary compiling.
While all these languages (afaik) can reach similar levels of functionality (GraalVM e.g.), it's more work. As much as I hate the language Go, I can't deny how braindead simple it is to just make a tool with it. I don't need to choose a build tool, or a runtime version, there's a library for everything and most developers with more than a room temp IQ can immediately start working on it.
The only other language that currently comes close is Rust. If only they had stuck to using a GC, I'd be in heaven.
Is this that different from something like .NET? I think one line in your config can make it AOT-compiled.
Yes there are indeed lots of languages in existence.
Trying to shoehorn code errors into HTTP errors is a prime example of conflating two very different things because sometimes they look similar. Let different things be different, I like to say. You either let your HTTP handlers do their own error-to-http-code management or you end up with a massive switch statement trying to map them all, or whatever monstrosity this approach is.
Also, the entire problem the OP has would go away if they just added OpenTelemetry tracing to their logs.
The sane thing to do is to let lower-layer functions return their own domain error types (e.g. a sealed class of failure cases); then, in the HTTP layer, your endpoint code just maps each case to a response. The benefits of this are hard to overstate:
* Errors are clearly enumerated in a single place.
* Errors are clearly separated from, but connected to, HTTP in the appropriate layer (the HTTP layer). Developers can tell from a glance at the resource method what the endpoint will return for each outcome, in context.
* Errors are guaranteed to be exhaustively mapped, because Kotlin enforces that sealed classes are exhaustively matched at compile time. So a 500 resulting from forgetting to catch a ReusedPasswordException is impossible, and if new errors are added without being mapped to HTTP responses, the compiler will let us know.
It beats exceptions, it beats Go's shitty error handling.
ah, yes, completely separate.
HTTP code: 200 ok
Body: {"error":"internal server error"}
My favorite example of this was renaming a 500 error caused by an unhandled exception to a 400 error, to make it look like it was the caller's fault. Management was possibly tracking 500 errors too, so the 400 could also have been gaming the system.
In some mental models, though, it did make sense. Particularly the one that went, "Well, we never would have errored, if you never called us!"
It's somewhat fair though. If there's a case that would cause errors for the system and it's a case that you're not supposed to handle, then a 400 error sounds perfect for that case. For example, if you have a service and it panics/returns 500 when you pass in an empty user id, then you could instead return a 400 before you hit the panic and all is good.
Normally you should attempt to find all the corner cases and present the errors to the user -- before processing the request. If you can't do this, it's time to rethink how your API works. A good API is simple to use and simple to write.
It also simplifies your business logic in that all the possible user defined idiocies are caught before your business logic actually processes the request.
Some frameworks do this better than others. And rather than documentation, I tend to prefer comprehensive error messages.
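A rough net/http sketch of that "validate first, then run the business logic" shape; the endpoint and request fields are made up for illustration:

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    type createUserRequest struct {
        UserID string `json:"user_id"`
        Email  string `json:"email"`
    }

    func createUser(w http.ResponseWriter, r *http.Request) {
        var req createUserRequest
        if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
            http.Error(w, "malformed JSON", http.StatusBadRequest) // caller's problem: 400
            return
        }
        // Catch the corner cases before any business logic runs.
        if req.UserID == "" || req.Email == "" {
            http.Error(w, "user_id and email are required", http.StatusBadRequest)
            return
        }
        // ...business logic here; anything that fails past this point is a 500...
        w.WriteHeader(http.StatusCreated)
    }

    func main() {
        http.HandleFunc("/users", createUser)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }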
> Normally you should attempt to find all the corner cases and present the errors to the user -- before processing the request.
That is what they are suggesting. You check the request and return 400 if it’s bad.
One example of a 500 error is a null pointer error. Was it a bad request or a logic error? One is your problem, the other is not. Just returning a 400 hides that issue. Validating the payload before processing it simplifies the issue for everyone involved.
A 500 error should be your problem with a stack trace in the log. A 400 error should provide enough description to tell the user it's theirs and how to fix it.
Just recoding a 500 to a 400 because of a null pointer error would get noticed and marked up in code review.
400 - you fucked up
500 - we fucked up
Think about what the client code looks like to handle this and the alternative, particularly if you’re implementing an sdk and the api is an implementation detail. I’m not saying I would choose this path, but it certainly reduces the amount of code on both sides that you have to write.
If HTTP is your API's transport layer, then HTTP errors should be related to problems with the transport layer and not to API itself. Is the internal server error caused by a bad HTTP request or a bad API request?
Honestly, my controversial take is that for APIs, it would be cleaner to not use any HTTP status codes other than 200 and have all of the semantics in the body of the response. I'm sure someone smarter than me will jump in and explain why this wouldn't work in practice, but it just feels like application semantics are leaking from a much more natural location in the body of the response. I feel similarly about HTTP request methods other than POST in APIs; between the endpoint route and the body, there should be more than enough room to express the difference between POST, PATCH, and DELETE without needing them to be encoded as separate HTTP methods.
I'm sympathetic, but this can have issues if you want your API to be used by anything other than your own client, including stuff like logging middleware. A lot of tools inherently support/understand HTTP status codes, so building on top of that can make integration a lot easier.
We, very roughly, do it like this:
- 200: all good
- 401: we don't know who you are
- 403: you're not allowed to do that
- 400: something's wrong and you can fix it
- 500: something's wrong and you can't fix it
Each response (other than 401) includes a json blob with details that our UI can do something with, but any other consumer of the API or HTTP traffic still knows roughly what's going on.
I've worked in places where we really sweated on getting the perfect HTTP status codes, and I'm not sure it added much benefit.
On POST - I find myself doing logical GETs with POST a lot, because the endpoint requires more information than can be conveyed in URL params. It makes me feel dirty, and it's obviously not RESTful but you know - sometimes you just have to get things done.
You've just described basically everything a dev needs to know to implement HTTP APIs that report status codes properly, yet some people still seem to think it's oh so complicated. What has gone wrong?
I can understand how people might look at the full list of status codes and think it's all too hard, but yes, once you realize that there are only a handful you need most of the time, it all becomes pretty simple.
Sure, but the problem in my opinion is that while the handful that you pick is totally reasonable, someone else might pick a slightly different handful that's just as reasonable. If I want to use a new API and delete a user, how do I know if it uses DELETE or POST, and if it will return 401 or 403? At best, you'll be able to skim through the documentation more quickly due to having encountered similar conventions before, but nothing stops that from happening in terms of request and response bodies either.
The fact that existing tooling relies on some of these conventions is probably a good enough reason to do things this way, but it's not obvious to me that this is because it's actually better rather than inertia. Conventions could be developed around the body of requests as well, and at least to me, it doesn't seem obvious that the amount of information conveyed at the HTTP method/response status layer was necessary to try to separate from the semantics of the request and response bodies. I'm sure that a part of that was due to HTTP supporting different content types for payloads, but nowadays it seems like quite a lot of the common alternatives to JSON APIs were designed not to even use HTTP (GraphQL, gRPC, etc.), which I'd argue is evidence that HTTP isn't necessarily being used as well for APIs as some people would like.
To make something explicit that I've been alluding to, everything I've said is about using APIs in HTTP, not HTTP in the context of viewing webpages in a browser. It really seems like a lot of the complications in HTTP are due to it trying to be sufficient for both browsers and APIs, and in my opinion this comes mostly at the expense of the latter.
It's quite unclear what your point is. HTTP APIs should have a minimal status code set. The parent described it perfectly. It's simple, practical (especially from a monitoring perspective) and doesn't intrude on the service domain.
It seems you have some alternative in mind but it wasn't presented.
I don't consider what the parent comment listed as "minimal". The alternative I described is literally in my initial comment; using only 200 for APIs is "minimal".
Only 200 is detrimental for monitoring. You have to parse the response body to classify response types. HTTP status codes are a cheap and already existing way to get insight into service behavior.
It's minimal if you want to integrate with anything that understands HTTP status codes.
Need an AI playground to paste error responses and fix the code.
> Each response (other than 401) includes a json blob with details
...until you discover an intermediate layer strips the body of certain error codes. Like Apache, which IIRC does this for all 5xx and most 4xx responses.
Go ahead try to implement something like cross-origin requests or multipart encoded form uploads just using the body semantics you described. I’ll wait.
Also that is not a controversial take. It is at best a naive or inexperienced take.
Both of those happen in the context of web browsing rather than existing in APIs in a vacuum; I'd argue that there's absolutely no reason why the mechanism used to request a webpage from a browser needed to be identical to the mechanism used for the webpage to perform those actions dynamically, which is pretty much my whole point: it doesn't seem obvious to me that it's useful to encode all of that information in an API that isn't also being used to serve webpages. If you are serving webpages, then it makes sense to use semantics that help with that, but I can't imagine I'm the only one who's had to deal with bikeshedding around this sort of thing in APIs that are literally only used for backends.
Multipart messages definitely happen in APIs as well, if you are handling blobs that are potentially pretty big.
There are a lot of useful network monitoring tools that can analyze HTTP response codes out of the box. They can't do this for your custom application error format. You don't have to go crazy with it, but supporting at least 200/400/500 makes it so much easier to monitor the health of your services.
I like to find a middle ground.
I use http status codes to encode how the _request_ was handled, not necessarily the data within the request.
A 400 if you send mangled JSON, but a 200 if the request was valid but does not pass business validation rules.
Inside the 200 response is structured JSON that also has a status that is relevant at the application level.
Otherwise how can you tell, for example, whether a 404 response means the endpoint doesn't exist or the item requested at the endpoint doesn't exist?
I believe it's important to have a separation between what is happening at the API level vs Application, and this approach caters for both.
> A 400 if you send mangled JSON, but a 200 if the request was valid but does not pass business validation rules.
What about an empty required field in the JSON? Is that still mangled input, or is it already business logic?
As it's nothing to do with the HTTP request and the body was able to be parsed, in my book that would be classified as being at the application level, so it results in a 200 status with a JSON response detailing the issue:
200 OK {status: "failed", errors: ["field X is required"]}
How you deal with this on the application side, what JSON statuses you have etc is up to you.
It's a client error, and it's highly beneficial to make it a 400 for monitoring purposes. You want to see when your FE or mobile devs have deployed a faulty app.
That depends on how you set up and do your monitoring. Not every failure needs to be indicated by an HTTP status code.
For example, on a server I'm working on there are helper functions that generate different types of responses. Responding in certain ways will produce a 200, but will also log a warning or error.
On the client side, you can create request helpers that all requests go through and that can resolve requests appropriately, rendering error messages to the user etc.
The main thing is to have a well defined, consistent approach.
One reason for using HTTP verbs is to distinguish between queries and updates, and for the latter, between idempotent and non-idempotent updates. This in turn makes it possible to do things like automatically retry queries on network errors or cache responses where it is safe to do so.
Anecdotally, color-coded status codes make life much easier when debugging a new API. You instantly see that something is wrong. If everything is green, you don't realize that something is wrong until you carefully read a uniquely structured custom response. It saves a lot of effort.
> Honestly, my controversial take is that for APIs, it would be cleaner to not use any HTTP status codes other than 200 and have all of the semantics in the body of the response.
We've been doing that for 20 years with json-rpc 1.0
In this context, HTTP is just the transport and HTTP errors are only transport errors. Yes, you throw away lots of HTTP goodies with that, but there are many situations where it makes more sense than some half-assed RESTish API. YMMV.
Yeah, that's usually the pragmatic thing to do. Facebook does that with their API, for example.
4xx or 5xx gets you the default HTTP handling for that kind of error. Occasionally - especially in small examples - that default handling does what you want and saves you duplicating a lot of work. More often it gets in your way.
I'd compare it to browser default styling - in small examples it sounds useful, but in a decent-sized site you just end up having to do a "CSS reset" to get it out of the way before you do your styling.
You're kind of describing things like Thrift and other RPC servers?
Possibly. I'm not sure why it should require switching to an entirely different protocol, though; my point is that making an API that only uses POST and always returns 200 is something that already works in HTTP, and I have trouble understanding why that isn't enough for pretty much everything.
You lose some benefits of features already implemented by existing HTTP clients (caching, redirection, authorization and authentication, cross-origin protections, understanding the nature of the error to know that this request has failed and you need to try another one...).
It's certainly not comprehensive, but it's right there and it works.
Moving to your own solution means that you have to reimplement all of this in every client.
> understanding the nature of the error to know that this request has failed and you need to try another one...
Please elaborate. In my experience, most HTTP client libraries do not automatically retry any requests, and thank goodness for that, since they don't, and can't, know whether such retries are safe or even needed.
> redirection
An example of service where, at the higher business logic level, it makes sense to force the underlying HTTP transport level to emit a 301/302 response, would be appreciated. In my experience, this stuff is usually handled in the load-balancing proxy before the actual service, so it's similar to QoS and network management stuff: the application does not care about it, it just uses TCP.
They don't retry on errors but they know it is an error. Eg. imagine a shell script using curl or wget and trying multiple URLs as a health check (eg. on different round-robin IPs). Without these "generic" HTTP tools knowing that this is a "failure", you would need to implement custom parsing for any case like this instead of relying on the defined "error" and "success" behaviour.
The same holds true if you are using any programming library: there is a plethora of handlers for HTTP errors.
As for redirection, a common example is offering downloads through S3 using pre-signed URLs (you share a URL with your own domain, but after auth redirect to a pre-signed S3 URL for direct download or upload).
You are thinking like a developer, but there is a world of networking as well. Between your client and server will be various bits of hardware that cannot speak the language you invent. 200, 401, 500 — these are not for the use of the application developer — but rather the infrastructure engineer.
Something being "enough" doesn't mean it's optimal. There's a huge stack of tools that speak HTTP semantics out of the box; including the user agent, i.e. the browser (and others), but also stuff like monitoring tools, proxies, CORS, automation tools, web scrapers...
You don't need to reinvent HTTP semantics when HTTP is already there, standard, doing the right thing, compatible with millions of programs all across the stack, out of the box.
HTTP is so well designed it almost makes me angry when people try to sidestep it and inevitably end up causing pain in the future due to some subtle semantic detail that HTTP does right and they didn't even think to reimplement.
And the only solution to such issues (as they arise, and they will) is to slowly reimplement HTTP across the whole stack: oh, you need to monitor your internal server errors? Now you have to configure your monitoring tool (or create your own) to inspect all your response bodies (no matter how huge) and parse their JSON (no matter how irrelevant) instead of just monitoring the status code in the response header and easily ignore the expensive body parsing.
Even worse when people go all the way. If we don't need status codes, why do we need URLs at all? Just POST everything to /api/rpc with an `operation` payload. Congrats, none of your monitoring tools can easily calculate request rates by operation without some application-specific configuration (I wish this was a made up scenario).
Just use HTTP ffs. You'd need a very good reason not to use it.
You need some kind of structured way to describe the action to take and what the result or the error is, so the client and server can actually parse the data. That's the protocol, whether it's something formal like RPC libraries, or "REST"-ish, or whatever.
json-rpc is probably what you're describing over HTTP; maybe, if you squint enough, GraphQL too.
This is the way to go; it pretty much solves the 404 ambiguity (resource not found vs. route not found). But you will get laughed at by so-called architectural dogmatists. Remember we aren't really doing REST, it's just RPC, so let's call it that.
Shoehorning the HTTP protocol's error codes into application error codes, drinking the Kool-Aid and calling it best practice, is beyond bizarre.
Agree. "200 - successfully failed to do the thing" is valid and useful.
500 is "failed to do anything at all"
Mapping the error to a code in the HTTP handler is the true path. It's the only place with the context and knowledge about the semantics. In one endpoint, something not being found can be a proper 404, if its existence is truly optional. In another endpoint the absence might very well qualify as a 500.
100% this.
Deciding on end user error handling in a low level is making assumptions that cannot be known at the low level. The caller decides how something is going to be handled and presented, not the callee, or you inevitably miss them in important places and silently miscategorize stuff. Far better to have that scenario lead to a 500 (unmapped error, unknown problem) so it can be fixed.
404 is quite an ominous thing: is it a 404 because the route was not found, or because the entity was not found? God bless your monitoring.
422 is frequently used for this case despite being part of the WebDAV extensions.
Ah, a God error package that has all seeing knowledge of the domain around it. What a monstrosity.
It's not the worst idea for an organisation to centralise stuff that needs to be centralised.
Like defining protobuf schemas, it's no good if each team does its own thing.
Either the package does or the developers do. And only one of them has compilation checks.
> centralized system [... for errors]
I don't think this will scale. Errors are part of the API (especially given the Go mantra that errors are values, https://go.dev/blog/errors-are-values; it is ever more prominent), and each API is the responsibility of a service.
So unless you are dealing with the infrastructure or standards/protocols layer (say, you define what HTTP 500 means, or a common pattern for URL paths in your API), it's better not to couple all services. Those standards are minimal and primitive precisely so they work for everything, which is the opposite of what you're doing here by aggregating all the specifics into a single place.
I agree Go error handling is suboptimal, but this is simply not the right approach. It essentially turns error handling into a whole other language, almost like how Ginkgo is a separate language for handling tests.
And most languages are lacking this useful error language. You can’t speak if you have no language, so having it must be a good thing.
The only questionable thing here is that this framework is not a part of the main language still, which means near zero adoption. But that train has sailed.
Mine is still just growing organically. Constant errors are useful for sentinel values (https://dave.cheney.net/2016/04/07/constant-errors). RFC 9457 (https://datatracker.ietf.org/doc/html/rfc9457) is useful for REST (JSON) APIs. pkg/errors is nice for stack traces. I define error structs with fields to pass structured data instead of trying to parse strings.
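A small sketch of both ideas: the constant-error pattern from the linked Cheney post, plus an error struct carrying fields; the QuotaError type here is made up for illustration:

    package main

    import (
        "errors"
        "fmt"
    )

    // Constant sentinel error: a string type whose values can be declared const.
    type ConstError string

    func (e ConstError) Error() string { return string(e) }

    const ErrNotFound = ConstError("not found")

    // Error struct carrying structured data instead of a string to parse.
    type QuotaError struct {
        Used, Limit int
    }

    func (e *QuotaError) Error() string {
        return fmt.Sprintf("quota exceeded: %d/%d", e.Used, e.Limit)
    }

    func main() {
        err := fmt.Errorf("loading profile: %w", &QuotaError{Used: 11, Limit: 10})
        var qe *QuotaError
        if errors.As(err, &qe) {
            fmt.Println("over by", qe.Used-qe.Limit) // structured data, no string parsing
        }
        wrapped := fmt.Errorf("lookup: %w", ErrNotFound)
        fmt.Println(errors.Is(wrapped, ErrNotFound)) // sentinel comparison still works
    }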
I think that's overkill, most of the time I just bubble errors up and I have very few cases where the error handling depends on the type of error. I guess it's because I don't use errors for things that are recoverable and try to fix them instead inside the given function. An example given here in the thread is reading from a file and if it doesn't work try a backup. Rather than having a function that reads from a file and returns a bunch of different errors I'd just make one with a list argument and then handle the I/O errors inside, and return an "unrecoverable" error otherwise.
For adding context, %w is good enough I find, though as I said I only very sparingly use errors.Is(...). Go isn't a language that's designed around rich error or exception types, and I don't think you should use it like that.
Well, yes, if you're just using errors as error messages, you only need strings and %w. That's usually good enough if you're writing an application.
However, if you're writing a library, chances are that your users want to catch the errors, find out whether the call failed because, say, the remote API is down or because the password is wrong.
Or if you're writing an API, you probably want to return different error codes. If your errors are bubbling, you'll need to somehow `errors.Is`/`errors.As` somewhere.
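A minimal sketch of what that looks like from the caller's side, with hypothetical sentinel errors that a library might export (names invented here):

    package main

    import (
        "errors"
        "fmt"
    )

    // Hypothetical errors a client library might export.
    var (
        ErrRemoteDown  = errors.New("remote API is unreachable")
        ErrBadPassword = errors.New("password is wrong")
    )

    // Hypothetical library call that wraps one of its exported errors.
    func login(user, pass string) error {
        return fmt.Errorf("login %q: %w", user, ErrBadPassword)
    }

    func main() {
        err := login("alice", "hunter2")
        switch {
        case errors.Is(err, ErrRemoteDown):
            fmt.Println("retry later")
        case errors.Is(err, ErrBadPassword):
            fmt.Println("ask the user to re-enter the password")
        default:
            fmt.Println("unexpected error:", err)
        }
    }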
Yea, but like, when making an HTTP request, a timeout is significantly different from a failure to open a socket from a failure to resolve the hostname from a 429 error. And often it is up to the caller to decide how to handle those situations.
Is this just someone's proposal, or a formal addition to Go, or what?
"All errors must implement the Error interface." That's a step forward.
Rust really has the same error handling as Go - return an error status. But the syntax is cleaner. Rust thrashed around with errors at first. Then things sort of settled down. At this point, everybody uses Result<UsefulValue, Error>, but "Error" is just a trait that doesn't require much information. And "?" for propagating errors upwards is a huge convenience.
It's probably too late to retrofit "Result" and "?" into Go libraries, although they'd fit the language.
> Rust really has the same error handling as Go
Not at all. Rust has proper sum types, that it can return just like anything else in the language, while Go has a special cased error return slot (one may be tempted to call it an ugly hack), and it can return a value on both, which it does in some standard library calls.
Not at all. Go has an error type, and Go functions have the ability to return zero, one, two, or more items, ordered however the developer likes. An error may be among those, as desired, and populated as desired.
Some software also writes to both STDOUT and STDERR.
I know; "special-cased" may have been better worded as "just a convention". My point is, this is not much different than using a thread-local variable like errno, and it adds useless confusion: your return values represent n*m combinations, while there are only n+m cases with proper error semantics.
Re STDERR: but shells don't decide whether a program execution failed on having written to STDERR, but by the returned singular error code.
I agree with everything you've written in this comment.
I'd like to split a hair here and say, this is a "Go's standard library" problem, and not a "Go language" problem.
Good API design for a software package should have proper error semantics.
Good API design for a language, allows for flexibility in actual implementation, alongside standards that say "you SHOULD do this".
Disagree. This level of convention is inseparable from the language.
Not doing the conventional error return in Go would be akin to using a Result sum type in reverse, putting the success value into the Error case.
One of the issues in Go is that if all you ever do is "if err != nil { return err }", you will quickly run into problems, because you will have errors like "open foo: no such file or directory" or "sql: no rows in result set" without a clue where that error came from. Sometimes that's obvious, often it's not.
I'm not sure how Rust handles that? But it's more than just "propagate errors", but more like "propagate errors with the appropriate context for this specific error".
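For the Go side, a small sketch of that kind of context-adding wrap with %w, using a made-up loadConfig helper:

    package main

    import (
        "fmt"
        "os"
    )

    // Wrap the low-level error with call-site context so
    // "no such file or directory" is traceable.
    func loadConfig(path string) ([]byte, error) {
        b, err := os.ReadFile(path)
        if err != nil {
            return nil, fmt.Errorf("load config %q: %w", path, err)
        }
        return b, nil
    }

    func startServer() error {
        if _, err := loadConfig("foo"); err != nil {
            return fmt.Errorf("start server: %w", err)
        }
        return nil
    }

    func main() {
        // Typically prints:
        // start server: load config "foo": open foo: no such file or directory
        fmt.Println(startServer())
    }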
Rust uses the `?` operator to convert between error types, which allows users and libraries to hook into the error before it's returned.
There are a number of helper libraries that provide an extended, type-erased error type that attaches a real stack trace to the error, such as `anyhow`. These helper libraries also provide ways to attach extra metadata to the error, so you can do things like `returns_a_result().context("couldn't do it")?` to quickly annotate the error. The standard library has support for this through a `context.Value`-like API on the Error trait. The std lib `Error` trait also has functions for finding the cause of an error and traversing a chain of errors, very similar to Go's `errors.Cause` API.
Rust also has a number of libraries for making specific error types like `thiserror` which can help generate error enums with the implementations required to carry backtraces, context and causes.
Yep, if you want wrapped errors in Rust, you use the anyhow crate. It leans heavily into dyn so has some performance tradeoffs, but it's roughly the same performance-wise as Go's error interface (which also uses a vtable under the hood).
Though using a dynamic error in Rust should only impose an allocation cost on the error path, and I presume Go is the same.
I have been seeing this pattern repeated over and over since I started using Go in 2014 where people think they should be “building my favorite missing feature” — whether that’s futures, generics, structural processes, OTP, version managers, package managers, or now apparently exceptions. I always get the sense that the authors think they’ve done something cool and helpful when in the first place if they had simply put more effort into comprehending the simple “Go way” it wouldn’t have been necessary at all, and the needed functionality would have fallen out of the design.
You realize that half of the features you are counting are now in Go, and were missing in the beginning exactly because people were missing them and Go simply did not offer a sane way to work around their absence?
I'm also quite sure that Go will provide a more sane way to handle errors in the not so far future, since it's continuously at the top of people's complaints
your comment exemplifies the mentality, yes, and unfortunately it has now been adopted by project leadership, so I’m sure you are quite right that more “missing features” will get baked into the language soon :)
It's far better to have those features well designed and baked into the language once than to have them constantly poorly redesigned and baked into every other Go app.
Nobody would ever use this argument for the design of C. It's good for C to stay lean and simple while communities using C (please, let's not with this imaginary monolithic "The Community") are free to try things and offer competing solutions that others are free to ignore.
Kitchen-sink languages are bad. Justifying them with "well, the community is bad, so we need the bad thing to be mainlined" is maybe worse.
C is legacy tech on life support.
By Go standard, all other languages are "kitchen sink". Conversely, I would argue that basics like decent error handling are not in any meaningful sense a "kitchen sink" thing.
C is still #4 on TIOBE, right behind Java, so that is not at all true.
Go is good because it’s not like the other languages.
It should stay not being like them, not try to be more like them.
I arrived at a similar conclusion. I come from Java, where you have exceptions with try/catch clauses and can declare them in function signatures. That works fairly well, but it is very difficult and not idiomatic in Go.
Therefore, I created a simple rule. If you do not know what this error means to the user yet, let it stay a fmt.Errorf("xx: %w", err). If you do, wrap it in your own custom ServerError struct and return that type from then on. Do not change the meaning of the ServerError, even if you wrap the error with another ServerError.
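A rough sketch of that rule, with a hypothetical ServerError type (not the commenter's actual code):

    package main

    import (
        "errors"
        "fmt"
    )

    // Once the meaning for the user is known, wrap the error in a struct that carries it.
    type ServerError struct {
        UserMessage string // what the user should see
        Err         error  // underlying cause
    }

    func (e *ServerError) Error() string { return e.UserMessage + ": " + e.Err.Error() }
    func (e *ServerError) Unwrap() error { return e.Err }

    func loadProfile() error {
        err := errors.New("open profile.json: no such file or directory")
        // Meaning for the user not known yet: plain wrapping with %w.
        return fmt.Errorf("loadProfile: %w", err)
    }

    func handleRequest() error {
        if err := loadProfile(); err != nil {
            // Meaning is known here: attach it once and keep it stable.
            return &ServerError{UserMessage: "profile unavailable", Err: err}
        }
        return nil
    }

    func main() {
        err := handleRequest()
        var se *ServerError
        if errors.As(err, &se) {
            fmt.Println("show user:", se.UserMessage)
        }
        fmt.Println("log:", err)
    }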
It is telling that you come from Java with this opinion. OP's approach is certainly not idiomatic Go.
Idiomatic here means no idiom suggested really. So yeah, non-idiomatic.
When I thought about errors/exceptions, I basically came to the same conclusion. To reiterate or add to TFA: standard formulations, expected vs. happened, reasonable context visible in logs, error trees, automatic HTTP/etc. codes, tidy client messages in prod, and a reasonable distinction between: unexpected, semi-normal, programming error, likely fatal.
Not sure why most (all?) programming languages have such poor support for errors. Coding may feel like 2024, but error handling feels like 1980. Anyone with 2-5 years of programming experience (where errors do happen and they choose to handle them) will come to similar ideas.
Also, the fact that try {} and catch/finally {} are always three different scopes is just idiotic. It should be try { catch {} finally {} }. What cargo cult is that {}{}{} pattern? Everyone copies it blindly from grammar to grammar.
Posts like these remind me how Go really has nothing going for it apart from goroutines and channels. It's an awkward mix of low level and high level with C-like influence, which is weird considering it's a GC language.
This approach is so bad, I don't even know where to start. But it's all symptoms of their, sorry, incompetence.
Take the loadCredentials example at the top. If os.ReadFile cannot find the file, it returns an error with the string representation "open cred.json: no such file or directory". This comes straight from the std lib as-is: a great error. What does the errors.Is(err, os.ErrNotExist) branch do? It prepends "file not found" to it, rendering "file not found: open cred.json: no such file or directory". So this adds exactly nothing. The next if will prepend "failed to read file" to it, again adding nothing. The two error checks should be replaced by one if statement, optionally wrapping with a context string, though I cannot think of any use for one.
Then the next step, the error handling of verifyCredentials. I can only guess what it does, but assume it returns a "username 'foo bar' cannot contain spaces" error. Does prepending "invalid credentials" help anything? Nope, so the whole if can be removed as well. No surprise your errors get clunky if you make them clunky.
I have more pressing things to do than dissect this article line by line, but suffice it to say that I feel sorry for newcomers to the language that an article like this is so high on HN. Back in the day there was just Dave Cheney's material to read [1], and it was excellent. It's unfortunately outdated in certain regards (e.g. the new Is/As functionality in the errors package for inspection, and the %w formatting directive in fmt.Errorf) but it's still an excellent article.
[1]: https://dave.cheney.net/2016/04/27/dont-just-check-errors-ha...
>it returns an error with string representation: "open cred.json: no such file or directory". This comes straight from the std lib as it is, a great error.
It’s a terrible error. It’s not structured, so you can’t aggregate it effectively in logs; on top of that it leaks a potential secret, so you can’t return it from an RPC handler.
The string representation is obviously not structured, because it's a string representation and strings are scalars. The typed representation is structured, which you can put into your structured logs as you'd like, omitting sensitive information where needed.
The "typed" representation is an `error` type that a million other methods use with a single method that returns the string:
It's the same unstructured goslop. I'm worried readers of this article will be horrified and believe this kind of DIY error handling is necessary in Go.
The author has attempted to fix their unidiomatic error handling with an even more unidiomatic error framework.
New Go users: most of the time returning an error without checking its value or adding extra context is the right thing to do
> New Go users: most of the time returning an error without checking its value or adding extra context is the right thing to do
Thank you.
Feels like Go is having its Java moment: lots of people started using it, so questions of practice arise despite the language aiming at simplicity, leading to the proliferation of questionable advice by people who can't recognize it as such. The next phase of this is the belief that the std library is somehow inadequate even for tiny prototypes because people have it beaten over their heads that "everybody" uses SuperUltraLogger now, so it becomes orthodox to pull that dependency in without questioning it.
After a bunch of iterations of this cycle, you're now far away from simplicity the language was meant to create. And the users created this situation.
Go is having a Go moment: lots of people using it are realizing that other programming languages have all that complexity for a reason, and that "aiming at simplicity" by aggressively removing or ignoring well-established language features often results in more complicated code that's easier to get wrong and harder to reason about.
From my experience this is not the case. If you error out 7 functions deep and only return the original error there's no chance you're figuring out where it happened. Adding context on several levels is basically a simplified stack trace which lets you quickly find the source of the error.
I inherited a codebase with the same problem. After a few debugging sessions where it wasn't clear where the error was coming from, I decided the root problem was that we didn't have stack traces.
Fortunately, the code was already using zap and it had a method for doing exactly that:
zap.AddStacktrace(zap.LevelEnablerFunc(func(lvl zapcore.Level) bool { return lvl >= zapcore.InfoLevel }))
Because most of the time if there's an error, you'd likely want to log it out. Much of the code was doing this already, so it made sense to ensure we had good stack traces.
There's overhead to this, but in our codebase there was a dearth of logging so it didn't matter much. Now when things are captured we know exactly where it happened without having to do what the post is doing manually... adding stack info.
We actually went through the same realization when we started writing Rust a few years ago. The `thiserror` crate makes it easy to just wrap and return an error from some third-party library.
Since it derives a `From` implementation, you can simply propagate the third-party error with `?`. But if that's happening somewhere deep in your application and you call that function from more than one place, good luck figuring out what it is! You wind up with an error log like `third_party thing failed` and that's it. Generally, we now use structured error types with context fields, which adds some verbosity since specifying a context becomes required, but it's a lot more useful in error logs. Our approach was significantly inspired by this post from Sabrina Jewson: https://sabrinajewson.org/blog/errors
It's not a binary decision though. Just because the article arrives at what I consider overkill for most things doesn't mean sentinel errors or wrapping errors in custom types should be avoided at all costs in all situations.
In my experience, it's good and healthy to introduce this additional context on the boundaries of more complex systems (like a database, or something accessing an external API and such), especially if other code wants to behave differently based on the errors returned (using errors.Is/errors.As).
But it's completely unnecessary for every single plumbing function to start inspecting and wrapping every error it encounters, especially if it cannot make a decision about those errors or provide better context.
I agree; I've wasted countless hours troubleshooting errors returned in complex Go applications. The original error is not sufficient.
Do you maybe have any constructive advice for people who need to return errors that demand different behaviour from the calling code?
I gave an example higher in the thread: if searching for the entity that owns the creds.json file fails, we want to return a 404 HTTP error, but if creds.json itself is missing, we want a 401 HTTP error. What would be the idiomatic way of achieving this, in your opinion?
With some of these examples, I'd change the API of the lower-level methods. Instead of a (Credentials, err) and the err is a NotFound sometimes, I'd rather make it a (*Credentials, bool, err) so you can have a (creds, found, err), and err would be used for actual errors like "File not found"/"File unreadable"/...
But other than that, there is nothing wrong with having sentinel errors or custom error types on your subsystem / module boundaries, like ErrCredentialsNotFetched, ErrUserNotFound, ErrFileInvalid and such. That's just good abstraction.
The main worry is: How many errors do you actually need, and how many functions need to mess about with the errors going around? More error types mean harder maintenance in the future because code will rely on those. Many plumbing or workflow functions probably should just hand the errors upwards because they can't do much about it anyways.
A lot of the details in the errors of the article very much feel like business logic and API design is getting conflated with the error framework.
Is "Cannot edit a whatsapp message template more than 24 hours" or "the users account is locked" really an error like "cannot open creds.json: permission denied" or "cannot query database: connection refused"? You can create working code like that, but I can also use exceptions for control flow. I'd expect these things to come from some OpenAPI spec and some controller-code make this decision in an if statement.
Use errors.Is to compare the returned err to mypkg.ErrOwnerNotExists and mypkg.ErrMissingConfig, and let the handler decide which status code is appropriate.
Cool, but errors.Is against what? In my case both would come back as os.ErrNotExist errors, because both are files on the disk.
I think that the original dismissal I replied to, might not have taken into account some of the complexities that OP most likely has given thought to and made decisions accordingly. Among those there's the need to extract or append the additional information OP seems to require (request id, tracking information, etc). Maybe it can be done all at the top level, but maybe not, maybe some come from deeper in the stack and need to be passed upwards.
No no no; do not return os.ErrNotExist in both cases. The function needs to handle os.ErrNotExist and then return mypkg.ErrOwnerNotExists or mypkg.ErrMissingConfig (or whatever names) depending on the state in the function.
The os.ErrNotExist error is an implementation detail that is not important to callers. Callers shouldn't care about files on disk, as that is a leaky abstraction. What if the function decides to move those configs to S3? Then callers have to update to handle S3 errors? No way. Return errors specific to your function that abstract over the underlying implementation.
Edit: here is some sample code https://go.dev/play/p/vFnx_v8NBDf
Second edit: same code, but leveraging my other comment's kverr package to propagate context like kv pairs up the stack for logging: https://go.dev/play/p/pSk3s0Roysm
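A rough sketch of that shape (the names are invented here, not the linked playground code): translate the os-level error into package-level errors, and let the HTTP handler pick 404 vs 401:

    package main

    import (
        "errors"
        "fmt"
        "log"
        "net/http"
        "os"
    )

    // Domain errors exported to callers; os.ErrNotExist never leaks out.
    var (
        ErrOwnerNotExists = errors.New("owner does not exist")
        ErrMissingConfig  = errors.New("credentials config is missing")
    )

    func loadCredentials(ownerDir, credsFile string) error {
        if _, err := os.Stat(ownerDir); errors.Is(err, os.ErrNotExist) {
            return ErrOwnerNotExists
        }
        if _, err := os.Stat(credsFile); errors.Is(err, os.ErrNotExist) {
            return ErrMissingConfig
        }
        return nil
    }

    func handler(w http.ResponseWriter, r *http.Request) {
        err := loadCredentials("owners/alice", "owners/alice/creds.json")
        switch {
        case errors.Is(err, ErrOwnerNotExists):
            http.Error(w, "not found", http.StatusNotFound) // 404
        case errors.Is(err, ErrMissingConfig):
            http.Error(w, "unauthorized", http.StatusUnauthorized) // 401
        case err != nil:
            http.Error(w, "internal error", http.StatusInternalServerError)
        default:
            fmt.Fprintln(w, "ok")
        }
    }

    func main() {
        http.HandleFunc("/", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }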
Exactly, and that's what OP argues for, albeit in a very complex manner.
Distilling their implementation to the basics, that's what we get: typed errors that wrap the Go standard library's ones with custom logic. Frankly, I doubt that the API your library exposes (kv maps) is better than OP's typed structs. Maybe their main issue is stuffing all error types into the same module, instead of having each independent app come up with its own, but probably that's because they need the behaviour for handling those errors at the top of the call stack to be uniform, with only one implementation.
A quick back-of-the-napkin list of what an error needs to contain to be useful in a post-execution debugging context would be:
* calling stack
* traceability info like (request id, trace id, etc)
* data for the handling code to make meaningful distinction about how to handle the error
I think your library could be used for the last two, but I don't know how you store the calling stack in kv pairs without some serious handwaving. Also, kv is unreliable because it's not compile-time checked to match at both ends.
I'm not saying to use kverr for explicit error handling (like, you could, but that is non-ideal); use kverr as a context bag of data you want to capture in a log. If you are programmatically routing on untyped string data, I agree, that's unreliable.
> No surprise your errors get clunky if you make them clunky.
From a user perspective, good errors in Go make me think of Perl's croak/carp. Croak and carp gave you a stack trace for your error, but cut out all the module-internal calls and left you with the function calls across module boundaries. Very useful -- enough so that Java discovered it again later on.
Personally, I wouldn't wrap the errors in loadCredentials at all. I'd just wrap the result of this method into an fmt.Errorf("failed to load credentials: %w"). This way the user knows the context the error happened in, and then we have to cross our fingers the error returned by this is good enough.
But something like "application startup failed: failed to load credentials: open cred.json: no such file or directory" is a very nice error message from an application. Just enough context to know what's going on, but no 1200 line stacktrace to sift through.
As someone who ended up implementing something very similar to TFA, I'd like to ask: in which way can you pass errors from 3 layers deep in your stack to the top layer and maintain context?
I.e., when I can't find cred.json I want to return a 401 error, but when I can't find the entity that cred.json is supposed to be owned by, I want to return 404. How can one "not incompetent" Go developer solve this and distinguish between the two errors?
Adding error checks everywhere when you don't care about them is one of the ugliest things about Go.
What I do is have a utility package that lets me panic on most errors, so I can recover in a generalized handler.
x, err := doathing()
Catch(err, "didn't do the thing")
The majority of error handling is "the operation failed, so cancel the request." Sure there are places where the error matters and you can divert course, but that is far from the majority of cases.
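A rough sketch of how such a Catch helper and a generalized recover handler might fit together (hypothetical code, not the commenter's actual utility package):

    package main

    import (
        "errors"
        "fmt"
        "log"
        "net/http"
    )

    // wrappedErr is what Catch panics with, so recover can tell it apart
    // from genuine programmer-error panics.
    type wrappedErr struct{ err error }

    // Catch panics if err is non-nil, attaching a short context message.
    func Catch(err error, msg string) {
        if err != nil {
            panic(wrappedErr{fmt.Errorf("%s: %w", msg, err)})
        }
    }

    // Recoverer is the generalized handler that turns those panics into 500s.
    func Recoverer(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            defer func() {
                if v := recover(); v != nil {
                    if we, ok := v.(wrappedErr); ok {
                        log.Println("request failed:", we.err)
                        http.Error(w, "internal error", http.StatusInternalServerError)
                        return
                    }
                    panic(v) // not ours: re-panic
                }
            }()
            next.ServeHTTP(w, r)
        })
    }

    // Hypothetical operation that fails.
    func doathing() (int, error) { return 0, errors.New("boom") }

    func main() {
        h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            _, err := doathing()
            Catch(err, "didn't do the thing")
            fmt.Fprintln(w, "ok")
        })
        log.Fatal(http.ListenAndServe(":8080", Recoverer(h)))
    }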
I don't agree, but having said that, this feels like an entirely predictable/justifiable perspective to hold, given the terrible design of net/http in the standard library. Of course it feels easier to just panic, it's not like you can return an error from a handler. There is so much compatibility baggage from Go 1.0 in that package, that doing the right thing (contexts, errors, etc.) is so much harder than it should be, and most people end up doing the wrong thing because it's more ergonomic.
I usually use Echo which does have an error to return from handlers, but I don't think it's necessarily the wrong thing unless you're writing a library. I used to avoid panics with the same mindset that they aren't supposed to be used like exceptions, but I've found that panics are a clean way to handle a bulk of error cases that are "log and retreat", centralizing the process with some syntactic sugar to not have to check err != nil everywhere. More of my thoughts here if any are curious: https://blog.mukunda.com/cat/2022/dont-be-afraid-to-panic.tx...
I think one thing that could help if the codebase wants to avoid regular panics is more syntactic sugar to help error bubbling, like Rust has.
type xError struct { msg message, stack: callers(), }
is this legit in go?
This is a cry for sum types.
The fact that this code also has gorm in it in one of the examples is neither supportive of the proposal’s fit for the language, nor really surprising.
Too much writing and a lack of diagramming is a sign of digging down the rabbit hole.
Bro got dragged so hard in the comments he took his site down. Oof.
I mean, their intentions are good, but if I worked at a place that made me use that error package I'd not have a good time.
In general with Go, if something is not idiomatic then don't try too hard to fit constructs from other languages into it. Even the use of lodash-like packages feels awkward in Go.
More like a hug of death from HN users, since the site is back up and working again.