Yes, you can have exactly-once delivery

(blog.rongarret.info)

54 points by lisper 2 months ago | 139 comments

In case you're someone who actually knows anything about distributed systems and you're not looking forward to slogging through this long article filled with claims like "I have a PhD in AI so I know what I'm talking about" to find where the author made their mistake, let me save you the time. It's the typical conflation of exactly-once delivery with exactly-once processing, which the author acknowledges and then chooses to ignore because they're basically the same for practical purposes, as if this somehow changes the reality of the delivery itself and the restrictions on its guarantees.

Yes, everyone knows you can layer an idempotency mechanism on top of at-least-once delivery to achieve exactly-once processing (so long as you're willing to tie up memory/storage for an infinite amount of time). But this does not equate to exactly-once delivery, and you know that.
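To make the memory caveat concrete, here is a minimal Python sketch of deduplication with a bounded memory of message IDs. The names `BoundedDeduplicator` and `process_all` are made up for illustration and do not come from the article or any library. It only shows that once an ID has been forgotten, a late redelivery gets processed a second time.

```python
from collections import OrderedDict


class BoundedDeduplicator:
    """Remembers only the most recent `capacity` message IDs (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._seen = OrderedDict()  # message ID -> None, oldest first

    def is_new(self, message_id):
        """Return True if the ID is not currently remembered."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)  # refresh its position
            return False
        self._seen[message_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


def process_all(deliveries, capacity):
    """Process each delivery unless its ID is still remembered; return processed IDs."""
    dedup = BoundedDeduplicator(capacity)
    processed = []
    for message_id in deliveries:
        if dedup.is_new(message_id):
            processed.append(message_id)
    return processed


# At-least-once delivery: "a" is redelivered immediately and again much later.
deliveries = ["a", "a", "b", "c", "a"]

large_window = process_all(deliveries, capacity=10)  # remembers everything here
small_window = process_all(deliveries, capacity=2)   # has forgotten "a" by the end

print(large_window)
print(small_window)
```

With the large window every duplicate is dropped; with the small one the last "a" is processed again, which is the trade-off between storage and exactly-once processing described above.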

Groxx 2 months ago | root | parent | next |

Yea... this is kinda inflammatory but I honestly have to agree with it.

The post largely summarizes as "you can have exactly-once delivery if you re-define it to be at-least-once processing with idempotency".

Those are different things.

In fact, that's the entire point behind saying that it's impossible.

You can't design a system that is exactly-once at any level, so don't even bother trying. If someone wants you to guarantee something will happen, you can point to that impossibility to say "you need to retry, it's not optional, and anyone who tells you otherwise is lying to you". That has happened to me multiple times in my career; it's a thing charlatans keep trying to sell to businesses, and businesses eat it up because it sounds so much magically simpler than what their engineers keep telling them needs to be done.

Because it is magic. It doesn't exist.

EGreg 2 months ago | root | parent | next |

This is sorta like the argument about CAP impossibility theorem while in practice consensus algorithms work 99.999% of the time. Or like Shannon’s information theory showing impossibility of compression, while many compression algorithms work well on actual data.

This seems to me the same. In practical applications, you can indeed have the at-least-once delivery with an idempotency / backpressure system, and work 99% of the time, and be unavailable 1% of the time.

Groxx 2 months ago | root | parent |

Yep, and for practical applications (i.e. stuff that exists in this universe) that is absolutely good enough. You just have to choose which tradeoffs you can stomach best. With a fancy enough system, those tradeoffs can be driven shockingly low.

But if someone tries to sell you a database that is 100% available and has perfect consistency, you can laugh and walk away. They're a flat-earther trying to sell you a bridge: they're either trying to trick you or they have no idea what they're talking about. Either way you don't want to be involved with them.

Gosh, only after this comment I understood why so many programmers litter the code with retries, even though they seem superfluous.

daymanstep 2 months ago | root | parent |

I mean, sometimes it is superfluous and you can get rid of most of the code by wrapping everything in a big try-retry handler or something like that.

philipswood 2 months ago | root | parent | prev |

So if you build systems using messaging middleware you get to choose upfront whether you are going to use XA with exactly-once semantics and pay the performance penalty for that or whether you will implement your business logic idempotently on processing instead.

You can do this across a whole landscape of vendors.

So this whole class of "that's impossible" responses sounds to me (as an ex-brick layer) like "obviously you can't stack bricks next to each other perfectly straightly - so building walls is clearly impossible".

So it feels a bit jarring when several vendors allow you to do this impossible thing.

Google for JMS and exactly-once and you can find documentation with several products on exactly how to do this.

One example: https://www.atomikos.com/Blog/TheSimpleSecretOfExactlyOnceDe...

Groxx 2 months ago | root | parent |

>In this post we will show how [exactly once delivery] can be done, plus how simple it is with Atomikos. ...

>The producer should do its processing and send its message as part of a JTA/XA transaction. This ensures a message is sent if and only if the transaction commits. Any failures will result in rollback of the JTA/XA transaction - and no message will be sent. This means that failures can be safely retried until they succeed, without sending the resulting message more than once.

This is at-least-once (retries) with probably-idempotency: a transaction.

As I said: charlatans.

They're selling you "exactly once" in the headline, but clearly state that it is not exactly once in the legalese below. This is less "buyer beware" and more "blatantly false advertising", in exactly the same way as a free energy machine. The abstraction they're offering may indeed be useful, but "exactly once" is literally lying, and I wouldn't trust them to be honest elsewhere eit

Honestly, I really do find the traditional nomenclature to be a little pointless. It seems like the classic saying assumes that it's somehow okay to assume infinite time for re-delivery, but not infinite memory for memoization for some reason. On the other hand, in real life there aren't unlimited numbers of messages and you rarely want to accept infinitely stale messages either, so it's a bit moot. I'd go as far as to say that in practice you really can't guarantee a message will be delivered and processed, because you will have finite bounds on time. The absolute best you can do is guarantee that it either was definitely processed once or probably was not, and handle it accordingly. (I formerly wrote "definitely" for the latter, thinking you could do this with two-phase commit, and then realized after walking away from the computer that you absolutely can't guarantee that, of course. Distributed systems are such a pain to reason about.)

Do I misunderstand?

jhanschoo 2 months ago | root | parent | next |

> On the other hand, in real life there aren't unlimited numbers of messages and you rarely want to accept infinitely stale messages either, so it's a bit moot.

My understanding is that these happen IRL all the time in the guise of healing a network split or rebooting crashed nodes or bringing new uninitialized servers into the system. Of course, IRL you usually translate the result to needing a different strategy to bring these systems up to speed beyond a certain threshold. But these thresholds and strategies and changing the number of nodes in the system are application-dependent, so the fiction of unbounded messages/memory/time helps focus the formal analysis and result.

In the context of, say, a distributed KV store, it cautions you that unless you have said other strategy, you will end up with an inconsistent system or failure state if your message buffers are more space-constrained than required.

Izkata 2 months ago | root | parent | prev |

> Honestly, I really do find the traditional nomenclature to be a little pointless. It seems like the classic saying assumes that it's somehow okay to assume infinite time for re-delivery, but not infinite memory for memoization for some reason.

This is exactly where the argument is coming from. The same people who will say "you can get at most once or at least once, but not only once" don't realize they're doing the exact same thing as the "you can get only once" people, when they criticize the conflation of delivery and processing. They'll argue "delivery" and "processing" have to be kept separate because of memory/storage/bandwidth/etc it uses up in the retries, which is why "only once delivery" can't exist and they actually mean "only once processing", but if you keep that reasoning in mind, there's also no such thing as "at least once delivery" - you'll run out of something at some point (or even just hit your retry limit) and have to drop the retries, resulting in no delivery.

The people saying you can get "only once delivery" by using "at least once"+idempotency are working under other group's definitions, then getting annoyed when the definitions are changed so this implementation of "only once" isn't allowed.


Spivak 2 months ago | root | parent | prev |

This article isn't for you then, this article is for people who have casually heard that exactly once delivery is impossible and take it to mean exactly once processing is impossible. When someone talks about at-least-once and at-most-once in the context of well-known queueing systems they will say delivery but will mean processing, because as you say, they're the same in practice.

You typically hide the processing bit so that from the perspective of your application code it really is exactly-once.

bhaney 2 months ago | root | parent | next |

> this article is for people who have casually heard that exactly once delivery is impossible and take it to mean exactly once processing is impossible

Those people would be better served by approximately two sentences clarifying that exactly-once processing is a different thing that can be achieved with at-least-once delivery and idempotency, rather than 20+ rambling paragraphs of redefining formal terms.

Ferret7446 2 months ago | root | parent | prev |

I think the words "delivery" and "processing" are taught around middle school. There's probably no need to have an article for it.

tacitusarc 2 months ago | prev | next |

I appreciate the insights here, but I am struggling to understand how “exactly once” can equate to “eliminate duplicates”. Let’s say someone arrived at my house and cut my grass, and I failed to confirm they had done so, so the company sent someone over to cut my grass again, maybe multiple times. It seems silly to claim my grass was cut exactly once, despite it consistently remaining at the same height. Obviously it was cut multiple times, just not with much effect after the first. The point of exactly-once is that the server and client don’t need to expend pointless effort on duplicates… right?

valzam 2 months ago | root | parent | next |

In particular, what exactly-once delivery implies is that I do not have to worry about duplicates in my processing logic. I can build a `count += 1` and it will always be exactly correct.

The notion that there is no distinction between exactly once delivery and exactly once processing is very odd to me. In practice my processing needs to accommodate duplicates to be correct. If I had exactly once delivery my processing could be much simpler. If I could get exactly once delivery for free I would always choose it in a heartbeat.
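A small Python sketch of why the `count += 1` case is the painful one. The event IDs and function names are invented for illustration. The naive counter is only right if every event arrives exactly once, while the second version is safe under redelivery because adding an ID to a set twice changes nothing.

```python
def naive_count(deliveries):
    """Correct only if every event is delivered exactly once."""
    count = 0
    for _event_id in deliveries:
        count += 1
    return count


def idempotent_count(deliveries):
    """Correct under at-least-once delivery: re-adding an ID is a no-op."""
    seen = set()
    for event_id in deliveries:
        seen.add(event_id)
    return len(seen)


# Three distinct events, two of them redelivered.
deliveries = ["evt-1", "evt-2", "evt-2", "evt-3", "evt-1"]

print(naive_count(deliveries))
print(idempotent_count(deliveries))
```

The second version pays for its correctness by requiring every event to carry a stable ID and by remembering those IDs, which is the extra complexity the comment is objecting to.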

jchw 2 months ago | root | parent |

The point is that it doesn't matter exactly where the deduplication matters. It could happen in your own processing code, or something upstream of it, like a queue library of some kind. That's pretty much what the entire article is saying; it's hard to meaningfully distinguish what part is actually delivery versus processing. e.g. most people would consider the guarantees imparted by the TCP stack are indeed part of delivery and not processing, but your TCP stack is having to do a lot of processing work to actually maintain the logical stream of bytes.

warkdarrior 2 months ago | root | parent |

> The point is that it doesn't matter exactly where the deduplication matters.

Actually the point is that once deduplication is done at some layer, the layers above it will have to re-achieve exactly-once delivery.

"Yes, the TCP layer did deliver this message only once, but the receiving software crashed right after, so now the sender has to send it again."

jchw 2 months ago | root | parent |

Hmmm. Maybe this is the reason why the processing vs delivery distinction matters. Because my thought is, well of course: To fix that you only send the acknowledgement after processing succeeds.

But then again, once you do that, the processing code that is being wrapped really doesn't have to care about being idempotent anymore, as it is being handled a layer up. At that point, all it needs to care about is being atomic.

I'm not sure if it practically matters either way. I'd rather have my processing code be both atomic and idempotent regardless just to make things easier to reason about, as long as it's not too much of a burden. I've always been a fan of concepts like idempotency tokens.
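One way to sketch "atomic plus an idempotency token" in Python, using the standard `sqlite3` module. The table names, the `handle` function, and the token format are all assumptions made up for this example. The token and the side effect are written in a single transaction, so a redelivered message hits the primary-key constraint and is skipped.

```python
import sqlite3


def open_store():
    """Create an in-memory store with a token table and a single balance row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed (token TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE balance (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.execute("INSERT INTO balance (id, amount) VALUES (1, 0)")
    conn.commit()
    return conn


def handle(conn, token, amount):
    """Apply a deposit once per token; return True if it was applied.

    The token insert and the balance update commit together or not at all,
    so a crash before the commit leaves neither behind.
    """
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute("INSERT INTO processed (token) VALUES (?)", (token,))
            conn.execute(
                "UPDATE balance SET amount = amount + ? WHERE id = 1", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # token already recorded: a redelivery, safe to ack again


def balance(conn):
    return conn.execute("SELECT amount FROM balance WHERE id = 1").fetchone()[0]


conn = open_store()
first = handle(conn, "tok-42", 100)
# Suppose the ack was lost, so the sender redelivers the same message.
second = handle(conn, "tok-42", 100)

print(first, second, balance(conn))
```

The acknowledgement would be sent after `handle` returns in either case, since a `False` result means the work was already done on an earlier delivery.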

Same understanding: On the receiver side, we are going to drop duplicates (by processing, or by having no effect on the grass cutting any more). Thus, the end user is then seeing only one effect, one message delivered. The effect of delivery "message received" or "grass is cut" is achieved.

But still, the sender might need to send more than once (until confirmation). From the cost at the sender "sending multiple packages" or "sending more grass cutters" this is still the scenario "send one or more".

Sorry to fuel the fire... it is about the definition of "delivery"

theamk 2 months ago | root | parent | prev |

We are talking network stack, so there are no actions - just data hand-off to the actual application code.

Someone arrives at your house, gives you a package, says "this is order 123". You thank them, they leave, but then they are hit by a car before they can report this. You unpack the package and use it.

Next day, someone else arrives at your house, gives you a package, says "this is order 123". You thank them, they leave. You know you've already received order 123, so you throw package away without even taking it into the house.

This happens a few more times, but you don't care, your trash can is big.

Done! You now have "exactly once delivery".

Now, some might argue this is "exactly once processing" and you should only count what the delivery person does.. but this depends on where you draw the boundary. I draw it at "I am taking the package into the house", and I've only ever taken one package there, so it was exactly-once for me.

The key part here is cost. I am assuming that opening package and using its contents is hard and takes a long time; while answering the door and throwing the package away is easy. This is definitely the case with modern networking stack, which re-transmits stuff all the time, and where the loss rate is very low.

tacitusarc 2 months ago | root | parent |

As this is a semantic debate over the definition of delivery, I asked my very non-technical wife if she thought in the scenario you described, the package was delivered exactly once. She said obviously not, and this discussion is very stupid, and I should stop participating in it. So there’s that.

jbergens 2 months ago | root | parent | next |

I don't think the example was perfect which explains your wife's reaction.

Think of it more like the first delivery guy/girl left his/her car outside and wrote 123 on it. Then walked back.

The next one sees the car with a sign saying 123 and won't even ring the door bell or leave a package. Now you haven't gotten the package twice, it has not been delivered twice.

Sure you can complain that there's a car outside your home but in a digital system you won't even see it. It would also cost the delivery firm a car for every package but that is not your problem and again, in the digital world the cost is a lot less than a car.

There is an argument that the street would be filled up with delivery vans and there would be no more room for new deliveries to you or your neighbors but that is a limitation you could talk about. You probably can't handle an infinite number of packages delivered at the same time either and you won't wait an infinite amount of time for any specific package.

Try this version with your wife.


Jach 2 months ago | root | parent | prev |

Smart wife. My take on the whole thing is that it's not wise to reason from non-technical metaphors around packages or lawn mowing when the reality is electronic systems. I don't know if it's any wiser but what I like to do is work my way up from the basics. What does delivery mean? Start with two wires, one for signal and one for common ground. (Or just one wire, and pretend you can use earth-return reliably.) If that isn't enough to resolve what terms should mean, consider them with differential signaling. If that still isn't enough to get it, consider them with relay nodes. If at some point "delivery" has changed definitions to suddenly forbid something that previously wasn't forbidden, maybe you've made a mistake.

computerfan494 2 months ago | prev | next |

At the end of the day the author and those they are arguing with mostly agree, they simply disagree on what the word "delivery" means. Given the author's background, I wonder if the issue is that they're focused mainly on lower levels of the stack, while those who disagree mainly work with traditional applications that do things like send email, respond to webhooks, update databases, etc.

The reason I think it's important to be pedantic about distinguishing between "delivery" and "processing" is that I have seen plenty of higher level systems that have incorrectly not implemented idempotency and had bugs as a result. I have seen many folks be confused by Kafka's "Exactly-Once Semantics" feature and introduce major bugs into message processing pipelines. The author, who clearly understands these fundamental design challenges, is not my problem. It's everyone else who struggles to design safe, idempotent exactly-once systems.

jumploops 2 months ago | prev | next |

If I receive multiple hamburgers via DoorDash, but only eat one, it’s not “exactly-once delivery.”

The extra step where I give my neighbor(s), compost, or otherwise discard the N+1 hamburgers is a processing step.

My house can only hold so many hamburgers, and I can only process so many after eating my lucky chosen one.

This is what we (distributed systems thinkers) refer to when we say “delivery” — anything after the DoorDash step is up to us, the consumer, to process.

Yes, this definition is confusing to new programmers, because it makes it hard to reason about everyday systems, but it’s this exact type of definition that we need so that we can build the proper abstractions, as the author has done in his post, to make our applications behave the way we want.

supportengineer 2 months ago | prev | next |

Over 20 years ago I worked with some specialized commercial software, Cyclone, that did guaranteed delivery of files. Guaranteed as in the legal sense. If the server sent back a ticket number to the sending client, it was a LEGAL assurance that the file was received, because there was a contract in place with financial penalties. The time stamps were particularly important in the legal contract. So, there are a lot of ways you can have exactly-once delivery, especially when talking about the application level of the 7 layer burrito model.

hinkley 2 months ago | root | parent |

But you sometimes delivered the bytes twice, before getting the receipt, surely?

theamk 2 months ago | root | parent |

I am sure there were some duplicate packets on the wire - it's a normal part of most network protocols. The important thing is that as far as the user was concerned, it was exactly-once.

toast0 2 months ago | prev | next |

I don't see how you can guarantee at least one delivery, but maybe I've seen too many things disappear off busses, never to be seen again.

Filligree 2 months ago | root | parent |

You keep sending it until you get an acknowledgement back. As the article points out, this does assume nobody cut the wire.

deathanatos 2 months ago | root | parent |

  SEND -->
  SEND -->
       <-- ACK
       <-- ACK (this node was *suuuper* slow due to page thrashing.)

Don't try to "fix" it, either; there's no way to do that.

You make the job idempotent, in some manner: make it such that receiving the message and processing it twice is safe.

(This is what TFA eventually capitulates to, but tries to call it "exactly once delivery", even in cases where delivery is occurring more than once. I don't think this view is pedagogically useful, which is partly why we say "exactly once delivery is impossible.")
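The SEND/ACK exchange above can be sketched as a toy Python simulation. `Receiver`, `send_until_acked`, and the `acks_lost` knob are all invented for illustration, and the "network" here is just a function call that pretends to drop the first few ACKs. The sender cannot tell a lost message from a lost ACK, so it resends either way, and the receiver applies the message idempotently.

```python
class Receiver:
    def __init__(self):
        self.deliveries = []  # every copy that arrived, duplicates included
        self.applied = {}     # message ID -> payload, applied at most once

    def receive(self, message_id, payload):
        self.deliveries.append(message_id)
        self.applied.setdefault(message_id, payload)  # idempotent apply


def send_until_acked(receiver, message_id, payload, acks_lost, max_attempts):
    """Resend until an ACK gets back or the attempts run out.

    `acks_lost` is how many ACKs the simulated network drops first.
    Returns the number of attempts made, or None if the sender gave up.
    """
    for attempt in range(1, max_attempts + 1):
        receiver.receive(message_id, payload)  # the message itself arrives
        if attempt > acks_lost:                # this ACK made it back
            return attempt
        # ACK lost: the sender sees only silence, so it retries
    return None


receiver = Receiver()
attempts = send_until_acked(receiver, "m1", "hello", acks_lost=2, max_attempts=5)
print(attempts, receiver.deliveries, receiver.applied)

# With a retry limit lower than the number of lost ACKs, the sender gives up
# without ever learning whether the message arrived.
gave_up = send_until_acked(Receiver(), "m2", "x", acks_lost=5, max_attempts=3)
print(gave_up)
```

In the first run the message is delivered three times and applied once. In the second the sender stops retrying with no confirmation either way, which is the retry-limit case raised elsewhere in the thread.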

benreesman 2 months ago | prev | next |

This stuff is studied to hell and back, there is a formalism.

This stuff is practiced at staggering scale and the heuristics and cheat-while-no-one-looks stuff is gamed to within an inch of its life.

There is an acknowledged nexus of the two in the public domain: https://jepsen.io/consistency.

lukeasrodgers 2 months ago | prev | next |

Here is my understanding, roughly:

- say you need a messaging system to communicate between different components
- that messaging system is a 3rd party library or tool, it has no knowledge of your needs or architecture
- therefore it can have no knowledge of what counts as a duplicate message, it either just blasts your message off once, or blasts them off until it gets an ack, it is up to the software you build around this component to avoid duplicate processing
- so yes of course you can build "exactly once processing" on top of an "at least once delivery" system
- but it still makes sense to talk about the distinction between delivery and processing, and "exactly once delivery is impossible" is still (in OP's terms) a "useful" claim

I haven't personally used Kafka but it and similar systems (I vaguely recall some work by Pat Helland that may fall into a similar bucket) could possibly be said to a) constitute messaging systems, b) provide exactly once delivery semantics, in that they are less of a library and more of a framework that provides a concept of "duplicate message" that you basically buy into by using those systems.

You could then argue that "if it provides exactly once delivery it is not a messaging system", maybe there's a good argument there or maybe it's just pedantry.

gbonik 2 months ago | prev | next |

So it seems it boils down to the difference between "delivery" and "processing".

I think we can make this distinction formally. For a given communication channel C, we can define "delivery via C" as a message showing up successfully at the receiver end of C. This definition seems unambiguous.

Now, we can phrase our "theorem" more carefully:

    For an arbitrary given communication channel C, exactly-once delivery via C is generally impossible.

The important part here is "For an arbitrary given communication channel C". By adding a layer of deduplication on top of C, we would be constructing a new logical communication channel C', via which exactly-once delivery is indeed possible. But that would be a different channel C', not the original channel C that we were given. In this context, we can refer to delivery via C' as "processing" relative to the original channel C.
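The C versus C' construction can be sketched with Python generators. `channel_c` and `channel_c_prime` are illustrative names only, and C is modelled as nothing more than a list of (ID, payload) pairs that may contain repeats. The point is just that the consumer of C' observes each ID once, while C underneath still delivered duplicates.

```python
def channel_c(messages):
    """Stand-in for the given channel C: may deliver a message more than once."""
    for message_id, payload in messages:
        yield message_id, payload


def channel_c_prime(channel):
    """C': channel C with a deduplication layer on the receiving end."""
    seen = set()
    for message_id, payload in channel:
        if message_id in seen:
            continue  # delivered again via C, never surfaces via C'
        seen.add(message_id)
        yield message_id, payload


# What actually crossed the wire: message 1 was retransmitted twice.
wire = [(1, "x"), (1, "x"), (2, "y"), (1, "x")]

via_c = list(channel_c(wire))
via_c_prime = list(channel_c_prime(channel_c(wire)))

print(via_c)
print(via_c_prime)
```

Note that `seen` grows without bound in this sketch, so C' as written assumes unlimited memory on the receiving side.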


Fire-Dragon-DoL 2 months ago | prev | next |

Isn't "exactly once processing" also incorrect, since it's all always "at most once"? The system could go down and never come back online, which would result in missed processing.

two_handfuls 2 months ago | root | parent |

All these guarantees are of the form "if (...) then: exactly once processing."

You'll find that, indeed, those assumptions must include ruling out permanent link failures.

stackghost 2 months ago | prev | next |

>a simple matter of keeping track of all the delivered messages and removing duplicates

Ron, if you have received duplicate messages then by definition you have been delivered that message more than once.

I don't have a PhD in computer science so maybe you can explain how this constitutes "exactly once".