bhaney 2 months ago | next |

In case you're someone who actually knows anything about distributed systems and you're not looking forward to slogging through this long article filled with claims like "I have a PhD in AI so I know what I'm talking about" to find where the author made their mistake, let me save you the time. It's the typical conflation of exactly-once delivery with exactly-once processing, which the author acknowledges and then chooses to ignore because they're basically the same for practical purposes, as if this somehow changes the reality of the delivery itself and the restrictions on its guarantees.

Yes, everyone knows you can layer an idempotency mechanism on top of at-least-once delivery to achieve exactly-once processing (so long as you're willing to tie up memory/storage for an infinite amount of time). But this does not equate to exactly-once delivery, and you know that.

Groxx 2 months ago | root | parent | next |

Yea... this is kinda inflammatory but I honestly have to agree with it.

The post largely summarizes as "you can have exactly-once delivery if you re-define it to be at-least-once processing with idempotency".

Those are different things.

In fact, that's the entire point behind saying that it's impossible.

You can't design a system that is exactly-once at any level, so don't even bother trying. If someone wants you to guarantee something will happen, you can point to that impossibility to say "you need to retry, it's not optional, and anyone who tells you otherwise is lying to you". That has happened to me multiple times in my career; it's a thing charlatans keep trying to sell to businesses, and businesses eat it up because it sounds so much magically simpler than what their engineers keep telling them needs to be done.

Because it is magic. It doesn't exist.

EGreg 2 months ago | root | parent | next |

This is sorta like the argument about CAP impossibility theorem while in practice consensus algorithms work 99.999% of the time. Or like Shannon’s information theory showing impossibility of compression, while many compression algorithms work well on actual data.

This seems to me the same. In practical applications, you can indeed have the at-least-once delivery with an idempotency / backpressure system, and work 99% of the time, and be unavailable 1% of the time.

Groxx 2 months ago | root | parent |

Yep, and for practical applications (i.e. stuff that exists in this universe) that is absolutely good enough. You just have to choose which tradeoffs you can stomach best. With a fancy enough system, those tradeoffs can be driven shockingly low.

But if someone tries to sell you a database that is 100% available and has perfect consistency, you can laugh and walk away. They're a flat-earther trying to sell you a bridge: they're either trying to trick you or they have no idea what they're talking about. Either way you don't want to be involved with them.

sam_lowry_ 2 months ago | root | parent | prev | next |

Gosh, only after this comment I understood why so many programmers litter the code with retries, even though they seem superfluous.

daymanstep 2 months ago | root | parent |

I mean, sometimes it is superfluous and you can get rid most of the code by wrapping everything in a big try-retry handler or something like that.

philipswood 2 months ago | root | parent | prev |

So if you build systems using messaging middleware you get to choose upfront whether you are going to use XA with exactly-once semantics and pay the performance penalty for that or whether you will implement your business logic idempotently on processing instead.

You can do this across a whole landscape of vendors.

So this whole class of "that's impossible" responses sounds to me (as an ex-brick layer) like "obviously you can't stack bricks next to each other perfectly straightly - so building walls is clearly impossible".

So it feels a bit jarring when several vendors allow you to do this impossible thing.

Google for JMS and exactly-once and you can find documentation with several products on exactly how to do this.

One example: https://www.atomikos.com/Blog/TheSimpleSecretOfExactlyOnceDe...

Groxx 2 months ago | root | parent |

>In this post we will show how [exactly once delivery] can be done, plus how simple it is with Atomikos. ...

>The producer should do its processing and send its message as part of a JTA/XA transaction. This ensures a message is sent if and only if the transaction commits. Any failures will result in rollback of the JTA/XA transaction - and no message will be sent. This means that failures can be safely retried until they succeed, without sending the resulting message more than once.

This is at-least-once (retries) with probably-idempotency: a transaction.

As I said: charlatans.

They're selling you "exactly once" in the headline, but clearly state that it is not exactly once in the legalese below. This is less "buyer beware" and more "blatantly false advertising", in exactly the same way as a free energy machine. The abstraction they're offering may indeed be useful, but "exactly once" is literally lying, and I wouldn't trust them to be honest elsewhere eit

jchw 2 months ago | root | parent | prev | next |

Honestly, I really do find the traditional nomenclature to be a little pointless. It seems like the classic saying assumes that it's somehow okay to assume infinite time for re-delivery, but not infinite memory for memoization for some reason. On the other hand, in real life there aren't unlimited numbers of messages and you rarely want to accept infinitely stale messages either, so it's a bit moot. I'd go as far as to say that in practice you really can't guarantee a message will be delivered and processed because you will have finite bounds on time, the absolute best you can do is at least guarantee that it either was definitely processed once or probably was not and handle it accordingly. (I formerly wrote "definitely" for the latter, thinking you could do this with two-phase commit, and then realized after walking away from the computer that you absolutely can't guarantee that, of course. Distributed systems are such a pain to reason about.)

Do I misunderstand?

jhanschoo 2 months ago | root | parent | next |

> On the other hand, in real life there aren't unlimited numbers of messages and you rarely want to accept infinitely stale messages either, so it's a bit moot.

My understanding is that these happen IRL all the time in the guise of healing a network split or rebooting crashed nodes or bring new uninitialized servers into the system. Of course, IRL you usually translate the result to needing a different strategy to bring these systems up to speed beyond a certain threshold. But these thresholds and strategies and changing the number of nodes in the system are application-dependent, so the fiction of unbounded messages/memory/time helps focus the formal analysis and result.

In the context of, say, a distributed KV store, it cautions you that unless you have said other strategy, you will end up with an inconsistent system or failure state if your message buffers are more space-constrained than required.

Izkata 2 months ago | root | parent | prev |

> Honestly, I really do find the traditional nomenclature to be a little pointless. It seems like the classic saying assumes that it's somehow okay to assume infinite time for re-delivery, but not infinite memory for memoization for some reason.

This is exactly where the argument is coming from. The same people who will say "you can get at most once or at least once, but not only once" don't realize they're doing the exact same thing as the "you can get only once" people, when they criticize the conflation of delivery and processing. They'll argue "delivery" and "processing" have to be kept separate because of memory/storage/bandwidth/etc it uses up in the retries, which is why "only once delivery" can't exist and they actually mean "only once processing", but if you keep that reasoning in mind, there's also no such thing as "at least once delivery" - you'll run out of something at some point (or even just hit your retry limit) and have to drop the retries, resulting in no delivery.

The people saying you can get "only once delivery" by using "at least once"+idempotency are working under other group's definitions, then getting annoyed when the definitions are changed so this implementation of "only once" isn't allowed.

Spivak 2 months ago | root | parent | prev |

This article isn't for you then, this article is for people who have casually heard that exactly once delivery is impossible and take it to mean exactly once processing is impossible. When someone talks about at-least-once and at-most-once in the context of well-known queueing systems they say will say delivery but will mean processing, because as you say, they're the same in practice.

You typically hide the processing bit so that from the perspective of your application code it really is exactly-once.

bhaney 2 months ago | root | parent | next |

> this article is for people who have casually heard that exactly once delivery is impossible and take it to mean exactly once processing is impossible

Those people would be better served by approximately two sentences clarifying that exactly-once processing is a different thing that can be achieved with at-least-once delivery and idempotency, rather than 20+ rambling paragraphs of redefining formal terms.

Ferret7446 2 months ago | root | parent | prev |

I think the words "delivery" and "processing" are taught around middle school. There's probably no need to have an article for it.

tacitusarc 2 months ago | prev | next |

I appreciate the insights here, but I am struggling to understand how “exactly one” can equate to “eliminate duplicates”. Let’s say someone arrived at my house and cut my grass, and I failed to confirm they had done so, so the company sent someone over to cut my grass again, maybe multiple times. It seems silly to claim my grass was cut exactly once, despite it consistently remaining at the same height. Obviously it was cut multiple times, just not with much effect after the first. The point of exactly-once is that the server and client don’t need to expend pointless effort on duplicates… right?

valzam 2 months ago | root | parent | next |

In particular what exactly once delivery implies that I do not have to worry about it in my processing logic. I can build a `count += 1` and it will always be exactly correct.

The notion that there is no distinction between exactly once delivery and exactly once processing is very odd to me. In practice my processing needs to accommodate duplicates to be correct. If I had exactly once delivery my processing could be much simpler. If I could get exactly once delivery for free I would always choose it in a heartbeat.

jchw 2 months ago | root | parent |

The point is that it doesn't matter exactly where the deduplication matters. It could happen in your own processing code, or something upstream of it, like a queue library of some kind. That's pretty much what the entire article is saying; it's hard to meaningfully distinguish what part is actually delivery versus processing. e.g. most people would consider the guarantees imparted by the TCP stack are indeed part of delivery and not processing, but your TCP stack is having to do a lot of processing work to actually maintain the logical stream of bytes.

warkdarrior 2 months ago | root | parent |

> The point is that it doesn't matter exactly where the deduplication matters.

Actually the point is that once deduplication is done at some layer, the layers above it will have to re-achieve exactly-once delivery.

"Yes, the TCP layer did deliver this message only once, but the receiving software crashed right after, so now the sender has to send it again."

jchw 2 months ago | root | parent |

Hmmm. Maybe this is the reason why the processing vs delivery distinction matters. Because my thought is, well of course: To fix that you only send the acknowledgement after processing succeeds.

But then again, once you do that, the processing code that is being wrapped really doesn't have to care about being idempotent anymore, as it is being handled a layer up. At that point, all it needs to care about is being atomic.

I'm not sure if it practically matters either way. I'd rather have my processing code be both atomic and idempotent regardless just to make things easier to reason about, as long as it's not too much of a burden. I've always been a fan of concepts like idempotency tokens.

schobi 2 months ago | root | parent | prev | next |

Same understanding: On the receiver side, we are going to drop duplicates (by processing, or by having no effect on the grass cutting any more). Thus, the end user is then seeing only one effect, one message delivered. The effect of delivery "message received" or "grass is cut" is achieved.

But still, the sender might need to send more than once (until confirmation). From the cost at the sender "sending multiple packages" or "sending more grass cutters" this is still the scenario "send one or more".

Sorry to fuel the fire... it is about the definition of "delivery"

theamk 2 months ago | root | parent | prev |

We are talking network stack, so there is no actions - just data hand-off to the actual application code.

Someone arrives at your house, gives you a package, says "this is order 123". You thank them, they leave, but then they are hit by a car before they can report this. You unpack the package and use it.

Next day, someone else arrives at your house, gives you a package, says "this is order 123". You thank them, they leave. You know you've already received order 123, so you throw package away without even taking it into the house.

This happens few more times, but you don't care, your trash can is big.

Done! You now have "exactly once delivery".

Now, some might argue this is "exactly once processing" and you should only count what the delivery person does.. but this depends on where you draw the boundary. I draw it at "I am taking the package into the house", and I've only ever took one package there, so it was exactly-once for me.

The key part here is cost. I am assuming that opening package and using its contents is hard and takes a long time; while answering the door and throwing the package away is easy. This is definitely the case with modern networking stack, which re-transmits stuff all the time, and where the loss rate is very low.

tacitusarc 2 months ago | root | parent |

As this is a semantic debate over the definition of delivery, I asked my very non-technical wife if she thought in the scenario you described, the package was delivered exactly once. She said obviously not, and this discussion is very stupid, and I should stop participating in it. So there’s that.

jbergens 2 months ago | root | parent | next |

I don't think the example was perfect which explains your wife's reaction.

Think of it more like the first delivery guy/girl left his/her car outside and wrote 123 on it. Then walked back.

The next one sees the car with a sign saying 123 and won't even ring the door bell or leave a package. Now you haven't gotten the package twice, it has not been delivered twice.

Sure you can complain that there's a car outside your home but in digital system you won't even see it. It would also cost the deliver firm a car for every package but that is not your problem and again, in the digital world the cost is a lot less than a car.

There is an argument that the street would be filled up with delivery vans ans there would be no more room for new deliveries to you or your neighbors but that is a limitation you could talk about. You probably can't handle an infinite number of packages delivered at the same time either and you won't wait an infinite amount of time for any specific package.

Try this version with your wife.

Jach 2 months ago | root | parent | prev |

Smart wife. My take on the whole thing is that it's not wise to reason from non-technical metaphors around packages or lawn mowing when the reality is electronic systems. I don't know if it's any wiser but what I like to do is work my way up from the basics. What does delivery mean? Start with two wires, one for signal and one for common ground. (Or just one wire, and pretend you can use earth-return reliably.) If that isn't enough to resolve what terms should mean, consider them with differential signaling. If that still isn't enough to get it, consider them with relay nodes. If at some point "delivery" has changed definitions to suddenly forbid something that previously wasn't forbidden, maybe you've made a mistake.

computerfan494 2 months ago | prev | next |

At the end of the day the author and those they are arguing with mostly agree, they simply disagree on what the word "delivery" means. Given the author's background, I wonder if the issue is that they're focused mainly on lower levels of the stack, while those who disagree mainly work with traditional applications that do things like send email, respond to webhooks, update databases, etc.

The reason I think it's important to be pedantic about distinguishing between "delivery" and "processing" is that I have seen plenty of higher level systems that have incorrectly not implemented idempotency and had bugs as a result. I have seen many folks be confused by Kafka's "Exactly-Once Semantics" feature and introduce major bugs into message processing pipelines. The author, who clearly understands these fundamental design challenges, is not my problem. It's everyone else who struggles to design safe, idempotent exactly-once systems.

jumploops 2 months ago | prev | next |

If I receive multiple hamburgers via DoorDash, but only eat one, it’s not “exactly-once delivery.”

The extra step where I give my neighbor(s), compost, or otherwise discard the N+1 hamburgers is a processing step.

My house can only hold so many hamburgers, and I can only process so many after eating my lucky chosen one.

This is what we (distributed systems thinkers) refer to when we say “delivery” — anything after the DoorDash step is up to us, the consumer, to process.

Yes, this definition is confusing to new programmers, because it makes it hard to reason about everyday systems, but it’s this exact type of definition that we need so that we can build the proper abstractions, as the author has done in his post, to make our applications behave the way we want.

supportengineer 2 months ago | prev | next |

Over 20 years ago I worked with some specialized commercial software, Cyclone, that did guaranteed delivery of files. Guaranteed as in the legal sense. If the server sent back a ticket number to the sending client, it was a LEGAL assurance that the file was received, because there was a contract in place with financial penalties. The time stamps were particularly important in the legal contract. So, there are a lot of ways you can have exactly-once delivery, especially when talking about the application level of the 7 layer burrito model.

hinkley 2 months ago | root | parent |

But you sometimes delivered the bytes twice, before getting the receipt, surely?

theamk 2 months ago | root | parent |

I am sure there were some duplicate packets on the wire - it's a normal part of most network protocols. The important thing is that as far as user was concerned, it was exactly-once.

toast0 2 months ago | prev | next |

I don't see how you can guarantee at least one delivery, but maybe I've seen too many things disappear off busses, never to be seen again.

Filligree 2 months ago | root | parent |

You keep sending it until you get an acknowledgement back. As the article points out, this does assume nobody cut the wire.

deathanatos 2 months ago | root | parent |

  SEND -->
  SEND -->
       <-- ACK
       <-- ACK (this node was *suuuper* slow due to page thrashing.)
Don't try to "fix" it, either; there's no way to do that.

You make the job idempotent, in some manner: make it such that receiving the message and processing it twice is safe.

(This is what TFA eventually capitulates to, but tries to call it "exactly once delivery", even in cases where deliver is occurring more than once. I don't think this view is pedagogically useful, which is partly why we say "exactly once delivery is impossible.")

benreesman 2 months ago | prev | next |

This stuff is studied to hell and back, there is a formalism.

This stuff is practiced at staggering scale and the heuristics and cheat-while-no-one-looks stuff is gamed to within an inch of its life.

There is an acknowledged nexus of the two in the public domain: https://jepsen.io/consistency.

lukeasrodgers 2 months ago | prev | next |

Here is my understanding, roughly:

- say you need a messaging system to communicate between different components - that messaging system is a 3rd party library or tool, it has no knowledge of your needs or architecture - therefore it can have no knowledge of what counts as a duplicate message, it either just blasts your message off once, or blasts them off until it gets an ack, it is up to the software you build around this component to avoid duplicate processing - so yes of course you can build "exactly once processing" on top of an "at least once delivery" system - but it still makes sense to talk about the distinction between delivery and processing, and "exactly once delivery is impossible" is still (in OP's terms) a "useful" claim

I haven't personally used kafka but it and similar systems (I vaguely recall some work by Pat Helland that may fall into a similar bucket) could possibly be said to a) constitute messaging systems, b) provide exactly once delivery semantics, in that they are less of a library and more of a framework that provide a concept of "duplicate message" that you basically buy into by using those systems.

You could then argue that "if it provides exactly once delivery it is not a messaging system", maybe there's a good argument there or maybe it's just pedantry.

gbonik 2 months ago | prev | next |

So it seems it boils down to the difference between "delivery" and "processing".

I think we can make this distinction formally. For a given communication channel C, we can define "delivery via C" as a message showing up successfully at the receiver end of C. This definition seems unambiguous.

Now, we can phrase our "theorem" more carefully:

    For an arbitrary given communication channel C, exactly-once delivery via C is generally impossible.
The important part here is "For an arbitrary given communication channel C". By adding a layer of deduplication on top of C, we would be constructing a new logical communication channel C', via which exactly-once delivery is indeed possible. But that would be a different channel C', not the original channel C that we were given. In this context, we can refer to delivery via C' as "processing" relative to the original channel C.

Fire-Dragon-DoL 2 months ago | prev | next |

Isn't "exactly once processing" also incorrect, since it's all always "at most once": the system could go down and never come back online, that would result in a missed processing

two_handfuls 2 months ago | root | parent |

All these guarantees are of the form "if (...) then: exactly once processing."

You'll find that, indeed, those assumptions must include ruling out permanent link failures.

stackghost 2 months ago | prev | next |

>a simple matter of keeping track of all the delivered messages and removing duplicates

Ron, if you have received duplicate messages then by definition you have been delivered that message more than once.

I don't have a PhD in computer science so maybe you can explain how this constitutes "exactly once".

lisper 2 months ago | root | parent |

> if you have received duplicate messages then by definition you have been delivered that message more than once

Yes. So? If you don't include a mechanism for ensuring exactly-once delivery then you will not be guaranteed exactly-once delivery. But if you do, you will. It doesn't actually require a Ph.D.

stackghost 2 months ago | root | parent |

>> if you have received duplicate messages then by definition you have been delivered that message more than once

>Yes.

Okay, I'm glad we agree that delivering something multiple times and discarding the duplicates is not exactly-once delivery.

lisper 2 months ago | root | parent |

As you have described it, you are right, it's not. But presumably the thing discarding the duplicates then does something with the non-duplicates, like deliver then to something else. That is exactly-once delivery.

mlyle 2 months ago | root | parent | next |

To discard the duplicates, you need to update your state based on the first message and consult that state for all subsequent messages. Someone must be delivered those messages for this to happen.

If you say that someone can then pass you messages exactly once over a perfectly reliable channel, well.. sure. But the system still has to deal with duplicate delivery and update its state based on that. (And potentially must durably persist this state. And, then, if your application doesn't have exactly the same persistence/transactional boundaries as what's doing the deduping, you have problems...).

lisper 2 months ago | root | parent |

> To discard the duplicates, you need to update your state based on the first message and consult that state for all subsequent messages. Someone must be delivered those messages for this to happen.

Yes, that's right.

> If you say that someone can then pass you messages exactly once over a perfectly reliable channel, well.. sure.

So you concede that you were wrong?

> But the system still has to deal with duplicate delivery and update its state based on that.

Well, part of the system does. But that part can be hidden behind an abstraction layer so that "you" can have exactly-once delivery doe some value of "you".

> (And potentially must durably persist this state. And, then, if your application doesn't have exactly the same persistence/transactional boundaries as what's doing the deduping, you have problems...).

Sure, but as I pointed out to you in the other thread in which we are engaged, that is irrelevant to the question of whether or not exactly-once delivery is possible.

mlyle 2 months ago | root | parent |

No, considering something at the end of a fault-free channel as the element of the distributed system being delivered to is silly.

> we are engaged

We are no longer engaged, because I've given up on talking to you. You can use "delivered" however you like by yourself, as you're eliminating everyone else's desire to listen to you.

I think it's ridiculous that you're back to continue to poke this a day later.

Find a hobby. Or read a textbook. I suggest Steen and Tanenbaum, which explains exactly-once semantics in distributed systems in a careful and thorough way (though this particular book doesn't say "delivery", unlike many others.

lisper 2 months ago | root | parent |

> considering something at the end of a fault-free channel as the element of the distributed system being delivered to is silly.

That may be, but that's a different discussion. "Silly" and "impossible" are not synonyms.

Once you concede that you are wrong about EOD being impossible we can go on to discuss whether or not it is silly. And note that all I have to do to show that it isn't silly is to present one realistic scenario where it might be useful.

> I've given up on talking to you.

Manifestly not.

mlyle 2 months ago | root | parent |

I am not interested in talking to you anymore. When you reply to my comments it gives me a notification, and I am unable to block you here. Please go away.

If I have to write tooling to make it stop, I will.

lisper 2 months ago | root | parent |

> I am not interested in talking to you anymore.

Then why do you keep doing it?

mlyle 2 months ago | root | parent |

I've installed the block user extension and written an email filtering rule for you. You may now have the last word, but I will never see it.

You prompted this by continuing the discussion and notifying me in other threads past when this discussion was flagged and I had stopped responding to you... and then even beyond when I had asked you to stop. In seven years of prior discussion on HN, this has never been necessary.

stackghost 2 months ago | root | parent | prev |

You wrote in the post:

>This post was intended to be about human communication more than distributed systems or network protocols

But resorting to technical minutiae as you have done doesn't seem germane to "human communication". Honestly, seeing such mental gymnastics just to avoid losing face on an internet argument is really disappointing.

lisper 2 months ago | root | parent |

What you call "technical minutiae" I call "the meanings of words", which is very germane to the matter of human communication.

upeguiborja 2 months ago | prev | next |

It is only possible if the channel between your "abstraction layer" (call it another system, processing, etc) and your receiver is 100% reliable.

lisper 2 months ago | root | parent |

Yes, that's true, but that's often a reasonable assumption.

upeguiborja 2 months ago | root | parent |

The main confusion I see is that at-least-once delivery on a 99% reliable channel can be erroneusly called exactly once delivery 99 out of 100 times. It could be reasonable depending on the scale of your communication exercise, but if you have hundreds of thousands of receivers then asumption will surely start to fail.

lisper 2 months ago | root | parent |

The kind of reliability I'm talking about where "100% reliable" is a reasonable assumption is inside the boundaries of a single server, i.e. a CPU, memory bus, and I/O peripherals including a NIC and non-volatile storage. Yes, things can and do fail inside that boundary, but if you start taking that into account things get much more complicated. You are also no longer in the realm of distributed systems at that point, which is the context in which the controversy about exactly-once delivery arises.

jongjong 2 months ago | prev | next |

Yes of course you can have exactly once delivery if you clearly define who/what the receiver is. You can easily deduplicate messages on the receiving end based on UUIDs created by the sender for example... Since they are duplicates of the exact same message, it doesn't matter which one is 'the original'.

It makes sense to worry about this only if you're worried about wasting bandwidth in the event of network instability (since the same message may sometimes traverse the network multiple times) but that's not generally something engineers should worry about.

It's ironic how some people use this to try to talk down to 'junior' developers.

Anyone can memorise hearsay about distributed systems but few can speak from experience.

qaq 2 months ago | root | parent | next |

you can't in general case because you don't have infinite memory for dedupe.

jongjong 2 months ago | root | parent | next |

Good point but you don't need infinite memory. You can set expiries to discard, for example. Expiries and timeouts can be defined as part of the protocol. You can impose reasonable constraints per-socket on the buffer size for spam prevention. There is a lot of wiggle room.

On the sender side, you can require an ACK response for each message UUID and rebroadcast only if the ACK is not received within a certain timeframe.

You don't necessarily need a sophisticated algorithm to get a good practical solution which solves real problems.

mtndew4brkfst 2 months ago | root | parent | next |

If lack of a successful ACK means you retry indefinitely until you do get an ACK, then whichever side handles deduplication need to store all observed messages indefinitely or you may deliver the same message more than once without correctly deduplicating it.

If instead you only store them for N days but an ongoing retry loop finally succeeds in N+1 days, you get duplicated delivery.

If a retry loop gives up after finite elapsed time, which sets a ceiling on how long you must retain observations, then you do not in fact achieve at-least-once delivery without using a lesser qualifying statement.

One of those constraints must be compromised on. That doesn't mean a practically useful implementation can't exist, it just means a theoretically ideal one can't without the infinite memory.

Many systems and real world processes would actually suffer negative consequences if a weeks-old payload was delivered-as-new, so hardly anyone balks at this part of the implications. But again, you have to redefine at least one of the claims (or acquire infinite memory) to have a coherent system that fulfills those claims.

lisper 2 months ago | root | parent | prev |

You apparently missed this:

"Just for the sake of completeness I should point out that removing duplicates at the receiver is a pretty extreme oversimplification of what you would do in practice to provide exactly-once delivery. A complete solution would almost certainly be an example of Greenspun's Tenth Law applied to the TCP protocol rather than Common Lisp."

nhumrich 2 months ago | root | parent | prev |

You seem to be missing the point though. Wanting "exactly once delivery" in practice is like saying, "I shouldn't need to worry about dedupes at the application level". Which is everyone's dream, but the seniors are saying, "yes, you have to handle it at the application level, there is no way around that".

qaq 2 months ago | prev | next |

Well pretty confused. You can't get exactly once delivery using proposed solution in general case because you can not have infinite memory for dedupe.

lisper 2 months ago | root | parent | next |

You apparently missed this part:

"Just for the sake of completeness I should point out that removing duplicates at the receiver is a pretty extreme oversimplification of what you would do in practice to provide exactly-once delivery. A complete solution would almost certainly be an example of Greenspun's Tenth Law applied to the TCP protocol rather than Common Lisp."

two_handfuls 2 months ago | root | parent |

Can you explain this point? I think you are trying to say the application would include an implementation of TCP? I don't see how this is related to memory.

lisper 2 months ago | root | parent |

I was responding to this:

> You can't get exactly once delivery using proposed solution in general case because you can not have infinite memory for dedupe.

That is a specific criticism of a specific and highly oversimplified algorithm for de-duplication that I described purely for illustrative purposes, not as a suggestion for how de-duplication should actually be implemented. Actual implementation of de-duplication is much more complicated, but a solved problem that I didn't think I needed to rehash. A real implementation doesn't require infinite memory.

(Actually, even the naive algorithm doesn't require infinite memory, just unbounded memory. Those are not the same.)

qaq 2 months ago | root | parent |

You certainly do need to rehash it or provide a link to it being solved without constraints on time window, msgs rates and msgs sizes. You also most certainly want to build it and provide it as service because no existing product claims this capability in unbounded case.

lisper 2 months ago | root | parent |

> solved without constraints on time window, msgs rates and msgs sizes

But that is not what I am claiming.

two_handfuls 2 months ago | root | parent |

The question was for the general case, so yes that is exactly what you are claiming in this thread.

Here is the question again: "Well pretty confused. You can't get exactly once delivery using proposed solution in general case because you can not have infinite memory for dedupe."

lisper 2 months ago | root | parent |

Only a naive implementation requires infinite memory. (And even then it's unbounded, not infinite.)

qaq 2 months ago | root | parent |

You are going in circles so you claim you do have a solution for general case. It would be interesting to learn about details of such solution beyond it has different constraints compared to naive solution.

two_handfuls 2 months ago | root | parent | prev |

Don't bother, it's clear we have a "doesn't fit in the margin" situation here.

lisper 2 months ago | root | parent |

It's true that a complete solution doesn't fit easily into an HN comment. But one possibility is to send only one message at a time, and keep re-sending it until you receive an acknowledgement. Then send the next message. On the receiving side, deliver the first instance of the multiple copies you receive and discard the rest. (Send acknowledgements for all received instances of course.) This guarantees not only exactly-once delivery with only constant storage required at the receiver, but also in-order delivery. It's not particularly efficient, and it's not what you would want to do in real life, but it would work.

two_handfuls 2 months ago | root | parent |

In the general case there is no bound on the number of clients.

Consider: When your client crashes, does it assume a new identity on restart? Because you didn't say that the sender saved its latest message in stable storage.

lisper 2 months ago | root | parent |

> In the general case there is no bound on the number of clients.

Sure. So?

> When your client crashes, does it assume a new identity on restart?

Um, no? Is that really a serious question? Do you think computers in the real world lose their identity when they restart?

> Because you didn't say that the sender saved its latest message in stable storage.

I left out a lot of details that I assumed would be obvious and taken for granted. I didn't say, for example, that the intended recipient would be attached to the message either, but obviously that must be the case if there is more than one possible recipient. There are a zillion little details like that which I elided. A complete treatment would probably turn into a book because I'd have to start talking about things like atomicity, mutual exclusion, databases and transactions. But those are all red herrings.

lisper 2 months ago | prev | next |

Author here. Most commenters still seem to be missing the point, despite the fact that I explicitly say this in the opening sentence:

"This post is ostensibly about an obscure technical issue in distributed systems, but it's really about human communications."

and reiterate it at the end:

"This post was intended to be about human communication more than distributed systems or network protocols."

I really don't know how I could have made this any clearer.

mlyle 2 months ago | root | parent | next |

The problem is, you're using strong language like "under any reasonable definition of 'delivery'." But everyone else is defining delivery differently than you, referring to the delivery of the message to the system itself. Your language implies everyone else is unreasonable.

When your argument depends upon everyone else being unreasonable, maybe you're the one being unreasonable.

Yes, we can make the processing that occurs in response to those delivered message(s) idempotent. But in the end, the system has to either deal with:

1. messages being delivered once or lost entirely, or

2. messages being delivered once or multiple times

You are over-explaining a way to deal with situation #2 (detect duplicates at the endpoint).

lisper 2 months ago | root | parent |

> referring to the delivery of the message to the system itself

And how do you define "the system itself"?

mlyle 2 months ago | root | parent |

The thing that is at the end of the lossy medium. It must tolerate (0 or 1) or (1 or more) things being delivered to it.

lisper 2 months ago | root | parent |

Yes, that is true. But why can't I choose to view "the system itself" as the thing that is on the other side of a de-duplicator?

It feels to me like an argument over whether or not humans can fly. An unassisted human cannot fly, but with some technological augmentation, they can. It seems a bit pedantic to deny that someone can fly from LA to New York simply because they have to get into an airplane to do it.

srkiNZ 2 months ago | root | parent | next |

> why can't I choose to view "the system itself" as the thing that is on the other side of a de-duplicator?

because the "de-duplicator" would either:

* be somewhere else on an unreliable network (in which case we have the same problem)

* be on the same machine (or in the same process) as "the system itself" (in which case from a distributed systems perspective makes it the same thing)

> It seems a bit pedantic...

It is pedantic. The only reason that these "delivery" rules are popular is because of how many times programmers have gotten it wrong. Mostly by making assumptions that either:

* the network is reliable

* the message queue (or whatever) will de-duplicate messages for me

mlyle 2 months ago | root | parent | prev |

Having a clear system boundary is required for analysis.

Knowing that messages will be delivered 1+ times gives us a variety of ways we could choose to deal with this on the endpoint, with different vulnerabilities. (Getting "exactly once" processing usually requires making various kinds of resilience tradeoffs based on timing windows, storage requirements, etc).

> It seems a bit pedantic to deny that someone can fly from LA to New York

At this point I question your good faith. You're calling people out by name, and you're going full on "well, aktuallyyyy" and seeming to deliberately misunderstand other peoples' assertions. "People can't breathe underwater" v. "Well, once I was in a tunnel that was under a body of water, and I still breathed!(@!("

If you choose to define words differently than everyone else, you're just sabotaging your own communication to try and feel smart.

lisper 2 months ago | root | parent |

> You're calling people out by name

I am? Where?

> Getting "exactly once" processing usually requires making various kinds of resilience tradeoffs based on timing windows, storage requirements, etc

Yes, of course. But that's not the same as "impossible".

mlyle 2 months ago | root | parent |

> I was reading Hacker News a few days ago and stumbled on a comment posted by ...

lisper 2 months ago | root | parent |

Really? That is what causes you to question whether or not I'm acting in good faith?

If that's what you call "calling people out by name" I guess we'll just have to agree to disagree.

mlyle 2 months ago | root | parent |

The whole refusal to accept that a field could legitimately define something differently than how you prefer, and then running off to blog about it and name names... and then coming around for round II of flamewar... with ever more splitting of hairs in definitions... is not awesome.

You are especially well-answered here, I think: https://news.ycombinator.com/item?id=41599131

One reason the delivery / processing distinction exists because very often the application needs to atomically persist "I have received this message" with any other state changes made as a result of processing that message for correctness. You can't generally solve this with a layer put on top, even on the same machine. If it's not atomic, then you can still deliver duplicates to the application or end up never delivering to the application. (Power goes out when one side has written but not the other).

So, the state change to "already received" and the changes you want to make in response to the message being received have to happen together. TCP or even a message queueing implementation with a persistence layer cannot solve this problem for you. Thus, the application needs to deal with multiple delivery.

Imagine a "subtract $5 from my bank account" message with no ID on the message itself, and a layer "on top" that gives IDs and tries to ensure exactly once delivery. If the layer "on top" does not change state at the exact time $5 is deducted from the account, bad things can happen-- and in practice this is impossible. Hence, the application needs to be able to cope with the "subtract $5" being delivered to it multiple times, and this deduping has to be intimately tied to it subtracting the $5 (processing).

lisper 2 months ago | root | parent |

> You are especially well-answered here

I don't see anything there that is at odds with anything I have said. All I see there is a restatement of my position.

> One reason the delivery / processing distinction exists because very often the application needs to atomically...

Yes. Do you really think I did not already know that?

> Thus, the application needs to deal with multiple delivery.

That depends on your requirements. What does that have to do with the possibility or impossibility of exactly-once delivery?

> Imagine a "subtract $5 from my bank account" message with no ID on the message itself

I have never denied that you can invent scenarios that will fail. I explicitly said that exactly-once delivery is likely not what you want. What does that have to do with whether or not it is possible?

mlyle 2 months ago | root | parent |

> Yes. Do you really think I did not already know that?

Well, if the application and this mythical higher-level thing have to do things atomically and be tightly wed, but you're insistent on calling them different entities so that you can win an internet argument that the second one is not getting duplicate "deliveries" ... then that's honestly kind of sad.

The literature has used the term "delivery" like this basically 100% of the time for the past 20 years, and the majority of the time somewhere else. You can argue that your definition makes sense to you, but when everyone else uses the term the other way it's not helpful. Anyone can choose to define words differently from everyone else and then try to lawyer it out, but it's not likely to be useful or accepted.

lisper 2 months ago | root | parent |

> you're insistent on calling them different entities so that you can win an internet argument

No, I'm insistent on calling them different entities because in actual practice they can be, and indeed usually are, different entities. De-duping is usually done in the operating system, and applications usually run in user space.

mlyle 2 months ago | root | parent |

No, as we're saying repeatedly at this point-- the application itself needs to tolerate multiple delivery, because if the deduplication isn't atomic with the application's actions, incorrect behavior results. Stacking on top of TCP doesn't fix this.

lisper 2 months ago | root | parent |

> if the deduplication isn't atomic with the application's actions, incorrect behavior results

So? What do the application requirements have to do with the question of whether or not exactly-once delivery is possible? The application is a red herring. Why do you keep bringing it up?

If you want to argue that exactly-once delivery is generally undesirable, that is not in dispute. What is in dispute is whether or not it is possible, and the application requirements cannot possibly have any bearing on that.

Jach 2 months ago | root | parent | prev | next |

This has been a fun thread to read, though not reflecting highly on HN. For what it's worth, I agree with you that the original adage kicking this off is kind of silly, and basically wrong. For further enjoyment I found this interesting blog post from another PhD that addresses things more comprehensively (and also basically agrees with you): https://www.mydistributed.systems/2021/10/exactly-once-deliv...

The opening lines include: "The exact definition, however, is not agreed upon in the community. As a result, there is a debate on whether EOD is possible or impossible to achieve." If nothing else, I and probably others learned today that this is apparently a debate that can quickly turn into a flamewar. And I thought flamewars were mostly dead!

Another interesting paper that came up, as I have an interest in TLA+ proofs: "LogPlayer: Fault-tolerant Exactly-once Delivery using gRPC Asynchronous Streaming" https://arxiv.org/abs/1911.11286 It seems there's no problem in the community to do things like prove fault-tolerant exactly-once delivery, even if such terminology isn't universally agreed on.

pyrolistical 2 months ago | root | parent | prev | next |

Don’t use technical terms like “exactly-once delivery” if you don’t have want it to be interrupted as technical

lisper 2 months ago | root | parent |

It is hard to make the point that "exactly-once delivery" is not a technical term without referring to it. If you think it is a technical term, would you kindly point me to a definition? I'm particularly interested in learning how "exactly-once delivery" is distinguished from "exactly-once processing".

mlyle 2 months ago | root | parent | next |

A 4 year old piece laying out the exact difference as it's understood and how people use the terms:

https://blog.bulloak.io/post/20200917-the-impossibility-of-e...

I read similar content at least 2 decades ago...

> While exactly-once-delivery is not possible, we have a way out: Exactly-once processing. Exactly-once processing is the guarantee that even though we may receive a message multiple times, in the end, we observe the effects of a single processing operation. This can be achieved in two ways:

> Deduplication: dropping messages if they are received more than once

> Idempotent processing: applying messages more than once has precisely the same effect as applying it exactly once

(I view deduplication as a special case of idempotency).

lisper 2 months ago | root | parent |

> A 4 year old piece laying out the exact difference...

"Exactly-once delivery guarantee is the guarantee that a message can be delivered to a recipient once, and only once."

That seems circular to me.

Also, the author's proof is flawed. The 2GP requires more than exactly-once delivery, it requires common knowledge. It is not enough for the first general to know that the message will be received, it is required that the first general knows that the message has been received, and that the second general knows that the first general knows this, and that the first general knows that the second general knows... and so on.

mlyle 2 months ago | root | parent |

The piece makes a very simple distinction.

Delivery is the property of a message showing up at a receiver, irrespective of the receiver making state changes.

Processing is making state changes.

You can't dedupe messages without some kind of state change. Your guards, writing down that a given message has been here before, have been delivered the message. An endpoint on a lossy medium has to cope with either (0 or 1) or (1 or many) messages.

Now, can the guards deliver it to you at most once? Well, if there's no lossy medium between, sure. But we already know that we can deliver exactly once when the medium is perfectly reliable. The guards have "processed" the message for this purpose, and the fact that they can deliver it to you over a perfectly reliable channel is moot.

The distinction is between the characteristics of the channel (delivery) versus what the receiver must do to achieve appropriate processing properties.

These might come from some combination of intrinsic idempotency, timers, persisting past messages to disks, establishing a message ordering, etc, etc, etc. These are the mechanisms that you need to cope with "one or many" delivery, and they all shape the state model of your system with respect to messages.

lisper 2 months ago | root | parent |

> The guards have "processed" the message for this purpose, and the fact that they can deliver it to you over a perfectly reliable channel is moot.

No, it isn't, because the situation inside the fort is different than the situation outside. The odds of a courier being intercepted inside the fort are effectively zero. If your computational model includes a non-zero probability of failure "inside the fort" then you are no longer in the realm of distributed systems, but are now talking about fault-tolerant computing, which is a whole nuther kettle o worms.

pyrolistical a month ago | root | parent | next |

No, one of the key ideas in distributed systems is the network is unreliable. It’s not bad actors “inside the fort” it’s just that messages get lost sometimes.

Internal switches can fail. There are countless reasons why packets will get lost

mlyle 2 months ago | root | parent | prev |

No. An element of a distributed system that wants to debit from bank accounts must acknowledge the receipt of a message after making its effects durable. That system may fail between the change being durable and the acknowledgment making it back to the original sender.

That element must tolerate that debit being delivered multiple times. And we can't even solve that problem by a higher-order system providing "reliable delivery" on that element, unless that thing persists atomically with the application performing the transaction (or the application itself tolerates multiple delivery).

lisper 2 months ago | root | parent |

> An element of a distributed system that wants to debit from bank accounts must acknowledge the receipt of a message after making its effects durable.

I do not and never have disputed that. But I fail to see what that has to do with the matter at hand, namely, whether or not exactly-once delivery is possible.

dragonwriter 2 months ago | root | parent | prev |

Draw a diagram containing a source system, a destination system, and an unreliable communication channel between them. The destination system also has an output with no unreliable communications channel.

Exactly-once delivery means that a message sent from the source system reaches the destination system exactly once, and its result reaches the output channel exactly once as a consequence.

Excactly-once processing means that a message sent from the source system produces the expected output from the destination system once, even though it may be received by the destination system more than once.

(That's a little sloppy because it could use more discussion of the conditions in which it won't be received zero times, and how those are different between exactly-once and at-most-once delivery, but that's mostly beside the point because it isn't part of the distinction between exactly-once delivery and exactly-once processing. And, while definitely technical, they always involved a somewhat idealized view of the destination system, because all communications channels, including those internal to a single device, have some degree of unreliability.)

lisper 2 months ago | root | parent |

> That's a little sloppy

Yes, that is exactly my point. The only way you can make it non-sloppy is to define "delivery" as being something that happens exclusively upstream of deduping.

dragonwriter 2 months ago | root | parent |

No, I'm saying its "sloppy" as a definition because while it addresses the distinction you ask about it, it doesn't fully cover what distinguishes exactly-once from at-most-once.

> The only way you can make it non-sloppy is to define "delivery" as being something that happens exclusively upstream of deduping.

"Deduping" can happen in many places. If it happens anywhere before the destination system end of the unreliable connection it is part of delivery (but also can't get you to exactly-once delivery). If it happens on the destination side of the unreliable communication channel, then yes, it's not part of the delivery guarantee, it is how you get exactly-once processing from at-least-once delivery. This has been well-known for a very long time. (I don't think it was new when I first encountered it in 1999.)

tacitusarc 2 months ago | root | parent | prev | next |

Right, but the meaning of “delivery” in human communication is still different from how you are using it…

lisper 2 months ago | root | parent |

I disagree. If I'm an actual general in a fort with a gate, and I tell the gate guards to inspect messages brought in by couriers and not allow couriers carrying duplicate messages to enter, the ones that the guards let through are still delivering those messages to me.

noaoh 2 months ago | root | parent | prev | next |

I had a similar thought as what's in your post, but didn't share because I thought the reaction would be something like this. I personally would word this as you can achieve exactly once semantics by combining at least once delivery with idempotency.

stackghost 2 months ago | root | parent | prev | next |

Your proposed definition of "delivery" is absurd.

If you have duplicate things, then you've clearly been delivered more than one thing. There is no way to deliver something exactly once, and yet the receiver has more than one thing such that they can throw all but one thing away.

It's okay to admit you were mistaken.

lisper 2 months ago | root | parent |

> If you have duplicate things, then you've clearly been delivered more than one thing.

Yes, that's true. But this doesn't turn on what "delivery" means, it turns on what "you" means. If "you" are downstream of a de-duplication mechanism, then "you" can get exactly once-delivery. Why is that so absurd?

stackghost 2 months ago | root | parent | next |

>Yes, that's true. But this doesn't turn on what "delivery" means, it turns on what "you" means. If "you" are downstream of a de-duplication mechanism, then "you" can get exactly once-delivery. Why is that so absurd?

So in the case of, say, a network service on server A and a network client B, your solution to "exactly once delivery" is to re-define it as "deliver it from A to B multiple times and have B deduplicate"?

Do you not see how nonsensical that is to call that "exactly once delivery"?

plorkyeran 2 months ago | root | parent | prev | next |

If you have a reliable connection between “you” and the deduplicator, then “you” aren’t receiving messages over an unreliable connection at all and so the claim that you can’t have exactly once delivery over an unreliable connection isn’t applicable in the first place. You’re receiving messages over a reliable connection and what happens upstream of that is irrelevant.

srkiNZ 2 months ago | root | parent | prev | next |

If "you" are "downstream" of a "de-duplication mechanism" how do you ensure "exactly once delivery" between "you" and the "de-duplication mechanism"?

lisper 2 months ago | root | parent |

The same way I ensure any behavior in a digital system. There are boundaries inside of which processes are presumed to be reliable, typically consisting of a CPU, memory busses, and attached non-volatile storage. If you don't assume that those are reliable, then you can't guarantee anything.

srkiNZ 2 months ago | root | parent |

Great! I agree 100%. We have to assume a "reliable network" within a "boundary" (i.e. a computers CPU, memory, busses etc...). Distributed systems (from which these rules are taken) are specifically systems where anything within one of these "boundaries" is considered a "single node" and treated the same, whether it's a NIC, a kernel module/driver, a user space process or anything else.

In our case if we were to take (for example) that the NIC would de-duplicate the messages for us, anyone writing the producer/sender and a user-space program for the receiver a would need to know that the NIC was doing this (or risk having messages dropped for failure of including a unique id).

This is a pedantic point, but I would strongly stress that the only reason these "delivery rules" are so popular (and evoke such a reaction) is because of the very large number of times that programmers mis-understand them.

Commonly they either assume that:

* the network is reliable

* something else will guarantee "exactly once delivery" for me (when in fact nothing will)

computerfan494 2 months ago | root | parent | prev |

You're correct, but in my experience the vast majority of code written is not downstream of a usable de-duplication mechanism.

lisper 2 months ago | root | parent |

That may well be, but that's a very different question.

computerfan494 2 months ago | root | parent |

It explains why most people seem to disagree with you on what "delivery" and "you" mean in this context. For the majority of contexts, "delivery" means that a system responsible for de-duplication receives the message.

lisper 2 months ago | root | parent |

Yeah, but that just seems like a bizarre definition on which to base the claim that you cannot have exactly-once delivery. Obviously, if you define delivery to preclude de-duplication, then you can't have exactly-once delivery. But you can have something that delivers messages (for some reasonable definition of "delivers") exactly once. It seems weird to define delivery in such a way that such a system does not provide exactly-once delivery.

computerfan494 2 months ago | root | parent |

I'm not sure how to convince you to get over your hangup around how other people feel about the word "delivery", but this is exactly why others distinguish between "delivery" and "processing". If you think there are better terms than those two to describe "the system that receives a message and must be responsible for de-duplication" and "the system that can rely on messages being already de-duplicated", then feel free to propose them and have people debate, that I suppose. But because of what I noted earlier, this is a very useful distinction to maintain for people working on systems that are responsible for de-duplication (likely most people on this forum), and these words seem to make sense for most individuals.

lisper 2 months ago | root | parent |

I don't have any hangups about how other people feel about anything. My "hangup" is that I see no evidence that there is a consensus on a technical definition of the word "delivery". In the absence of such a definition, there is no basis for asserting that exactly-once delivery is impossible, particularly when there are existence proofs to the contrary based on reasonable informal definitions of the word "delivery".

brigadier132 2 months ago | prev |

It seems like this person does not understand what exactly once delivery actually means. It's not about being able to do something exactly once if the author is reading this...

lisper 2 months ago | root | parent |

> It seems like this person does not understand what exactly once delivery actually means.

Then by all means, enlighten me.

brigadier132 2 months ago | root | parent |

I don't want to get into a debate over semantics with you. You've redefined exactly once delivery to mean that if you have a layer of abstraction that hides that a message was delivered multiple times then it is "exactly once" delivery. That's not what exactly once delivery is. You clearly disagree but taking well understood and researched terms and redefining them to make a point is a complete waste of time for everyone. Nobody cares about what your personal definition of exactly once delivery is, they care about the definition that is used in literally all of distributed systems research.

lisper 2 months ago | root | parent |

I'm not asking for a debate. I'm simply asking you to tell me what "exactly once delivery" actually means in a way that isn't circular and doesn't beg the question of whether or not it is possible, i.e. the definition that is used in literally all of distributed systems research. So far no one has been able to do that.

Filligree 2 months ago | root | parent | next |

Delivery is what happens when the signals cross from the wire into the network interface card.

Depending on the layer you’re operating on, you might instead say it’s the call to recv, or the DMA transfer. The point is that it’s logically a memcpy with no further processing. Just a memcpy. The physical data transfer.

It’s easy to build a reliable message-passing system on top of that, but any such system will either involve further processing or else be vulnerable to data loss.

lisper 2 months ago | root | parent |

> Delivery is what happens when the signals cross from the wire into the network interface card.

If you look at this sibling comment [1] you will find someone who disagrees with your definition.

[1] https://news.ycombinator.com/item?id=41601562

On your definition, yes, exactly-once delivery is not possible. But I think your definition is neither reasonable nor authoritative.

srkiNZ 2 months ago | root | parent |

FWIW, I consider that definition to be both reasonable and (in the context of "distributed systems") authoritative.

brigadier132 2 months ago | root | parent | prev |

Exactly once delivery is when a message is delivered to any computer at any layer of abstraction exactly once.

It doesn't matter if TCP abstracts or any other layer abstracts this away, if a message can be delivered more than once it does not have the property of exactly once delivery.

You might disagree with this definition but you would be wrong. There is negative value in redefining exactly once delivery to mean what you think it means.

lisper 2 months ago | root | parent |

But I don't disagree with that definition. The only thing in dispute is whether or not it is possible to achieve. It seems clear to me that it is possible, but for some reason that I still don't understand people keep insisting that it isn't.