37:25 So Beelay has a requirement that it has to work with encrypted chunks. We do this compression and then encryption on top of it, and then send that to the sync server. The sync server can see the membership of each doc, because it has to know who it can send these chunks around to, but not the content of the document. So if you make a request, it checks: okay, are you somebody that has the rights to have this sent to you, yes or no, and then it'll send it to you or not.
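To make that routing decision concrete, here is a minimal sketch of such a membership check. The types and names are hypothetical (this is not Beelay's actual code), but it shows how a server can decide who to send chunks to while only ever holding public keys and ciphertext:

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a peer's public key: in this sketch the sync
/// server never learns anything else about the requester.
type PublicKey = [u8; 32];

/// What the server stores per document: the membership it needs for routing,
/// plus opaque encrypted chunks. No plaintext document content.
struct StoredDoc {
    members: HashSet<PublicKey>,
    encrypted_chunks: Vec<Vec<u8>>,
}

impl StoredDoc {
    /// Answer a fetch request: is the requester in the membership, yes or no?
    /// If yes, hand back ciphertext; the server never decrypts anything.
    fn chunks_for(&self, requester: &PublicKey) -> Option<&[Vec<u8>]> {
        if self.members.contains(requester) {
            Some(self.encrypted_chunks.as_slice())
        } else {
            None
        }
    }
}

fn main() {
    let member = [1u8; 32];
    let stranger = [9u8; 32];
    let doc = StoredDoc {
        members: HashSet::from([member]),
        encrypted_chunks: vec![vec![0xde, 0xad, 0xbe, 0xef]],
    };
    assert!(doc.chunks_for(&member).is_some());
    assert!(doc.chunks_for(&stranger).is_none());
}
```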
37:55 And this isn't only for sync servers. If you connect to somebody directly over Bluetooth, you'd do the same thing, even if you can both see the document. There's nothing special here about sync servers.
38:06 To do this sync well, we're no longer syncing individual ops. We could do that, but then we lose the compression, which isn't great. And ideally, if somebody were to break into your server, we don't want them to see how everything's related to each other. That compression and encryption also hides a little bit more of this data. We do show the links between these compressed chunks, but we'll get to that in a second.
38:32 Essentially what we want to do is chunk up the documents in such a way that there's the fewest number of chunks to get synced, and the longest ranges of Automerge ops that get compressed before we encrypt them. On the, I'll call it client, it's not really a client in a local-first setting, but the side that isn't the sync server, the more stuff that you have, the better the compression is.
38:58 And chunking up the document here means you're really chunking up the history of operations that then gets internally rolled up into one snapshot of the document. And that could be very long, and there's room for optimization. That is the compression here: if you set the name of the document a ton of times, like, hey, the name of the document is Peter, and later you say, no, it's Brooke, and later you say, no, it's Peter, no, it's Johannes, then you can compress that into, for example, just the latest operation.
39:33 Yeah, exactly. So to get more concrete: if you take this slider all the way to one end, you take the entire history and run-length encode it, do this Automerge compression, and you get very, very good compression. If we take it to the far other end, we go really granular: every op stands on its own, so you don't get compression. So there's something in between, of how can we chop up the history in a way where I get a nice balance between these two?
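As a rough intuition for why longer contiguous ranges compress better, here is a toy run-length encoding sketch in Rust. It only illustrates the tradeoff on that slider; it is not Automerge's actual columnar encoding:

```rust
/// Toy run-length encoder: collapse consecutive equal values into
/// (value, count) pairs. Automerge's real column compression is far more
/// sophisticated; this only illustrates why long stable ranges shrink well.
fn run_length_encode<T: PartialEq + Clone>(input: &[T]) -> Vec<(T, usize)> {
    let mut runs: Vec<(T, usize)> = Vec::new();
    for item in input {
        match runs.last_mut() {
            Some((value, count)) if *value == *item => *count += 1,
            _ => runs.push((item.clone(), 1)),
        }
    }
    runs
}

fn main() {
    // One long range: 1,000 ops that all repeat the same value collapse
    // into a single run.
    let column = vec!["set title"; 1000];
    assert_eq!(run_length_encode(&column).len(), 1);

    // The other end of the slider: chop the same ops into single-op chunks
    // and every chunk repeats the value, so nothing is saved.
    let total_runs: usize = column
        .chunks(1)
        .map(|chunk| run_length_encode(chunk).len())
        .sum();
    assert_eq!(total_runs, 1000);
}
```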
40:04 When Automerge receives new ops, it has to know where in the history to place them. So you have this partial order, this typical CRDT lattice, and then it puts that into a strict order: it orders all the events and then plays over them like a log. And this new event that you get, maybe it becomes the first event; it could go way back to the beginning of history. You don't know, because everything's eventually consistent. So if you do that linearization first and then chop up the documents, you have this problem where if I do this chunking, or you do this chunking, it really depends on what history each of us has. And so it makes it very, very difficult to have a small amount of redundancy.
40:46 So we found two techniques helped us with this. One was: we take some particular operation as a head and we say, ignore everything else, only give me the history for this operation, only its strict ancestors. So even if there's something concurrent, forget about all of that. That gets us something stable relative to a certain head.
41:08 And then to know where the chunk boundaries are, we run a hash hardness metric. So the number of zeros at the end of the hash of each op gives you a knob: you can say, any individual op will do, I'm happy with anything; or, if I want chunks of around four ops, give me two zeros at the end, because two to the power of two is four. You can make this as big or as small as you want. So now you have some way of probabilistically chunking up the document relative to some head, and you can say how big you want the chunks to be based on this hash hardness metric. The advantage of this is that even if we're doing things relative to different heads, we're going to hit the same boundaries for these different hash hardness values. So now we're sharing how we're chunking up the document.
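Here is a small Rust sketch of that idea: each op's boundary status depends only on the trailing zero bits of its own hash, so peers chunking relative to different heads still land on the same cut points wherever their histories overlap. The hash function and op representation are placeholders, not Beelay's actual implementation:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative only: stand-in for an op's content-addressed hash.
/// Beelay would use a real cryptographic hash; DefaultHasher is just for the sketch.
fn op_hash(op: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    op.hash(&mut hasher);
    hasher.finish()
}

/// An op is a chunk boundary if its hash ends in at least `level` zero bits.
/// With `level = 2`, roughly one op in four (2^2) is a boundary, so chunks
/// average about four ops; raising `level` makes chunks exponentially larger.
fn is_boundary(hash: u64, level: u32) -> bool {
    hash.trailing_zeros() >= level
}

/// Split a run of ops (already ordered relative to a chosen head) into
/// chunks at the probabilistic boundaries.
fn chunk_ops<'a>(ops: &'a [&'a str], level: u32) -> Vec<Vec<&'a str>> {
    let mut chunks = vec![Vec::new()];
    for op in ops {
        chunks.last_mut().unwrap().push(*op);
        if is_boundary(op_hash(op), level) {
            chunks.push(Vec::new());
        }
    }
    chunks.retain(|c| !c.is_empty());
    chunks
}

fn main() {
    let ops: Vec<String> = (0..32).map(|i| format!("op-{i}")).collect();
    let op_refs: Vec<&str> = ops.iter().map(|s| s.as_str()).collect();
    // Because boundaries depend only on each op's own hash, two peers that
    // chunk relative to different heads still agree on these cut points.
    for chunk in chunk_ops(&op_refs, 2) {
        println!("{chunk:?}");
    }
}
```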
41:59 And we assume that on average, not all the time, but on average, older operations will have been seen by more peers. You're mostly going to be appending things to the end of the document, so with this system you will less frequently have something concurrent with the first operation. That means that we can get really good compression on older operations.
42:28 Let's take, and I'm just picking numbers out of the air here, the first two thirds of the document, which are relatively stable, compress those, and we get really good compression, then encrypt it and send it to the server. And then of the remaining third, let's take the first two thirds of that, compress them, and send them to the server. And then at some point we get down to each individual op. This means that as the document grows and changes, we can take these smaller chunks, and as they get pushed further and further into history, whoever can actually read them can recompress those ranges.
43:02 So Alex has this, I think, really fantastic name for this, which is sedimentree, because it's almost acting in sedimentary layers, but it's sedimen-tree because you get a tree of these layers. Yeah, it's cute, right?
43:15 And so if you want to do a sync, let's say you're doing a completely fresh sync, where you've never seen the document before: you will get the really big chunk, and then you'll move up a layer and get the next biggest chunk of history, and then you move up a layer, and eventually you get the last couple of ops. So we can get you really good compression, but again, it's this balance of these two forces. Or, if you've already seen the first half of the document, you never have to sync that chunk again. You only need to get these higher layers of the sedimentree sync.
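A toy sketch of that layered fetch, with made-up layer sizes (the real sedimentree layers come out of the hash-hardness boundaries above, not fixed fractions):

```rust
/// A chunk covers a contiguous range of the linearized history.
#[derive(Debug, Clone, Copy)]
struct Chunk {
    start: usize, // first op index covered (inclusive)
    end: usize,   // last op index covered (exclusive)
}

/// Layers go from coarsest (big, old, well-compressed chunks) to finest
/// (the most recent handful of individual ops). The two-thirds split here
/// is just the made-up example from the conversation.
fn layers_for(total_ops: usize) -> Vec<Vec<Chunk>> {
    let first = total_ops * 2 / 3;
    let second = first + (total_ops - first) * 2 / 3;
    vec![
        vec![Chunk { start: 0, end: first }],
        vec![Chunk { start: first, end: second }],
        (second..total_ops).map(|i| Chunk { start: i, end: i + 1 }).collect(),
    ]
}

fn main() {
    let layers = layers_for(90);

    // A fresh peer fetches one chunk per coarse layer plus the newest ops.
    for (depth, layer) in layers.iter().enumerate() {
        println!("fresh peer, layer {depth}: fetch {layer:?}");
    }

    // A peer that already holds ops 0..60 skips the coarse bottom layer and
    // only needs the higher (finer) layers.
    let already_have = 60;
    for (depth, layer) in layers.iter().enumerate() {
        let still_needed: Vec<Chunk> = layer
            .iter()
            .copied()
            .filter(|c| c.end > already_have)
            .collect();
        println!("returning peer, layer {depth}: fetch {still_needed:?}");
    }
}
```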
43:44 So that's how we chunk up the document. Additionally, and I'm not at all going to go into how this thing works, but if people are into sync systems, there's a pretty cool paper called Practically Rateless Set Reconciliation. It does really interesting things with compressing all the information you need to know what the other side has. So in half a round trip, in one direction, on average, you can get all the information you need to know what the delta is between your two sets: literally, what's the handful of ops that we've diverged by, without having to send all of the hashes. So if people are into that stuff, go check out that paper, it's pretty cool, but there's a lot of detail in there that we're not going to cover on this podcast.
44:26 Thanks a lot for explaining. I suppose it's just the tip of the iceberg of how Beelay works, but I think it's important to get a feeling for this being a new world in a way: it's decentralized, it's encrypted, et cetera. There are really hard constraints on what certain things can do. In your traditional development mindset you would just say, let's treat the client like it's a Kindle with no CPU in it, and let's have the server do as much of the heavy lifting as possible. I think that's the muscle that we're used to so far. But in this case, even if the server has a super beefy machine, it can't really do that, because it doesn't have access to do all of this work. So the clients need to do it, and when the clients independently do so, they need to eventually end up in the same spot. Otherwise the entire system falls over, or it gets very inefficient. So that sounds like a really elegant system that you're working on in that regard.
45:32 So with Beehive overall, again, you're starting out here with Automerge as the system that drives the requirements, et cetera. But I think your bigger ambition, your bigger goal, is that this becomes a system that at some point goes beyond just applying to Automerge, and applies to many more local-first technologies in the space. If there are application framework authors, or other people building a sync system, who'd be interested in saying, hmm, instead of us trying to come up with our own research here for what it means to do authentication and authorization for our sync system, particularly if you're doing it in a decentralized way: what would be a good way for those frameworks, those technologies, to jump on the Beehive wagon?
46:33 So if they're already using Automerge, I think that'll be pretty straightforward: you'll have bindings, and it'll just work. But Beehive doesn't have a hard dependency on Automerge at all, because it lives at this layer below. Early on we asked ourselves, well, should we just weld it directly into Automerge, or how much does it really need to know about it? And where we landed was: you just need to have some kind of way of saying, here's the partial order between these events, and then everything works.
47:04 So, just as an intuition, you could put Git inside of Beehive and it would work. I don't think GitHub's going to adopt this anytime soon, but if you had your own Git syncing system, you could do this and it would work. You just need to have some way of ordering events next to each other. And yes, then you have to get a little bit more into slightly lower-level APIs. When I build stuff, I tend to work in layers: here are the very low-level primitives, then here's a slightly higher level, and a slightly higher level again. So people using it from Automerge will just have add member and remove member, and everything works. To go down one layer, you have to wire into it how to do ordering, and that's it. Everything else should wire all the way through.
47:51 And you have to be able to pass it serialized bytes. So Beehive doesn't know anything about this compression that we were just talking about that Automerge does. But you tell it, hey, this is some batch, some archive that I want to make; it starts at this timestamp and ends at that timestamp, or logical clock; please encrypt this for me. And it goes, sure, here you go, encrypted, and off it goes. So it has very, very few assumptions.
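As a sketch of how small that integration surface is, here's a hypothetical Rust interface capturing the two requirements just described: a partial order over events, plus opaque serialized bytes to encrypt. The trait and type names are invented for illustration and are not Beehive's actual API:

```rust
use std::cmp::Ordering;

/// Hypothetical sketch of what a host system would need to supply: event
/// identities plus a partial order between events. (Invented names; this is
/// not Beehive's real API.)
trait EventGraph {
    type EventId: Eq + Clone;

    /// Some(ordering) if one event is an ancestor of the other,
    /// None if the two events are concurrent.
    fn causal_order(&self, a: &Self::EventId, b: &Self::EventId) -> Option<Ordering>;
}

/// The payload side is just opaque bytes: the auth/encryption layer doesn't
/// need to understand the host's op format or compression.
struct EncryptedArchive {
    start: u64,          // logical clock of the first event covered
    end: u64,            // logical clock of the last event covered
    ciphertext: Vec<u8>, // compressed-then-encrypted bytes, opaque to Beehive
}

/// An append-only event log (like the LiveStore example mentioned below)
/// satisfies the ordering requirement trivially: events are totally ordered
/// by sequence number.
struct LinearLog;

impl EventGraph for LinearLog {
    type EventId = u64;

    fn causal_order(&self, a: &u64, b: &u64) -> Option<Ordering> {
        Some(a.cmp(b))
    }
}

fn main() {
    let log = LinearLog;
    assert_eq!(log.causal_order(&1, &2), Some(Ordering::Less));

    let archive = EncryptedArchive { start: 0, end: 42, ciphertext: vec![0u8; 16] };
    assert_eq!(archive.end - archive.start, 42);
}
```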
48:15 That's certainly something that I might also pick up a bit further down the road myself for LiveStore, where the underlying substrate for syncing data around is an ordered event log. And if I'm encrypting those events, then I think that perfectly fulfills the requirements that you've listed, which are very few, for Beehive. So I'm really looking forward to once that gets further along.
48:40 So, speaking of which, where is Beehive right now? I've seen the lab notebooks from what you have been working on at Ink & Switch. Can I get my hands on Beehive already right now? Where is it at? What are the plans for the coming years?
48:56 So at the time that we're recording this, at least, which is early December, there's unfortunately not a publicly available version of it. I really hoped we'd have it ready by now, but unfortunately we're still wrapping up the last few items in there. But we plan to have a release in Q1. As I mentioned before, there are some changes required for Automerge to consume it, specifically to manage revocation history: somebody got kicked out, but we're still in this eventually consistent world, and Automerge needs to know how to manage that. But managing things, sync, encryption, all of that stuff, we hope to have in, I'm not going to commit the team to any particular timeframe here, but we'll say in the next coming weeks.
49:37 Right now the team is myself; John Mumm, who joined a couple of months into the project and has been focused primarily on BeeKEM, which is, again, I'm just going to throw out words here for people that are interested in this stuff, related to TreeKEM, but we made it concurrent. TreeKEM is one of the primitives for MLS, messaging layer security. He's been doing great work there. And Alex, amongst the many, many things that Alex Good does between writing the sync system, maintaining Automerge, and all of the community stuff that he does, has also been lending a hand.
50:11 So I'm sure that for Beehive, in a way, you're just scratching the surface, and there's probably enough work here to fill another few years, maybe even decades, of ambitious work. Right now you're probably working through the proof of concept or just the table-stakes things. Can you paint a picture of some of the way more ambitious, long-term things that you would like to see under the umbrella of Beehive?
50:39 Yeah, so there's a few. We have this running list internally of what a V2 would look like. One is adding a little policy language. The bang for the buck that you get on having something like UCAN's policy language is just so high; it gives you so much flexibility. Hiding the membership from even the sync server is possible; it just requires more engineering. There are many, many places in here where zero-knowledge proofs, I think, would be very useful, for people who know what those are. Essentially it would let the sync server say, yes, I can send you bytes, without knowing anything about you.
51:16 Right, but it would still deny others. And right now it basically needs to run more logic to actually enforce those auth rules.
51:25 Yeah. So today you have to sign a message that says, I signed this with the same private key whose public key you know about in this membership. With zero-knowledge proofs, we can hide the entire membership from the sync server and still do this, without revealing even who's making the request. That would be awesome.
51:43 In fact, and this is a bit of a tangent, I think there's a number of places where that class of technology would be really helpful. Even for things like, in CRDTs there's this challenge where you have to keep all the history for all time. And with zero-knowledge proofs, this would very much be a research project, but I think it's possible to delete history but still maintain cryptographic proofs that things were done correctly, and compress that down to a couple of bytes, basically. But that's a bit of a tangent.
52:10 I would love to work on that at some point in the future, but for Beehive: yeah, hiding more metadata, hiding the membership from the group, making all the signatures post-quantum. The main recommendations from NIST, the U.S. government agency that handles these things, only just came out, so we're still kind of waiting for good libraries for it, and all of this stuff. Big chunks of it are already post-quantum, but making it fully post-quantum would be great.
52:43 And then, yeah, adding all kinds of bells and whistles and features, making it faster. It's not going to have its own compression, because it relies so heavily on cryptography, so it doesn't compress super well. We're going to need to figure out our own version of what Automerge gets from run-length encoding: what is our version of that, given that we can't easily run-length encode encrypted things, or signatures, or all of this? So there's a lot of stuff down in the plumbing. Plus I think this policy language would be really, really helpful.
53:11 That sounds awesome, both in terms of new features, capabilities, no pun intended, being added here, but also in terms of removing overhead from the system and simplifying the surface area by doing more of the clever work internally, which simplifies the system overall. That sounds very intriguing.
53:31 The other thing worth noting with this, both to point a way into the future and also to draw a boundary around what Beehive does and doesn't do, is identity. Beehive only knows about public keys, because those are universal: they work everywhere, and they don't require a naming system or any of this stuff. We have lots of ideas and opinions on how to do a naming system. But if you look at, for example, BlueSky, under the hood all of the accounts are managed with public keys, and then you map a name to them using DNS. So either you're using myname.bluesky.social, or you have your own domain name, like I'm expede.wtf on BlueSky, for example, because I own that domain name and I can edit the text record.
54:15 And that's great, and it definitely gives users a lot of agency over how to name themselves, and there are other related systems. But it's not local-first, because it relies on DNS. So, how could I invite you to a group without having to know your public key? We're probably going to ship, I would say, just because it's relatively easy to do, a system called Edge Names, based on pet names, where basically I say: here's my contact book, I invited you, and at the time I invited you I named you Johannes, and I named Peter Peter, and so on and so forth. But there's no way to prove that; it's just my name for these people. And having a more universal system where I could invite somebody by their email address, for example, I think would be really interesting.
55:03 Back at Fission, Blaine Cook, who's also done a bunch of stuff with Ink & Switch in the past, had proposed this system, the NameName system, that would give you local-first names rooted in things like email, so you could invite somebody with their email address and a local-first system could validate that that person actually had control over that email. It was a very interesting system. So there's a lot of work to be done in identity as separate from authorization.
55:29 Right, yeah. I feel like there's always so much interesting stuff happening across the entire spectrum, from the world that we're currently in, which is mostly centralized just so that things work at all, and even there it's hard to keep things up to date and working, et cetera, but we want to aim higher. One way to improve things a lot is by going more decentralized, but there are so many hard problems to tame, and we're just starting to peel the layers off the onion here. And Automerge, I think, is a great canonical case study there: it started with the data, and now things are moving on to authorization, et cetera. And then authentication and identity, where we probably have enough research work ahead of us for the coming decades. It's super, super cool to see that so many bright minds are working on it.
56:29 Maybe one last question in regards to Beehive. When there's a lot of cryptography involved, that also means there are even more CPU cycles that need to be spent to make stuff work. Have you been looking into performance benchmarks? Let's say you want to synchronize a certain history for some Automerge documents, with Beehive disabled and with Beehive enabled: do you see a certain factor of how much slower it gets with Beehive and the authorization rules applied, both on the client as well as on the server?