Binary cache - thoughts

Even without an OSS sponsorship, Cloudflare is probably one of the best options.

Is it worth reaching out to the people at vpsfree.cz and seeing if they're able to accommodate us? They're all for OSS and a non-profit.

Also, I just realized, could they be the answer to our git hosting woes?

3 Likes

Yes it is and I do not believe there are any plans to open source it.

1 Like

It was indeed; Graham offered in good faith. He really likes a lot of what Aux is doing (SIGs, CLI improvements, etc).

4 Likes

We won't be vendor-locked. Shifting the cache elsewhere is mostly a matter of finances.

9 Likes

Ooh, I didn't know about this; that looks like exactly what is needed to handle that part of the problem!

Agreed. I was dreaming earlier about how a build farm (and maybe cache) could be run on technology similar to the Golem Network, with some form of build-output comparison (reputation) mechanism. I didn't know Trustix was already a thing…

2 Likes

This is what I was thinking as well. The only real risk of "vendor lock-in" with caches is potentially losing archived data.

I would trust that those at DetSys/FlakeHub would give us a decent heads-up on any changes to our arrangement, though, as I'm sure many of them are familiar with how large an effect that can have on a project (see the funding situation for the foundation's cache last year).

5 Likes

DetSys' cache offering as a temporary solution is surprising, but I don't see any technical drawbacks to that, given that the worst they can do is back out of their offer at an inconvenient time.

But given that the most inconvenient option would have been to never offer it at all, I think it's safe to assume good faith, or at least that their intentions are about building reputation; in both cases it will benefit Aux more than it could possibly harm it.

7 Likes

I've given the docs, repo and blog post a very quick uncaffeinated skim and nothing about project status leapt out at me. Do you happen to know what, if anything, is considered still to do?

Work on Trustix stopped quite some time ago; it never got off the ground and is now sitting in stasis.

5 Likes

Is it worth a mod splitting this topic? I was originally bouncing around some fairly specific technical ideas in an informal way, but there's a fair amount of general policy discussion here now from people likely to end up on the infra committee. (I've not volunteered for that, as I don't think my health or experience are up to it.)

On a related note, should the infra committee category be reserved for "official" discussions? As a non-member just pottering around doing my own experiments and research, I don't want to muddy the SNR; quite happy to lurk in a more general area :slight_smile:

1 Like

I personally think everything here stayed pretty on topic, with a small exception relating to the policies, but I think it's better left here for the context.

2 Likes

Sadly, it turns out it's not the directory size causing Tahoe's ludicrous CPU usage; it seems to be inherent. I could just about make it acceptable for single-user usage by turning off attic's chunking altogether, but that doesn't seem desirable, and in any case I still don't think it will scale.

So, I'm giving up on Tahoe for now and playing with Garage instead, which is looking a lot more promising. It definitely seems more performant, more modern, and simpler, though it takes a more conventional approach to clustering, so it's likely only applicable to small cooperating groups with a fair amount of mutual trust.

Interestingly, it seems attic is scalable, which is something I didn't realise. That makes it more widely applicable as a frontend.

2 Likes

Just digging around in the Garage source, it seems to use the protocol from the Scuttlebutt project for RPC, which is quite interesting. If nothing else, the crypto looks good, and it means VPNs / overlay networks etc. are probably not needed for connecting peers.

I would like for us to have nothing to do with Determinate Systems, for political and PR reasons. The Anduril debacle is not fully resolved.

4 Likes

So I got one Garage node up, pushed some data to it, then added another node - and it rebalanced the data. Exciting times! Now to see if I can get it back out :slight_smile:

1 Like

I can only recommend attic, even though it's not 100% perfect. But I have to say, taking a professional offer like something from FlakeHub sure sounds promising.

Edit: I would like us not to use it, though, because DetSys is the reason forks like this happened in the first place…

3 Likes

There was a discussion about a P2P binary cache for Nix. From what I'm understanding, there are two components when searching for a binary package:

  1. A request to a centralised server to get the hash of the package output, using the .drv derivation hash
  2. A content-addressed request to the P2P service, either in-process or using an external binary
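The two-step lookup above can be sketched roughly as follows. This is a hedged illustration only: `DRV_INDEX`, `P2P_STORE`, and `fetch_substitute` are hypothetical names, not a real nix or P2P API.

```python
import hashlib

def out_key(data: bytes) -> str:
    """Content address: the key is derived from the bytes themselves."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

payload = b"example build output"

# Step 1 lives on a centralised server: .drv hash -> output content hash.
DRV_INDEX = {"drv:example": out_key(payload)}

# Step 2 is a content-addressed P2P store, keyed purely by output hash,
# so any (even untrusted) peer can serve the bytes.
P2P_STORE = {out_key(payload): payload}

def fetch_substitute(drv_hash: str):
    out_hash = DRV_INDEX.get(drv_hash)    # request to the centralised server
    if out_hash is None:
        return None                       # unknown derivation: build locally
    data = P2P_STORE.get(out_hash)        # content-addressed P2P request
    if data is None or out_key(data) != out_hash:
        return None                       # missing or tampered copy
    return data
```

Because the second request is content-addressed, the client can verify the bytes itself, regardless of which peer served them.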

In order to have a P2P binary cache, one would thus need:

  • build server(s) to
    • build derivations
    • hash their output
  • a substituter to store the mapping between derivation input and derivation output
  • optionally, storage to hold the derivation output
  • a nix binary that knows how to communicate with a P2P binary cache

A possible collaboration with Lix was brought up; with their help, one could possibly get this into Lix faster than anybody could ever get it merged into Nix.

Since it's P2P, I think there are two options for the build servers:

  • centralised build servers, aka one source of trust →
    • the first copy of a public derivation output is served by central storage
    • copies of the derivation output are then served by replicators (people willing to serve the binaries)
  • centralised build orchestrator →
    • derivations are built by trusted, participating nodes
    • the drv output hash is sent back to the orchestrator and stored in the substituter with some consensus algorithm (basically reproducible builds)
    • participating nodes can also act as initial storage nodes or delegate to other storage nodes
    • clients can immediately pull from distributed storage

The centralised build servers are probably the most expensive, but "easiest" to implement. The centralised build orchestrator might be new territory, but could yield the highest possible reduction of code and allow relying on 10s, 100s, or even 1000s of community build nodes + storage nodes. This is the rosy future I dream of: no more direct AWS, GCP, or Azure. JaBOCN: Just a Bunch Of Community Nodes.
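The orchestrator's consensus step could, under these assumptions, be as simple as a quorum vote over the output hashes reported by independent builders (the function name and report shape here are hypothetical):

```python
from collections import Counter

def agree_on_output(reports: dict, quorum: int = 2):
    """Accept an output hash only if at least `quorum` independent
    builders reported it (reproducible-builds style agreement).
    `reports` maps builder node id -> reported output hash.
    Returns the agreed hash, or None if no hash reaches quorum."""
    if not reports:
        return None
    best_hash, count = Counter(reports.values()).most_common(1)[0]
    return best_hash if count >= quorum else None
```

A real implementation would also need to weight builders by reputation and handle late or withheld reports, but the core idea is just this majority comparison of hashes.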

6 Likes

This is analogous to getting Linux ISO torrents and seeding passively, and it makes a lot of sense to me. If storage is a bigger issue than bandwidth (is it?), then this doesn't address that.

If storage is the bigger issue, then a potential approach would be: older derivations age out into cheaper cold storage, the hashes remain available in a fast "cache", and if the P2P network doesn't have that hash+derivation it can be built locally or wait on slow egress. This is a very high-level spitball idea, of course; I don't have an out-of-the-box implementation to point to.

1 Like