How are you handling automatic updates?

I’ve been having some issues getting automatic updates working with my Flake-enabled repo. I’ve seen system.autoUpgrade, but I think I may have misconfigured it. Does anyone have an example of a config that works for you? Other methods/approaches are welcome too.

1 Like

I personally find that auto upgrade just sucks; it almost never works. So instead I have CI update and deploy my flakes via deploy-rs.
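
For reference, the deploy-rs side lives in the flake itself. A minimal sketch (the node name, hostname, and system here are placeholders, and it assumes deploy-rs and self come from the flake's inputs/outputs):

{
  # Hypothetical flake.nix fragment for deploy-rs; "myhost" and the hostname are placeholders.
  deploy.nodes.myhost = {
    hostname = "myhost.example";
    profiles.system = {
      user = "root";
      sshUser = "deploy";
      path = deploy-rs.lib.x86_64-linux.activate.nixos self.nixosConfigurations.myhost;
    };
  };

  # Optional: lets `nix flake check` validate the deploy configuration too.
  checks = builtins.mapAttrs (system: deployLib: deployLib.deployChecks self.deploy) deploy-rs.lib;
}

CI then just runs deploy .#myhost once the update commit lands.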

4 Likes

Honestly, I just have my system on unstable and run nix flake update from time to time.

2 Likes

I think the key thing to keep in mind is that the purpose and use of autoUpgrade is (or should be) a little different for channels vs flakes.

In a channel-based system, the upgrade bumps the channel (which is external, untracked state) and rebuilds the system. The convenient way to use it is to just accept all upgrades, but of course there is some risk, mitigated by the ability to roll back system generations.
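
For reference, the channel-based version is about as small as it gets (the channel URL and schedule here are just examples):

{
  system.autoUpgrade = {
    enable = true;
    channel = "https://channels.nixos.org/nixos-24.05";  # example channel
    dates = "04:40";          # when to run the upgrade
    allowReboot = false;      # set true to reboot automatically when the new generation needs it
  };
}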

With flakes, the source revisions of the inputs are tracked and committed to git. If you expect system.autoUpgrade to work the same way and implicitly run nix flake update in some location like /etc/nixos, you’re going to get into trouble, especially when you use the same flake across a bunch of different machines (which is an important reason to want auto upgrades in the first place).

What works best for me is to update the inputs and commit the flake separately and publish that to a repo that’s accessible to all the clients. Then autoUpgrade on the clients is just responsible for following the flake, not updating the inputs.

The update of inputs can be done a number of ways. I personally just update my main workstation, and have all the other systems follow along usually overnight. You could also run the update in a CI job, pre-build the system flakes for all hosts, and even run some custom tests before publishing a new ‘blessed’ revision, and then the actual updates are quick on the hosts if the builds are sent to a binary cache.

It’s not so different in practice from the channel-based mechanism, since there are effectively tests run upstream before the channel advances. You’re just adding your own iteration of this, and it matters because it often covers a whole collection of flake inputs, not just the nixpkgs channel. Because of all this, I no longer worry about my config having to be compatible with multiple nixpkgs (or other) versions across breaking changes; I just make the adapting change in the same revision that updates the input.

Here’s what I use:

{ inputs, ... }: {
  system.autoUpgrade = {
    enable = ((inputs.self.rev or "dirty") != "dirty");
    flake = "git+ssh://soft-serve:23231/geek/nixos?ref=flake";
    flags = [ "--refresh" ];
    randomizedDelaySec = "45m";
  };
  systemd.services.nixos-upgrade = {
    preStart = "ssh soft-serve -p 23231 info";
    startLimitIntervalSec = 120;
    startLimitBurst = 6;
    serviceConfig = {
      Restart = "on-failure";
      RestartSec = "20";
      CPUSchedulingPolicy = "idle";
      IOSchedulingClass = "idle";
    };
  };
}

some notes:

  • the conditional enable means that if the running system has some local changes (quick hacks, or sometimes a workaround for a temporary issue), autoUpgrade won’t accidentally revert those from the clean branch.
  • the preStart probe and the Restart loop are there because nix’s own error handling is poor when the remote is briefly unreachable; a few retries paper over transient failures
  • the scheduling settings are to avoid a laptop coming back from suspend and slamming the CPU and disk with updates when you’re trying to quickly get something done
5 Likes

This is a super informative breakdown, thank you! I think I might do daily nix flake updates on my home server, commit the changes, then have the clients reference that host’s repo like you have in your example. I haven’t quite gotten CI down yet, but I’ve got Forgejo and Forgejo Actions set up, so hopefully that won’t be too difficult (famous last words).

I just can’t help but think about this from the perspective of a new NixOS user, who might be used to enabling automatic updates by running a single command, editing a single flag in a text file, or clicking a check box. Is there an easier/more accessible way to share a single Flake-enabled repo between multiple hosts? (I’m not asking you specifically, just posing this as a general question)


Update: This may be grossly over-engineered, but I took a stab at an “update & push → pull” system, and condensed it into a standard Nix module. All hosts have it enabled by default, and if you want to use a host as the “update and push” node, just set host.services.autoUpgrade.pushUpdates = true.
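
Roughly, the shape of it is something like this (the repo URL, checkout path, and times are placeholders, not the real values):

{ config, lib, pkgs, ... }:
let
  cfg = config.host.services.autoUpgrade;
in {
  options.host.services.autoUpgrade.pushUpdates = lib.mkEnableOption
    "updating flake inputs on this host and pushing them to the shared repo";

  config = {
    # Every host follows the shared flake repo (placeholder URL).
    system.autoUpgrade = {
      enable = true;
      flake = "git+https://git.example.com/me/nixos";
      flags = [ "--refresh" ];
      dates = "04:00";
      randomizedDelaySec = "45m";
    };

    # Only the designated host updates the inputs and publishes them.
    # Assumes a git identity and push credentials are set up for root.
    systemd.services.flake-update-push = lib.mkIf cfg.pushUpdates {
      description = "Update flake inputs and push them to the shared repo";
      path = [ pkgs.git pkgs.nix pkgs.openssh ];
      script = ''
        cd /var/lib/nixos-config   # placeholder checkout of the flake repo
        nix flake update --commit-lock-file
        git push
      '';
      serviceConfig.Type = "oneshot";
      startAt = "03:00";           # run before the clients pull at 04:00
    };
  };
}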

There’s probably a better way to do this, but I only have so long of a break :slight_smile:

1 Like

Thanks for the write-up! I was pondering something similar recently, since it changes the nature of updates to a pull pattern instead of a centralized deployment tool needing to push them.

Have you built any CI, and/or are you pre-building and serving the updated derivations via a binary cache?

Because I have machines that are suspended or off-network much of the time (some of them get used only very rarely), the pull model suits my needs better too.

I don’t particularly bother with automated CI or a dedicated binary cache. I get “close enough” via ambient activity, so I haven’t needed to take the extra (small) steps.

The CI part is mostly about tests to ensure that the revision being pushed is good. In the above, I talk about updating inputs and config, and pushing the flake to a repo for clients to pull. I typically do this on my main workstation, because, well, it’s what I’m using and what I can easily test on and where I probably want updates first. It builds probably 75~80% of the stuff I use on other machines too, especially in terms of package size and build time.

After building and switching the “test system”, I look to see what changed (this uses the fact that booted sorts before current; you could also look at the result path from a build before switching):

❯ nix store diff-closures /run/*-system

If it’s a small update (a few days of general package updates in -unstable) and everything seems ok, I generally just push the updated revision to the flake repo and forget about it. If it’s a bigger bump (especially after staging lands and half the world rebuilt), I probably want to at least verify that the other ~25% of my stuff builds. So, I build it:

❯ nix flake show --json | from json | get nixosConfigurations | columns
    | each { |host|
        nix build --out-link $"result-($host)" $".#nixosConfigurations.($host).config.system.build.toplevel"
      }

This little bit of nushell just builds each system and keeps a separate result link for each to stop it getting gc’d. If I had additional tests I wanted to run, I’d add them as flake checks and run them here too.
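
If I did want checks, wiring the per-host builds into the flake can be as small as this (a flake.nix fragment; it assumes all hosts are x86_64-linux and that nixpkgs and self are in scope):

# Expose every host's toplevel as a check, so `nix flake check` builds them all.
checks.x86_64-linux = nixpkgs.lib.mapAttrs
  (_name: host: host.config.system.build.toplevel)
  self.nixosConfigurations;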

When I’m happy, I just push the result. Automated CI would just automate these steps, and maybe use a CI staging branch for updates.

Then the last bit is the binary cache. I don’t bother with a separate one. Instead, a number of machines share their stores (via nix.sshServe) and sign everything they build (with nix.settings.secret-key-files); those machines are configured as substituters (….trusted-substituters) for everyone, with trusted store signing keys (….trusted-public-keys). So when a client’s autoUpgrade runs, the system closure built above (and kept alive by its per-host result symlink) is already in a shared store and can be copied by the client.
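
Concretely, that boils down to something like this (hostnames and keys are placeholders; the serving and consuming halves normally live on different machines):

{
  # On a machine that shares its store: serve it over ssh and sign what it builds.
  nix.sshServe.enable = true;
  nix.sshServe.keys = [ "ssh-ed25519 AAAA... client@laptop" ];    # clients' public keys (placeholder)
  nix.settings.secret-key-files = [ "/etc/nix/signing-key.sec" ];

  # On the clients: allow that machine as a substituter and trust its signing key.
  # (Depending on your setup you may also want it in plain `substituters` so it is
  # consulted by default; root on the client needs ssh access to nix-ssh@builder.)
  nix.settings.trusted-substituters = [ "ssh://nix-ssh@builder.example" ];
  nix.settings.trusted-public-keys = [ "builder.example-1:BASE64PUBLICKEY=" ];
}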

This is where a small fib of omission above becomes important. My workstation suspends overnight, and if it’s offline when someone else wants to update, it hangs and makes everything terrible, as well as just not actually being useful. So, the “build all systems” step is usually run on an always-on server. I should probably even just automate that, but I haven’t — mostly because I want it before the branch gets pushed, even if it would still be useful for caching purposes after.

Unfortunately, the ssh store access mechanism is weirdly slow for some reason I haven’t looked into, to the point that it might be faster getting some things from public cache, but I mostly don’t care at 4am; the machines can just take their time.

What I really want is peerix to work, but it doesn’t. When some protocol work in lix is done, I / we have ideas for a better peer-to-peer cache sharing mechanism.

3 Likes

git diff --quiet && git diff --staged --quiet || git commit -am "Update flake.lock" && git push

Courtesy of: “How to let Jenkins git commit only if there are changes?” on Stack Overflow

FWIW, I like using nix flake update --commit-lock-file for this:

  • it only commits the lock file, not other changes
  • it’s a no-op if there are no changes
  • you get a nice commit message with all the changes, matching the console output:
flake.lock: Update

Flake lock file updates:

• Updated input 'home-manager':
    'github:nix-community/home-manager/10c7c219b7dae5795fb67f465a0d86cbe29f25fa' (2024-05-27)
  → 'github:nix-community/home-manager/1b589257f72c9c54e92d1d631e988e5346156736' (2024-05-29)
• Updated input 'lix':
    'https://git.lix.systems/api/v1/repos/lix-project/lix/archive/0b91a4b0ec79c27ee36d8a7e2afd7737cb825b65.tar.gz?narHash=sha256-WB7eZThfKCFbvZTasejmAaBOAVlopOGKg1fhSKYxA58%3D' (2024-05-27)
  → 'https://git.lix.systems/api/v1/repos/lix-project/lix/archive/c71f21da3ac4d95ef9a42a26416ccee71639dbd6.tar.gz?narHash=sha256-qzMZbPaT8j1a3%2Bs9eSoFk5o%2BwRt/m3zSO7VRwZopkfc%3D' (2024-05-29)
• Updated input 'lix-module':
    'https://git.lix.systems/api/v1/repos/lix-project/nixos-module/archive/12b457c433c8d0e81f614c894be34f5bb4c54f99.tar.gz?narHash=sha256-VA3jln13mvx62QcSIi9RCpy3wdJIWmvYKtW%2BsssgwIs%3D' (2024-05-27)
  → 'https://git.lix.systems/api/v1/repos/lix-project/nixos-module/archive/38f31ee7c1a60adae58833789dd855c128b056c6.tar.gz?narHash=sha256-dfNGs2AW/V31nMVeEBSUJCMfT6bZAKJ5qsWgFHWhvUc%3D' (2024-05-28)
• Updated input 'nix-vscode-extensions':
    'github:nix-community/nix-vscode-extensions/205f3c49a6cef35aeedc957914859ddff3019834' (2024-05-27)
  → 'github:nix-community/nix-vscode-extensions/35bae3cfd65aeb35b3866ecac6e70a60c8a9e8fd' (2024-05-29)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/bfb7a882678e518398ce9a31a881538679f6f092' (2024-05-24)
  → 'github:NixOS/nixpkgs/9ca3f649614213b2aaf5f1e16ec06952fe4c2632' (2024-05-27)

As noted previously, if this results in breaking changes that need config updates to match, I try to add them to the same revision via --amend, but I don’t always remember or find out in time; nbd.

3 Likes

@uep Thanks for the extensive write-up, highly appreciated :+1:

1 Like

This got me so worked up that I wrote a blog about it lol. I still think there has to be a better way, but I’m still looking for it :male_detective:

5 Likes