@coded@minion and I (@dfh) have been brainstorming different ways to do secret management in Aux, given the rather bad state that Nix secrets management is currently in.
The problems we are trying to avoid with the new solution:
Secrets committed to git
Secrets can’t change without redeployment/being part of system configuration
The need for an unencrypted secret on disk, typically an SSH key
Secret versioning, rotation, attestation and auditing are quite hard
A big aspect of the project is to use declarative methods for secret generation where possible. The gokey utility is an inspiration for this idea, but it does not resolve the issues for secrets that are pre-shared in nature (think API tokens, WiFi passwords, etc.).
We wanna hear your thoughts and how you would like the solution to look and feel.
If you’re into this topic, please help us out. You can join The Matrix Room or DM us on Discourse.
Secrets can’t change without redeployment/being part of system configuration
This makes the system no longer reproducible, as the secrets are versioned separately. I don’t think it’s a problem for e.g. API tokens, as they are inherently not reproducible (you can’t roll back to an old configuration with a revoked API token), but if you mess up your password you might want to roll back to a previous config.
The need for an unencrypted secret on disk, typically an SSH key
This is less of a problem in personal deployments (e.g. PC or home server, where physical access is not much of a threat) and it is much simpler, so I’d like to still have this as an option for simpler/lower security stuff.
Secret versioning, rotation, attestation and auditing are quite hard
Agree.
My conclusion is that you recommend versioning them separately which makes sense (you’d want to use newer passwords/API tokens where possible) and this would allow us to address most other problems (plain text secret on disk, secrets on git, etc.), but it might interfere with rollbacks.
I think agenix/sops is fine for home deployments and easier to manage, but for more professional deployments, having something like a TPM-backed password store would be nice.
What appears to fulfill your requirements would be something like pass that would use the system’s TPM instead (systemd-creds might work?), then deployment would go like:
decrypt secrets on build host
encrypt them with target’s ssh/age pubkey
transfer them to target
target re-encrypts them with the TPM key and stores them in a new store.
on test/switch, the TPM decrypts the secrets to /run/secrets and the services can access them (or through systemd-creds).
The secret store on the build host would be managed/versioned separately, but it should support push to a remote host through e.g. ssh so that the secrets can be updated on the fly. It might need to run a script that updates the secrets/reloads the services running to apply the updates though.
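To make the steps above a bit more concrete, here is a rough Python sketch that only builds the command lines for pushing one secret; it does not run anything. The paths `/tmp/<name>.age` and `/var/lib/secret-store/<name>.cred`, the `Step` type, the function name and the `decrypt_cmd` parameter are all made up for illustration. It also assumes `age` is used for the transfer encryption with the target’s SSH host key as the identity, and `systemd-creds` for the TPM part, which is just one way this could go.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    host: str   # "build" or "target"
    shell: str  # shell command line that would be run on that host


def plan_secret_push(name: str, target: str, recipient: str,
                     decrypt_cmd: List[str]) -> List[Step]:
    """Plan the push of one secret. Nothing is executed here."""
    q = shlex.quote
    blob = f"/tmp/{name}.age"                    # hypothetical transfer path
    cred = f"/var/lib/secret-store/{name}.cred"  # hypothetical store location
    host_key = "/etc/ssh/ssh_host_ed25519_key"   # assumed identity on the target
    decrypt = " ".join(q(arg) for arg in decrypt_cmd)
    return [
        # Steps 1+2: decrypt on the build host, encrypt to the target's public key.
        Step("build", f"{decrypt} | age -r {q(recipient)} -o {q(blob)}"),
        # Step 3: transfer the encrypted blob to the target.
        Step("build", f"scp {q(blob)} {q(target + ':' + blob)}"),
        # Step 4: the target re-encrypts the secret with its TPM key.
        Step("target",
             f"age -d -i {q(host_key)} {q(blob)} | "
             f"systemd-creds encrypt --with-key=tpm2 --name={q(name)} - {q(cred)}"),
        # Step 5: on test/switch, decrypt into /run/secrets.
        Step("target",
             f"systemd-creds decrypt --name={q(name)} {q(cred)} "
             f"{q('/run/secrets/' + name)}"),
    ]
```

Calling `plan_secret_push("wifi", "root@host", "age1example", ["pass", "show", "wifi"])` gives four command lines, two for the build host and two for the target. Piping the decrypted value straight into the next tool means the plaintext is never written to disk in this sketch.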
I shared the above text last night in the Matrix server. I modified it slightly before posting it here to fix some mistakes and better represent my opinion.
After further discussions I believe we share a similar opinion and are leaning towards a specific design:
We would like to build a “vault” API where the secrets are stored in order to be shared among many machines/versions of machines.
Each “vault” would provide its own script that allows retrieving secrets from the “vault” itself (We lean towards defaulting to a gokey wrapper with some extra functionality).
The secrets would be retrieved and stored in a local store (backed by, e.g., systemd-creds). The store would optionally (but not by default) use the system’s TPM device, together with a local secret (e.g. the system’s host ssh key).
The local store would then provide the secrets to the system services.
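A minimal sketch of how the vault and the local store could fit together, in Python just for illustration. The names `Vault`, `LocalStore`, `DictVault`, `MemoryStore` and `provision` are all hypothetical, and the in-memory classes stand in for a real gokey wrapper and a real systemd-creds backed store.

```python
from typing import Dict, Iterable, Protocol


class Vault(Protocol):
    """Anything that can list and hand out secrets (hypothetical interface)."""
    def names(self) -> Iterable[str]: ...
    def retrieve(self, name: str) -> bytes: ...


class LocalStore(Protocol):
    """Per-machine store that the services read from (hypothetical interface)."""
    def put(self, name: str, value: bytes) -> None: ...
    def get(self, name: str) -> bytes: ...


class DictVault:
    """Stand-in vault backed by a dict instead of gokey."""
    def __init__(self, secrets: Dict[str, bytes]) -> None:
        self._secrets = dict(secrets)

    def names(self) -> Iterable[str]:
        return sorted(self._secrets)

    def retrieve(self, name: str) -> bytes:
        return self._secrets[name]


class MemoryStore:
    """Stand-in local store; a real one would encrypt at rest."""
    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, name: str, value: bytes) -> None:
        self._data[name] = value

    def get(self, name: str) -> bytes:
        return self._data[name]


def provision(vault: Vault, store: LocalStore) -> int:
    """Copy every secret from the vault into the local store; return the count."""
    count = 0
    for name in vault.names():
        store.put(name, vault.retrieve(name))
        count += 1
    return count
```

The point of the split is that services only ever talk to the local store, so swapping the vault (or turning TPM support on or off in the store) would not change how services consume secrets.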
Example workflows
Initial deployment
The vault is copied over alongside the system configuration
The vault is used to provision the local store
The local store is used to provide the secrets to the services
The system is all set up
Configuration changes not affecting secrets
e.g. disable existing service that does not rely on secrets
Redeploy the system
Nothing to do, neither the local store, nor the vault changed
Configuration changes affecting secrets
e.g. enabling/disabling TPM support in the local store
Copy over new configuration
Re-provision the local store from the vault
Restart affected services & switch to configuration
Vault changes
e.g. API key rollover, password change, etc.
Copy over the new vault
Re-provision local store
Restart affected services
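The four workflows above boil down to a small lookup from "what changed" to "what needs to happen". Here is a sketch of that table in Python; the enum names are invented for illustration, and reading "redeploy" as "copy config, then switch" is my interpretation of the steps rather than something fixed in the design.

```python
from enum import Enum, auto
from typing import Dict, List


class Change(Enum):
    INITIAL = auto()                   # first deployment
    CONFIG_ONLY = auto()               # config change not affecting secrets
    CONFIG_AFFECTING_SECRETS = auto()  # e.g. toggling TPM support in the store
    VAULT = auto()                     # e.g. API key rollover, password change


class Action(Enum):
    COPY_CONFIG = auto()
    COPY_VAULT = auto()
    PROVISION_STORE = auto()
    RESTART_SERVICES = auto()
    SWITCH = auto()


# Ordered actions per kind of change, following the example workflows.
WORKFLOWS: Dict[Change, List[Action]] = {
    Change.INITIAL: [Action.COPY_CONFIG, Action.COPY_VAULT,
                     Action.PROVISION_STORE, Action.SWITCH],
    Change.CONFIG_ONLY: [Action.COPY_CONFIG, Action.SWITCH],
    Change.CONFIG_AFFECTING_SECRETS: [Action.COPY_CONFIG, Action.PROVISION_STORE,
                                      Action.RESTART_SERVICES, Action.SWITCH],
    Change.VAULT: [Action.COPY_VAULT, Action.PROVISION_STORE,
                   Action.RESTART_SERVICES],
}


def actions_for(change: Change) -> List[Action]:
    """Return a copy of the ordered action list for the given change."""
    return list(WORKFLOWS[change])
```

Note that a vault change never touches the system configuration here, which is the whole point: secrets can roll over without a new generation of the system.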
Scripts/extensions
Work that needs to be done to support this solution.
Extend gokey to store fixed secrets
Similarly to pass, create a store folder that contains secrets that cannot be generated based on the gokey seed (e.g. API keys). Keeping it simple, we’ll use the filename to derive a symmetric encryption key and retrieve the secret inside the file.
This will mean the folder will need to be copied over along with the gokey seed file, but it can be versioned separately from the config (e.g. through git). Or along with the config if your setup doesn’t mind that.
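As a sketch of the "filename derives the key" part: one option would be to run the filename through HMAC-SHA256 keyed with the seed, giving a distinct 32-byte key per file. To be clear, this is an assumption for illustration and not how gokey derives anything today; the `file_key` name and the `fixed-secret:` prefix are made up. Actually decrypting the file contents would need an authenticated cipher from a crypto library, which is not shown here.

```python
import hashlib
import hmac


def file_key(seed: bytes, filename: str) -> bytes:
    """Derive a 32-byte per-file key from the seed and the store filename.

    Assumed scheme: HMAC-SHA256(seed, "fixed-secret:" + filename).
    The prefix keeps these keys separate from any other use of the seed.
    """
    message = b"fixed-secret:" + filename.encode("utf-8")
    return hmac.new(seed, message, hashlib.sha256).digest()
```

With this, renaming a file in the store changes its key, so a rename would have to go through a decrypt and re-encrypt rather than a plain `mv`.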
Extend nixos-rebuild to copy over the vault/provision the local store
Each vault should define a copy script. This script should either copy the vault directly to the target system, or provide a list of (source, destination) path pairs that nixos-rebuild should copy.
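For the second variant, the copy script’s output format still needs to be decided. A minimal sketch, assuming one `source<TAB>destination` pair per line and absolute destination paths (both assumptions, as is the `parse_copy_pairs` name):

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def parse_copy_pairs(output: str) -> List[Tuple[str, str]]:
    """Parse 'source<TAB>destination' lines emitted by a vault's copy script."""
    pairs: List[Tuple[str, str]] = []
    for lineno, line in enumerate(output.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"line {lineno}: expected 'source<TAB>destination'")
        src, dst = parts
        # Destinations live on the target, so require absolute paths there.
        if not PurePosixPath(dst).is_absolute():
            raise ValueError(f"line {lineno}: destination must be absolute: {dst}")
        pairs.append((src, dst))
    return pairs
```

For a gokey-style vault, the script would then emit two lines, one for the seed file and one for the store folder, and the caller would copy each pair to the target.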
Extend the test/switch script to provision the local store
When running nixos-rebuild test/switch/install the local store needs to be provisioned based on the vault’s interface.
Extend modules to accept secrets from the local store
Finally, integrate the local store with the NixOS module system.
y’all might well know this already, but reading this i remembered that hashicorp made a tool literally named Vault. since their license change it has been forked, but in other words, for what it’s worth, there is stuff out there that interfaces with various existing services for secrets.
to what extent that could be useful to interact with in this context i’m not sure. i think nix restricted network access at certain stages to reduce impurity, tho iirc Vault did in fact work in terms of unlock → use → relock. so maybe bridging with the likes of that at least could help offload logic on interacting with other systems, for in as far as that might become desirable here.
I really like the idea of extensibility through an API/scripts. Storing the secrets on the machine in plain text will likely be compatible with most consumption patterns.
I’m having success using sops and sops-nix for secret management on NixOS and nix-darwin (with home-manager).
It lacks (or perhaps just my usage of it lacks) systemd-creds (and thus TPM support) at the moment, but is otherwise very robust.
Likely everyone involved has already audited this option; if so, ignore my post. Otherwise, if you’re curious about the full workflow, reply and I’ll go into more detail.
The biggest issue with sops-nix/agenix is that they tie your secrets configuration to your system configuration. This means that if you roll back the configuration, you also roll back the secrets, reverting any changes to API keys, passwords, etc.
Our belief is that passwords and secrets should be stored separately from the system configuration.