To Module Or Not, That Is The Question

jakehamilton · May 14, 2024, 8:47pm

I have been thinking about this for the last few weeks and the idea recently came up in a discussion surrounding our package infrastructure so I figured that I would start a topic here for it.

Right now we are mostly following strategies similar to Nixpkgs for handling package creation. These are taken care of by function calls and the makeOverridable helpers. These solutions do work, but have certain drawbacks:

- Hard to document
- Hard to explain
- No type information
- Odd/overlapping naming
- Awkward solutions like the optional finalAttrs pattern

However, there is another strategy we could choose, one similar to drv-parts (DavHau/drv-parts on GitHub, “Configure packages like NixOS systems”).

Instead of function calls abstracting derivations, we could use the module system to organize package creation. This solution has some benefits:

- Built-in documentation support
- Still hard to explain, but now identical to how NixOS is managed
- Type checking & merging
- Naming can be simplified, and nested options are easy to support
- No more finalAttrs

However, this pattern has some challenges we will need to overcome:

- Evaluation speed of many modules.
- How do you fully replace a phase?
- How do you prepend/append a phase?
- How do you remove dependencies of a package definition?
- How do you replace dependencies of a package definition?
- A good few others…

However, this would also give us the flexibility to allow for dynamic builders, structured environment data, and more. The challenges here should be dealt with and unknowns need to be discovered before we can commit to something like this.

srxl · May 15, 2024, 6:30am

It’s been brought up over in this thread - might be worth crossing into there?

versed · May 15, 2024, 8:45am

See also this gist.
If modules for packages fixes the horrendous experience with overriding packages, then I’m all for it.
That said, this is my opinion mostly as a user of nix* and not someone who often deals with package definitions.

aria · May 15, 2024, 11:40am

Presumably we could have each phase be a nested submodule, for example for bootstrapping stdenv there could be stdenv.stage0, stdenv.stage1, …, stdenv.final
Each submodule has a .prev attribute that it draws from, and if users want to add a stage then they can override the next stage’s prev to point to their own module (I’m not actually certain if the current module system supports ‘pointers’ though).
Since stdenv has a varying number of stages per host/target platform, this would probably be more like stdenv.<host>.<target>.stage*, with stdenv.final pointing to the final stage of one of those nested modules.

It seems like the plan (at least in edolstra’s talk/gist) was to make this a language feature to get around this. Probably that’s not possible in the near future, but maybe we should plan around migrating to one later.

Also worth noting that drv-parts / dream2nix / pkgs-modules seem to expect you to instantiate nixpkgs again whenever you use a different variant of a package (i.e. each time you’d call python3.withPackages). This seems like a problem. For example:

let
  pkgs = import <nixpkgs> {
    config.packages.python3.packages = {
      pandas.enable = true;
      pandas.version = "0.19.1";
      pandas.src.sha256 = "08blshqj9zj1wyjhhw3kl2vas75vhhicvv72flvf1z3jvapgw295";
    };
  };

in
pkgs.python3

would probably cause problems if you then reused pkgs to get other attributes which don’t want that particular pandas version.

Jeff · May 15, 2024, 5:40pm

I’m not as familiar with modules. I don’t see much about phases in the docs, so could you expand upon that? @jakehamilton

jakehamilton · May 15, 2024, 6:05pm

Sorry for the confusion, I was referring to package build phases like patchPhase, buildPhase, and installPhase.

Jeff · May 15, 2024, 8:30pm

Is that integrated into modules? I thought that was just mkDerivation.

jakehamilton · May 15, 2024, 9:40pm

Correct, we would need to figure out how to handle this with modules if we were to go down that path.

Jeff · May 16, 2024, 1:28am

Sorry, I’m still a bit confused. Are you saying phases are currently handled by nix modules and we should do the same, or that phases are currently handled by mkDerivation and that we need to move away from that and have them handled by modules?

jakehamilton · May 16, 2024, 1:54am

Phases are a part of the builder, stdenv. If we were to configure things through modules we would need to figure out a solution for managing these phases. Replacement, prepending commands, etc.
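As a rough sketch of what replacing, prepending, and appending could look like, assuming a phase were declared as an ordinary module option: `buildPhase` below is a hypothetical option made up for illustration (no such module interface exists yet), while `lib.mkBefore`, `lib.mkAfter`, `lib.mkForce`, and `lib.evalModules` are the existing module system helpers from the nixpkgs lib. It also assumes `<nixpkgs>` is available on the Nix path.

```nix
# Sketch only: `buildPhase` is a hypothetical module option, not an existing API.
let
  lib = (import <nixpkgs> { }).lib;

  # Base package module: declares the phase as mergeable shell text
  # and gives it an ordinary definition.
  base = {
    options.buildPhase = lib.mkOption {
      type = lib.types.lines;
      default = "";
      description = "Shell commands run during the build phase.";
    };
    config.buildPhase = "make";
  };

  # A user module that prepends and appends commands around the base definition.
  wrap = {
    buildPhase = lib.mkMerge [
      (lib.mkBefore "echo 'starting build'")
      (lib.mkAfter "echo 'build finished'")
    ];
  };

  # A user module that replaces the phase outright.
  replace = {
    buildPhase = lib.mkForce "cargo build --release";
  };

  # Evaluate a list of modules and return the merged phase text.
  eval = modules: (lib.evalModules { inherit modules; }).config.buildPhase;
in
{
  wrapped = eval [ base wrap ];
  replaced = eval [ base replace ];
}
```

With `types.lines`, definitions are joined line by line in priority order, so `wrapped` should keep `make` between the two echo commands, while `mkForce` in `replaced` discards the base definition. Whether plain text merging is expressive enough for real phases is one of the open questions here.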

aria · May 16, 2024, 1:29pm

Presumably mkDerivation would also be a module, which has attributes for buildPhase, etc.
The actual default output of a package would be the .drv attribute of a module, or .dev .man for other ones.

Jeff · May 16, 2024, 2:56pm

So some of my derivations/flakes don’t use stdenv or mkDerivation. (I had an issue with mkDerivation causing a binary to be completely broken. I couldn’t figure out which phase was causing it, asked on the Discourse, got no response, and eventually just used builtins.derivation and patched headers / dynamic libs myself.)

Is just using builtins.derivation directly going to be impossible under the module system?

jakehamilton · May 16, 2024, 3:45pm

No, in fact it would be easier than it currently is. The builder for packages would be able to be customized by setting something like:

packages.bat.builder = config.builders.derivation;

Jeff · May 16, 2024, 4:46pm

That’s good to hear that it’s possible.

Maybe this is just a terminology thing, but I thought the builder was an executable, like bash. And I thought it was an argument given to builtins.derivation.

Is there some other builder terminology I’m confusing with the derivation builder?
Is packages.bat.builder like nixpkgs.bat.builder?
Is nixpkgs.bat like a nix module or is it a derivation?
Are we going to kind of deprecate mkDerivation in favor of just using this new module-integrated phases approach?

Jeff · May 17, 2024, 2:38pm

In the meantime, there are a few other things with modules I think are worth talking about.

Imports and mkOption (from nix modules) are great to the point that I’d consider them necessary in a re-do of nixpkgs. That said, I’m not as sold on using NixOS modules as-is.

For a design I think we should mention the tasks we want to accomplish, and then see how we can craft a module structure to fit those tasks. For example, it would be great if I could hear examples of tasks you were trying to accomplish when overriding phases.

I’ve got a few tasks for indexing/discoverability and detangling. But I’ll have to defer to other people’s input for tasks related to services, NixOS system stuff, and hydra.

Here’s things I think packages/modules should handle:

For making a searchable index, we should know (without building a module):
- What config options are available
  - What values they are allowed to be (enum values, boolean, positive numbers, package inputs)
- What maintained versions of the package are available
- (Boolean) Does the module have a test suite
- What systems (OS triples) passed the test suite in CI
- Definite system requirements for building (if the maintainer added any)
- Dependency constraints (e.g. node needing to be version 10, if the maintainer added any such constraints)
- Default dependencies (flake locks do this for flakes)
- Is / is-not cached (can’t be just a boolean, will need to be a structure specifying “cached for this set of config options and system”)
- Which dependencies are build-time only
- A name that is broad enough for substitutions (ex: “coreutils” could be satisfied by gnu coreutils or rust coreutils)
- What licenses does the package use
- (Boolean) FOSS/non-FOSS flag
- Maintainers (with options for names, email, site+username)
- URLs for source code
- Hashes of code (there are different hash funcs but also differences in filesystem or compressed format (zip, tar) )
- Known CVEs
- What binaries are part of the output
- (Maybe) commits for previously-maintained versions (ex: { "v3.5.0" = "[commit-hash]"; } )
- build requires internet (true, false, or null=unknown)
- What specifications the package provides (ex: “shell:posix”, “v3.5.0”, “feature:python-minimal”)
- It would be nice to have an optional icon url
- (Maybe) env stuff
  - Any services it would like to set or depend on
  - Any ENV vars it would like to modify
  - Shell hooks it would like to modify (bashrc extensions)

At install time of a module, some user tasks are:
- Easy way to set config options
- Easy way to list config options
- Easy way to list direct dependencies
- An easy way to override direct dependencies
- A way to override any level of the dependency tree (see Deno’s import map system!)
- An obvious/easy way for users to report package-specific issues, like a “report issue” button on the aux.search.org.

Beyond that though, there’s an important question that seems easy but can actually be really hard. What is “the same module” vs a different module?

What can make this hard is that, for example, Redis (the database) changed licenses. So, if we want a license to be an attribute of a module, then a module would need to be a specific version (or subset of versions, e.g. versions before the license change) instead of all the versions. Another “what is a module” question is: can a module have dynamic dependencies? (I think, with a NixOS module import list, the answer is “no”, which IMO is good for many tasks.)

Towards answering that “what’s a module” question, I’d like to propose a smaller and larger definition. I’ll call the larger one tools and the smaller one modules – with tools being user-facing, and modules focusing more on being well-defined little packets that are faster to eval. Tools could contain multiple modules under this definition.

To make that idea more concrete, Modules could be defined as

- Single set of dependencies (on other modules)
- Has one active set of licenses/description/CVE list (doesn’t contain two versions of a package that have a different license and/or description and/or CVE list)
- Has a static list of versions it can build
- Has some form of “If you’re looking for coreutils, I am one implementation of coreutils” (e.g. a certificate system for solving the shared inputs problem)
- Has a derivation function (that doesn’t necessarily use mkDerivation)
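To make the shape of that definition a bit more tangible, here is a minimal sketch of it written as module options. Every option name (`dependencies`, `licenses`, `versions`, `provides`) and every value is a placeholder invented for illustration, the derivation function is left out, and `<nixpkgs>` is assumed to be on the Nix path. Only `lib.mkOption`, `lib.types`, and `lib.evalModules` are existing nixpkgs lib functions.

```nix
# Sketch only: the option names below are hypothetical, not an agreed schema.
let
  lib = (import <nixpkgs> { }).lib;

  # Schema for the smaller "module" definition described above.
  schema = {
    options = {
      dependencies = lib.mkOption {
        type = lib.types.listOf lib.types.str;
        default = [ ];
        description = "Names of the modules this module depends on.";
      };
      licenses = lib.mkOption {
        type = lib.types.listOf lib.types.str;
        description = "The single active set of licenses.";
      };
      versions = lib.mkOption {
        type = lib.types.listOf lib.types.str;
        description = "Static list of versions this module can build.";
      };
      provides = lib.mkOption {
        type = lib.types.listOf lib.types.str;
        default = [ ];
        description = "Broad names this module can satisfy, e.g. coreutils.";
      };
    };
  };

  # An example instance; the values are placeholders, not real package data.
  example = {
    licenses = [ "MIT" ];
    versions = [ "1.0.0" "1.1.0" ];
    provides = [ "coreutils" ];
  };

  eval = lib.evalModules { modules = [ schema example ]; };
in
{
  inherit (eval.config) dependencies licenses versions provides;
}
```

Because the options carry types and descriptions, a definition with the wrong shape (say, a single string for `versions`) would fail at evaluation time rather than at build time.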

For eval time (for indexing/search tasks), I think it would be hard to do better than serializing as much module info as possible, either to JSON/TOML or, for real performance, converting the JSON to BSON. Compared to high-performance parsers that integrate directly into databases, full Nix eval is just heavy and slow. Letting the nix runtime focus purely on building derivations instead of repeatedly generating static info at runtime would almost certainly be runtime-optimal. Nix can also import JSON/TOML, so it’s not like the info is outside of nix’s grasp. We might need two files: pre-build info (version, name, license, etc.) and post-build info (are tests passing, is there a cached version available, CVE list, names of contributed binaries, output hash values, etc.).
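A small sketch of the round trip, using the existing `builtins.toJSON` and `builtins.fromJSON` functions. The metadata fields are made up for illustration and are not a proposed schema; in practice the JSON would live in a file and be read back with `builtins.readFile`.

```nix
let
  # Hypothetical pre-build metadata for one module; field names are illustrative only.
  meta = {
    name = "example";
    version = "1.0.0";
    license = "MIT";
    requiresInternet = null; # null = unknown
  };

  # Serialize once, e.g. for an external search index.
  json = builtins.toJSON meta;

  # Nix can read the same data back without evaluating any package code.
  roundTripped = builtins.fromJSON json;
in
{
  inherit json;
  sameData = roundTripped == meta;
}
```

An indexer outside of Nix would only ever need the `json` string, while package code could keep consuming the same data as a plain attribute set.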

However, with one license, one CVE list, etc., modules would necessarily end up fragmented (like python38, python39, etc.). That isn’t great for user experience. So “Tools” could be for when a user is looking for something: when they search python they get the “python” tool, not a fragmented list of python2, python38, python39, pythonMinimal, etc. I think it would be a much better experience to simply select python, select the version, and be presented with config options for that version, than to see a million different variants of python at the top level of search results. We could store info such as description, icon, and website homepage at the tool level instead of the module/package level.