NixCon - Linus's blog

I went to NixCon last week. A number of things were discussed, but a talk by Théophane Hufschmitt in particular was relevant to my project, as he related the results of a research internship where he designed and implemented a type system for the Nix language. This post is a bit of a brain-dump on the thoughts this gave me as far as my project is concerned.

The constraints on the type system he designed were quite different from the relatively free-form nature of my project: in particular, the type system he designed needed to be able to accept expressions that can not necessarily be typed accurately, but behave well for the purpose of evaluation. Gradual typing, which he used to deal with this issue, is an interesting concept but may not be so relevant to my project, as it is more of a compromise made in order to accommodate existing code while retrofitting a type system to an established language (like in Hufschmitt’s project, or TypeScript) than a useful concept to incorporate into a (mostly) fresh language design. Gradual typing becoming necessary in my project would suggest that I’ve failed to design a sufficiently versatile type system!

Another interesting element of Hufschmitt’s type system, however, is the use of set-theoretic types. A system with algebraic data types permits forming new types from existing ones through the use of products and sums. Hufschmitt pointed me to “Semantic subtyping: dealing set-theoretically with function, union, intersection and negation types” by Frisch, Castagna and Benzaken as an introduction to the subject. A cursory reading of the introduction suggests that I’ll need to take a little more time to digest it. I also don’t quite understand if the same benefits wouldn’t be covered by a dependent type system.

The gradual typing part also put a crazy idea into my head where rather than simply inferring the type of a binding, the actual type definitions could be inferred that is, refining existing code with more detailed typing. Mad raving about this idea follows.

What comes to mind in particular is the use of status codes in C. Many procedures which may fail return an integer status code, or store such a status code in a particular location, and the meaning of that status code will vary significantly, particular between code from different vendors/libraries, and by the set of status codes that a particular procedure may return. A tool like corrode, which transpiles C into Rust, could be combined with such a “type definition inference” system to increase the expressiveness of the resulting code.

For instance, the stat() procedure will return -1 and set errno, an integer, in accordance with why the call failed. However, it is specified that errno may have one of a fixed set of values after a failed call to stat: EACCES, EBADF, EFAULT, ELOOP, ENAMETOOLONG, ENOENT, ENOMEM, ENOTDIR, EOVERFLOW. This is a far smaller range of values than the entire range of possible integers, and would be helpful to encode in the type of a translated implementation of stat, so that more reasoning can be done about callers of stat; for instance, comparing errno to EISDIR while handling the failure case of a stat could be marked as dead code. This sort of thing is of course already possible with various static analysis frameworks, but encoding information like this into the type system makes it easier to integrate with development tools.

I’m not sure if that’s a task I trust myself to implement (or which would even be particularly relevant) for GHOTL, but I’ll make sure to look into whether any literature on this concept exists! Maybe Bob can help me with that?