(2023-04-19) Human-scale programming languages and the problem with them
------------------------------------------------------------------------

I started writing this post while looking at the source code of Equi ([1]), probably my most ambitious stack-based VM to date that isn't _fully_ esoteric, as it allows writing compact but still human-readable machine code that ideally would even work on an Apple IIe. I remember I promised to write more about this VM, but not today. Today, I just want to mention that the equi.c file now has 716 SLOC of pure ANSI C89 code. Is this a lot? Well, compared to most modern programming language implementations, it might not seem like a lot (even busybox's awk.c is currently about 2900 SLOC), but 700 SLOC is already around the upper limit of my comprehension. Of course I understand this code because I wrote and tested it, and I hope everyone else will understand it because it is well-structured and well-commented, but still, it's just too much.

And the sad truth is, nothing else can be taken away from there without sacrificing either compatibility or usability. I don't want this VM to grow in size, but the only realistic way to shrink it further would be dropping multitasking support and returning to the old(-ish) memory structure which I spent so much time moving away from. And even then, this would reduce the codebase by around 100 SLOC at most and wouldn't fundamentally change the overall picture.

From this point of view, it's interesting to analyze various programming languages and their particular dialects or implementations that are usually presented as "minimal". For instance, regardless of how small Lua, Red, Boron and MicroPython are, I wouldn't consider them "minimal" because their codebases are still huge. As I have already mentioned, busybox awk doesn't look minimal either. Well, what does?
There seem to be just three major language families that are neither esoteric nor narrow-spec (like dc) and that, although not at all small in their canonical implementations, _can_ have really minimal flavors: Forth, Lisp and Tcl. I say "families" because the implementations themselves may be so different that one couldn't recognize the original concept in them. For instance, both MINT and my Equi are Forth-like although neither of them fully qualifies as a variant of Forth. Apart from these three families, there are also some long-forgotten specimens like Tiny BASIC (the world's first piece of software to popularize the word "copyleft", by the way) and VTL/VTL-2, with the canonical implementation of the latter being famous for fitting into 768 _bytes_ of Altair's ROM. And, as some advanced versions of this language are still being developed for 6502- and Z80-based machines, with the latest Apple IIe compatible variant (VTLC02) having 644 SLOC of **assembly** language and fitting into 962 bytes of machine code, this continues to be a textbook example of true programming minimalism.

Now, here is the main problem that is true for all minimal programming languages to a greater or lesser extent: the simpler you make the core interpreter, the harder you make programming in it. As someone said, complexity has to live somewhere. If it's not in your interpreter, then it either resides at the lower level (OS, VM or even the Web browser runtime if your language targets such an environment), or at the upper level (the standard library, as is usually the case with Forths and Lisps), or you put all the extra burden on the programmers themselves and every one of them has to reinvent the same wheels. To me, the main challenge in picking or even designing such a language would not be in moving the complexity around but in eliminating it altogether. How, you may ask? After all, it's the tools we can change, not the tasks we must do with them...
Well, here are three recommendations I could give about complexity reduction.

1. Adjust your requirements. This is much easier to do if it happens before even picking the tools. Think along the lines of "Do I, or the tools I choose, really need to be able to do X in order for me to do Y?" Don't be afraid to cut off unnecessary requirements with Occam's razor.

2. Decompose your tasks into a set of smaller ones and only pick the tools necessary to do each part, not anything extra. A good example would be typesetting software: you could use all-in-one packages like Kile, Scribus or even some proprietary monstrosities from Adobe whose names I have already forgotten, or you could use something modular like troff with eqn, pic, bib, dformat etc., but only the parts you actually need. If you only need formulas in your documents, you don't need anything except eqn with troff/groff/nroff. If you also need images, you add in pic, and so on. Although both approaches perform the same tasks, guess which one is less complex? The second one. The same goes for software design, as well as programming language design itself. I have always been amazed how Plan 9, which never gained any serious traction, was far more Unix-way than the actual Unix-like OSes that did.

3. Don't assume growth. This is what I already wrote about in my DevOps-related rant: most of the complexity in the software world arises completely prematurely from the blind assumption that everything that starts small will grow large. Only focus on what you need to do right here and now. When your code needs to grow, refactor it accordingly. When, not before. Accordingly, not beyond the scale.

Now, how do these recommendations and the thoughts before them translate into my vision of truly minimal programming languages? Well, there must be some kind of "lowest common denominator" both in terms of implementation complexity and in terms of usage complexity, as well as self-sufficiency. So, here are my criteria.
To me, a particular implementation of a programming language is minimal if all of the following conditions apply:

1. Its full source, along with the standard library, must not exceed 500 SLOC of well-formatted and readable ANSI C89 code. If the implementation is provided in another programming language, the SLOC count of a hypothetical ANSI C89 translation replicating identical behavior of the language must be estimated. If the implementation provides a VM and a compiler is used to compile the code for this VM, then the source code of both the VM and the compiler is counted.

2. The implementation must provide I/O. If it targets platforms that support standard input, it must support standard input too. If it targets platforms that support standard output, it must support standard output too. If it only targets platforms that have neither, it must provide a way to return the computation results without having to use any kind of debugger, tracer or monitor.

3. The source code in the language itself must be human-readable and consist only of printable characters, apart from spaces and tabs. Also, whitespace characters used in the code must not differ semantically, i.e. a single space 0x20, a Tab character 0x9 or any combination of them must serve as a single delimiter or bear no semantics at all. An exception could be Python-like languages where the amount of leading whitespace on each line is significant, but that must be clearly stated in the language specification.

4. The language in this particular implementation must be Turing-complete. This might not be so obvious at first glance, so it's better to specify this requirement explicitly.

Now, I understand that languages like Brainfuck will also meet all these criteria. Well, yes, Brainfuck is cryptic but still minimal. Its full implementation in C89 can fit into well below 500 SLOC, it provides standard I/O and its source code is human-readable.
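
To give a rough sense of scale for the Brainfuck claim, here is a minimal sketch of such an interpreter in C89. It is only an illustration, not a reference implementation: the name bf_run, the 30000-cell tape, the wrapping 8-bit cells and the "EOF reads as 0" convention are all assumptions of this sketch, since Brainfuck dialects differ on these points.

```c
#include <stdio.h>
#include <string.h>

#define BF_TAPE_SIZE 30000 /* conventional size; an assumption of this sketch */

/* Runs the Brainfuck program in prog, reading from in and writing to out.
   Returns 0 on success, or -1 if a bracket jump finds no matching bracket
   or the data pointer leaves the tape. Not reentrant: the tape is static. */
int bf_run(const char *prog, FILE *in, FILE *out)
{
    static unsigned char tape[BF_TAPE_SIZE];
    size_t len = strlen(prog);
    size_t pc, ptr = 0;
    int depth, c;

    memset(tape, 0, sizeof tape);
    for (pc = 0; pc < len; pc++) {
        switch (prog[pc]) {
        case '>': if (++ptr >= BF_TAPE_SIZE) return -1; break;
        case '<': if (ptr == 0) return -1; ptr--; break;
        case '+': tape[ptr]++; break; /* unsigned char wraps around */
        case '-': tape[ptr]--; break;
        case '.': putc(tape[ptr], out); break;
        case ',': /* assumed convention: EOF stores 0 in the cell */
            c = getc(in);
            tape[ptr] = (c == EOF) ? 0 : (unsigned char)c;
            break;
        case '[': /* cell is zero: skip forward to the matching ']' */
            if (tape[ptr] == 0) {
                for (depth = 1; depth > 0; ) {
                    if (++pc >= len) return -1;
                    if (prog[pc] == '[') depth++;
                    else if (prog[pc] == ']') depth--;
                }
            }
            break;
        case ']': /* cell is non-zero: jump back to the matching '[' */
            if (tape[ptr] != 0) {
                for (depth = 1; depth > 0; ) {
                    if (pc == 0) return -1;
                    pc--;
                    if (prog[pc] == ']') depth++;
                    else if (prog[pc] == '[') depth--;
                }
            }
            break;
        default: /* every other character is treated as a comment */
            break;
        }
    }
    return 0;
}
```

Calling bf_run(src, stdin, stdout) from a small main() that loads src from a file would turn this into a pipeline filter, and the whole thing would still stay far below the 500 SLOC limit.
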
Whether or not you can understand it is another question for another discussion. But, on the scale of complexity, I'd put BF far lower than anything like modern Java. At least I can imagine how I could even integrate BF programs into my Unix pipelines for daily routines. With Java, I'm not so sure.

About 10 years ago (if not 15 already), I read a quote by some anonymous author that reflects the overall situation described in this post pretty accurately: "If everyone out there knew bash, find, vi(m), grep, sed and awk, millions of software products would never need to be created". Only fairly recently did I start to understand how damn right he was.

--- Luxferre ---

[1]: https://git.sr.ht/~luxferre/Equi