(2025-05-05) The anatomy of a tiny VM (and a blackjack game for it) ------------------------------------------------------------------- How small can a virtual machine get before dwindling into a fully esoteric state (like FlipJump, BytePusher, Subleq or even my own NRJ16)? In an attempt to answer this question, the mu808 VM has evolved into n808 ([1]) which, at the time being, already has a much more convenient assembly language but strictly limits both program and data memory areas to 128 steps and cells respectively, as well as reduces the ISA to just eight base opcodes and 24-bit instruction length. Nevertheless, I have been able to implement the same examples in this new VM just as successfully as in mu808, and the Bulls and Cows game, by the way, no longer is the most impressive of them all. But first, let's review how the n808 VM itself works in detail, looking at its TI-74 BASIC port where all of its inner secrets get revealed in the most obvious way. Let's see how we prepare the machine with the first line. The n808 specification has a random number generator and suggests that the trigonometric functions only accept radians as their arguments. That's why we write the instructions to initialize the RND and to switch the calculations to the radian mode: 100 RANDOMIZE:RAD Then, we initialize the randomly accessible storage areas for the program and data memory areas. Keep in mind that in this particular implementation we need to allocate one more step for the program memory, for the reason that will be obvious right away. 110 DIM PMEM(129),DMEM(128) Now, we're ready to load the n808 program. The only viable way to store the programs in TI-74 is DATA sections in its BASIC environment. So, we start reading the instruction values from the BASIC line 1000. But we also need to know when to stop, that's why the last DATA record has to be -1, and that's what we store in the 129th memory cell as the halt marker if the entire program memory is exhausted: 120 RESTORE 1000 130 I=0 140 READ PMEM(I) 150 I=I+1 160 IF PMEM(I-1)>-1 THEN 140 Once the program is loaded, it's time to fetch and decode the instructions. First, we initialize the counters and the memory value placeholders: 170 L,IAT,V1,V2,V3=0 The IAT variable is the "indirect addressing toggle" flag set by the instruction opcode 2. It will be used to detect which instruction parameters to use when the instruction is decoded. Now, the main execution loop starts: 180 A=PMEM(L):L=L+1 190 IF A<0 THEN 770 Here, we fetch the next instruction from the program memory and then increment the instruction pointer (you'll shortly see why this order is important). Whenever we have fetched the negative value (-1 in our case), we halt the program immediately. Now comes the decoding logic: 200 CMD=INT(A/2097152) 210 IF IAT=1 THEN 220 ELSE 230 220 P1=INT(V1):P2=INT(V2):P3=INT(V3):IAT=0:GOTO 250 230 A=A-CMD*2097152:P1=INT(A/16384):A=A-P1*16384 240 P2=INT(A/128):P3=A-P2*128 This logic directly comes from the formula to shape an instruction value in n808 from its four parameters: instruction = cmd * 2097152 + p1 * 16384 + p2 * 128 + p3. The only difference is that we don't need to derive the rest of parameters if the IAT flag is set, so we use the previously cached memory cell values as the current instruction parameters and then unset it. Otherwise, we derive the parameters according to the same formula. The clunkiness of the statements comes from the fact that the TI-74 BASIC doesn't have any modulo operation, so we decrease the fetched value by the integer factor of the previous one before calculating each parameter as the integer quotient. Before we begin command processing though, we need to set up the memory and populate the buffer memory values: 250 DMEM(0)=0:DMEM(127)=1:DMEM(126)=-1 260 V1=DMEM(P1):V2=DMEM(P2):V3=DMEM(P3) The n808 spec requires the memory cells 0, 126 and 127 to be read-only, and to return 0, -1 and 1 respectively. The line 250 enforces these values on every virtual CPU iteration. The next line caches the memory values from all three parameters, assuming the parameters are addresses themselves. This is to reduce the amount of code in every opcode branch. Now, the processing begins. And the first non-NOP instruction code, 1, is the most sophisticated one and describes the jump instruction: 270 IF CMD=1 THEN 280 ELSE 430 280 IF P1=0 AND V2=0 THEN L=P3:GOTO 180 290 IF P1=1 AND V2>0 THEN L=P3:GOTO 180 300 IF P1=2 AND V2<0 THEN L=P3:GOTO 180 310 IF P1=3 AND V2>=0 THEN L=P3:GOTO 180 320 IF P1=4 AND V2<=0 THEN L=P3:GOTO 180 330 IF P1=5 AND V2<>0 THEN L=P3:GOTO 180 340 IF P1=6 THEN L=P3:GOTO 180 350 IF P1=7 AND V2=0 THEN L=INT(V3):GOTO 180 360 IF P1=8 AND V2>0 THEN L=INT(V3):GOTO 180 370 IF P1=9 AND V2<0 THEN L=INT(V3):GOTO 180 380 IF P1=10 AND V2>=0 THEN L=INT(V3):GOTO 180 390 IF P1=11 AND V2<=0 THEN L=INT(V3):GOTO 180 400 IF P1=12 AND V2<>0 THEN L=INT(V3):GOTO 180 410 IF P1=13 THEN L=INT(V3):GOTO 180 420 IF P1=14 THEN DMEM(125)=L:L=P3:GOTO 180 This snippet fully replicates the jump logic from the specification. As soon as we hit the condition, we adjust the instruction pointer L and then go right back to the beginning of the processing loop. The line 420 demonstrates the benefit of adjusting the pointer right after parsing the instruction: we don't have to increment it here and assign the index of the next instruction to the data memory cell 125. The opcode 2, indirect addressing toggle command, on the other hand, is extremely simple to implement as it just sets the IAT flag: 430 IF CMD=2 THEN IAT=1:GOTO 180 Next, we handle I/O with the opcode 3: 440 IF CMD=3 THEN 450 ELSE 520 450 FOR I=P2 TO P3 460 IF P1=0 THEN PRINT(DMEM(I)):PAUSE 470 IF P1=1 THEN INPUT DMEM(I) 480 IF P1=2 THEN 490 ELSE 500 490 IF DMEM(I)=10 THEN PAUSE ELSE PRINT(CHR$(INT(DMEM(I)))); 500 IF P1=3 THEN DMEM(I)=ASC(KEY$) 510 NEXT I:GOTO 180 The specification defines two "standard" ports and two "extended" ports (although no one prevents anyone from implementing more of them). The standard ports are 0 (numeric output) and 1 (numeric input) which are implemented in the lines 460 and 470 respectively. Note the PAUSE statement after printing the memory value. This is required because of the nature of single-line TI-74 output. On the extended port 2 (character output), we don't pause the execution after printing every character (the ";" at the end tells PRINT to not print the newline afterwards), unless we explicitly print 10 which means a newline. Finally, with the port 3, we enable raw (blocking) character input in the line 500. After outputting/inputting the whole specified range, we return to the beginning of the processing loop. The opcode 4 means the copy instruction and is processed in a very straightforward manner, like this: 520 IF CMD=4 THEN 530 ELSE 580 530 IF P1=0 THEN DMEM(P3)=P2:GOTO 180 540 IF P1=1 THEN DMEM(P3)=V2:GOTO 180 550 IF P1=2 THEN DMEM(INT(V3))=P2:GOTO 180 560 IF P1=3 THEN DMEM(INT(V3))=V2:GOTO 180 570 IF P1=4 THEN DMEM(INT(V3))=DMEM(INT(V2)):GOTO 180 The opcode 5 is used to set a memory cell to values larger than 128 or to fractional values, and also is implemented in a simple way: 580 IF CMD=5 THEN DMEM(P3)=P1*100+P2+V3/100:GOTO 180 The opcode 6 is used to perform various mathematical operations. Let's first see the snippet and then I'll share some remarks about it: 590 IF CMD=6 THEN 600 ELSE 720 600 IF P1=0 THEN DMEM(P3)=V2+V3:GOTO 180 610 IF P1=1 THEN DMEM(P3)=V2-V3:GOTO 180 620 IF P1=2 THEN DMEM(P3)=V2*V3:GOTO 180 630 IF P1=3 THEN 640 ELSE 650 640 IF V3=0 THEN DMEM(P3)=0:GOTO 180 ELSE DMEM(P3)=V2/V3:GOTO 180 650 IF P1=4 THEN 660 ELSE 680 660 IF V3=0 THEN DMEM(P3)=INT(V2):GOTO 180 ELSE 670 670 DMEM(P3)=V2-V3*INT(V2/V3):GOTO 180 680 IF P1=5 THEN DMEM(P3)=ABS(V2):GOTO 180 690 IF P1=6 THEN DMEM(P3)=SQR(ABS(V2)):GOTO 180 700 IF P1=7 THEN DMEM(P3)=EXP(V2):GOTO 180 710 IF P1=8 THEN DMEM(P3)=LN(ABS(V2)):GOTO 180 720 IF P1=9 THEN DMEM(P3)=SIN(V2):GOTO 180 730 IF P1=10 THEN DMEM(P3)=COS(V2):GOTO 180 740 IF P1=11 THEN DMEM(P3)=ATN(V2):GOTO 180 Most lines directly replicate what's written in the spec, including replacing the result with 0 if the second parameter is 0 for the division and modulo operations. However, please note once more that we have to replace the modulo operation with a four-operation statement because, again, this BASIC dialect totally lacks it. The last opcode, 7, is used for random number generation and its implementation here is a direct copy from the C version: 750 IF CMD=7 THEN DMEM(P3)=INT(V1)+INT(RND*(INT(V2)-INT(V1)+1)) Note that we don't end this line with the usual :GOTO 180 because this is the next instruction to pass execution to the next step: 760 GOTO 180 770 END We need GOTO 180 on a separate line in order for the entire loop to handle unknown opcodes as NOPs. The line 770 is necessary because, as you remember, we jump to it from the line 190 once we hit the instruction value -1. So yeah, that's it. That's the entire VM logic that's implemented more or less the same for all other supported platforms and languages. Of course, the TI-74 BASIC port is missing the interactive mode but that mode became rather rudimentary in n808 anyway, and is trivial to describe and clone for any platform supporting it. If we just talk about the main logic though, it still is compact enough for me to be stunned about what it actually can do. Because what it can do (for what it is) is quite far from the esoteric realm. Speaking of what it can do, I decided to up the challenge a bit and develop a blackjack game that would be playable within such constraints. I won't review its N8A code ([2]) line by line, as it already is commented well enough, but I'm going to share the story of how it was built. In order to create a blackjack game for a numeric-only VM with just 128 program steps available, three sub-problems need to be addressed first: how to represent hands, how to generate cards and how to calculate the hand score. These sub-problems need to be solved in this very order, so let's start from the beginning. The first dilemma I faced was: do we need to represent individual card values or is just their sum enough? If so, what about aces that can have two different values depending on conditions? For mu808, I experimented with the first option and wasn't fully satisfied with the result. For n808, I opted for the second one but with a twist: we add 2 to 9 for the cards 2 to 9, 10 for the cards 10 to K, and... 101 for aces. Yes, not 1, not 11, but 101. The reason for that will become obvious in just a moment. Based on this hand representation system, random card generation (assuming we never run out of cards with a particular value, say, we have an 8-deck blackjack) also is rather straightforward. We start with a uniformly chosen random number N from 1 to 13 (inclusively). Then, we convert all the N values higher than 9 to 0 using this formula: C = N * (1 - [N/10]), where [x] is the floor operation. Then, we return the value of 10 if C is equal to 0, 101 if it is equal to 1, or the value of C itself otherwise. Now that we have a sum of cards generated in such a way, how to evaluate the true blackjack score from it? Of course, if no aces are in the hand, then the score is just this sum. But what if the hand does contain aces? This is where the genius of the value 101 comes into play. According to the blackjack rules, aces only score 11 if there is a single ace in the hand and the sum of the rest of the hand and this 11 doesn't exceed 21. If at least one of these conditions fails (there's more than one ace or the sum is busting), aces only score 1. But we already have the 1-score in the lower two digits, so the condition boils down to the hand sum being between 101 and 111 for the ace to score 11. In other words, for any hand value H, the score S can be calculated as follows: S = (H + 10) mod 100 if 100 < H < 112, S = H mod 100 otherwise. Sounds like a simple conditional branch but I decided to go further and devise a way to avoid double comparison conditionals. First, let's notice that we can't have 100 as an actual hand value so we can adjust the lower bound to 99 to be able to reuse the constant 13 we need in another part of the code. How do we reuse it? Well, at first I thought I could do this by rewriting the above conditional formula to the following form: A = (H - 112) * (H - 99), S = (H - 5 * (A/|A| - 1)) mod 100, where |x| is the absolute value of x. This formula takes advantage of the fact that the sign of (H - 112) * (H - 99) will only be negative if H is between 99 and 112. In that case, A/|A| - 1 will generate -2 and the value of 10 will be added to H. In other cases (except A=0), it will generate 0 and the value of H won't change. As we already have seen, H cannot be 99, but what if H is equal to 112? The score will then be evaluated to 17, which is totally incorrect. So, looks like we need at least one conditional: A = (H - 112) * (H - 99), S = 12 if A = 0, S = (H - 5 * (A/|A| - 1)) mod 100 otherwise. Now, where does the constant 13 come from? You can see that if we subtract 112 first then we can derive H - 99 by just adding the constant 13 we already store elsewhere. So, we have two approaches here: one involves very simple arithmetic but requires two conditionals, another one minimizes the conditional logic but requires more calculation steps. Which one is shorter? Let's look at the N8A code for the procedure with a single conditional: ; scoring procedure, parameter is in the @va cell ; the result is in the @vc cell :score dca 112 @vb ; set vb to 112 sub @va @vb ; set vb to va - 112 dva @vb @vc ; copy vb value into vc add @c_13 @vc ; add 13 to vc mul @vc @vb ; vb * vc => vb jeq @vb :s12 ; jump to scoring 12 if the result is 0 abs @vb @vc ; abs(vb) => vc div @vc @vb ; abs(vb) / vb => vb dec @vb ; vb - 1 => vb mul @c5 @vb ; vb * 5 => vb sub @va @vb ; va - vb => vb dca 100 @vc ; store 100 into vc mdf @vb @vc ; store vb mod 100 into vc ret :s12 dca 12 @vc ; hardcode the result ret 16 instructions. Now let's look at the "naive" approach with two conditionals: ; scoring procedure, parameter is in the @va cell ; the result is in the @vc cell :score dca 112 @vb ; set vb to 112 sub @va @vb ; set vb to va - 112 jge @vb :skscore ; skip adding 10 if >= 112 dca 100 @vb ; set vb to 100 sub @va @vb ; set vb to va - 100 jlt @vb :skscore ; skip adding 10 if < 100 dca 10 @vb ; set vb to 10 add @vb @va ; add 10 to va value :skscore dca 100 @vc ; set 100 to vc mdf @va @vc ; set vc to va mod 100 ret Just eleven instructions that are much more speed-optimal too. Yeah... That's another reminder that computer systems often work in a different way than pure math, and two simple conditions are often much faster than a single but complicated formula. Once the score evaluation function was settled upon, everything else wasn't as interesting. The rest of the program consists of just three loops: the outer one that starts with prompting the player to enter the bet, the first inner one that prompts for the actions and adds to the player's hand, and the second inner one that activates after the stand (or double) action has been selected and performs the dealer's actions. Then, the final scores get evaluated and the player's balance gets changed accordingly, and the next round begins in the outer loop. At the end of the day, this blackjack game takes 115 steps, making it officially the largest program I've ever written for any of the three architectures (808UL/mu808/n808) to this date. On the other hand, when compiled into the binary format (N8B), the resulting game file is just 345 bytes. Considering the maximum amount of steps in n808 is 128, the theoretical binary size limit would be 384 bytes, so this program is just 13 steps (39 bytes) short of that limit, yet it still is a fully playable game. And yes, while all existing implementations only require the text representation (N8), I am planning on some n808 ports that will only be able to consume the N8B binary format. Given all that, this might be the smallest blackjack in existence. Ever wanted to fit an entire card game into a low-res QR code? Well, now it's totally possible. And this, I might add, is just the beginning. --- Luxferre --- [1]: https://codeberg.org/luxferre/n808 [2]: https://codeberg.org/luxferre/n808/src/branch/main/examples/numjack.n8a