A long long time ago, when I got my first job on wall street, I had the opportunity to partake in writing some of the fastest high frequency financial execution engines possible. Although I barely remember the x86 assembly language, I still haven’t completely forgotten it either.
I wanted to use some of these forgotten skills to write some fast climate data gathering programs because there’s too much bloat in existing software, but honestly, it’s just too tough and I don’t have the time. But in trying, I wrote the tiniest useful program possible on the x86 platform.
While some girls like to solve standard puzzles, I’ve always enjoyed putting my autistic skills to finding new exotic puzzles to solve.
So what puzzle am I trying to solve?
A: Creating the smallest program on linux that actually does something useful. This program must be its own executable and not just text interpreted by another program.
What can be more useful than a being able to do anything a computer can do?
How about a compiler? …. What kind of a compiler?
How about compiling 1’s and 0’s into runable code?
When was the last time anyone actually programmed by typing only 1’s and 0’s on the keyboard?
Well, I’m bringing it back!
First I needed to find out how to make the smallest runable program on linux, and its filesize. Luckily someone figured this out in a very clever way. Thank you, Brian Raiter.
Here, at last, we have honestly gone as far as we can go. There is no getting around the fact that the 45th byte in the file, which specifies the number of entries in the program header table, needs to be non-zero, needs to be present, and needs to be in the 45th position from the start of the ELF header. We are forced to conclude that there is nothing more that can be done.
— Brian Raiter
Here is Brian’s solution:
; tiny.asm - Brian Raiter
BITS 32
org 0x00010000
db 0x7F, "ELF"
dd 1
dd 0
dd $$
dw 2
dw 3
dd _start
dd _start
dd 4
_start:
mov bl, 42 ; M
xor eax, eax ; E
inc eax ; A
int 0x80 ; T
db 0
dw 0x34
dw 0x20
db 1
; -----------------------
; $ nasm -f bin -o a.out tiny.asm
; $ chmod +x a.out
; $ ./a.out ; echo $?
; 42
; $ wc -c a.out
; 45 a.out
This 45 byte program (tiniest possible) simply sets the exit code to 42, and then exits. The real meat of his program is just 7 bytes, and can actually be 5 (no xor eax,eax) with the same result.
While Brian used NASM, I prefered to use FASM, since I’m more familiar with it.
Now here’s my tiny masterpiece:
; Zoe Phin, 2021/04/24
; bex.asm
use32
org 0x505A4000
db 0x7F, "ELF"
dd 1
dd 0
dd $$
dw 2
dw 3
dd start
dd start
dd 4
start: pop ecx
pop edi
jmp .end
.arg: pop esi
.char: lodsb
cmp al, 0
jmp .cont
dw 32
dd 1
.cont: je .done
rcr al, 1
adc ah, ah
jmp .char
.done: mov [edi+edx], ah
inc edx
xor eax,eax
.end: loop .arg
jmp edi
I call it bex, short for Binary EXecution. What does it do?
It converts every argument consisting of ASCII 1’s and 0’s on the command line into its binary representation, and then jumps execution to the first argument.
So for example, the shortest program would just be just to exit cleanly without segfault:
$ ./bex 01000000 11001101 10000000
The ASCII binary represents opcodes for
inc eax ; 01000000
int 80h ; 11001101 10000000
Brian’s program would be:
mov bl, 42 ; 10110011 00101010
inc eax ; 01000000
int 80h ; 11001101 10000000
Run it:
$ ./bex 10110011 00101010 01000000 11001101 10000000
$ echo $?
42
$
And now I test a semi-Quine:
$ ./bex 10110000 00000100 01000011 10001001 11111001 11001101 10000000 10010011 11001101 10000000 | xxd -b -c10 | cut -c11-99
10110000 00000100 01000011 10001001 11111001 11001101 10000000 10010011 11001101 10000000
It’s a semi-Quine because the output is a binary representation of the ASCII input, but not the ASCII input itself.
The assembly code equivalent is:
mov al, 4 ; 10110000 00000100
inc ebx ; 01000011
mov ecx,edi ; 10001001 11111001
int 80h ; 11001101 10000000
xchg eax,ebx ; 10010011
int 80h ; 11001101 10000000
A few notes:
- Leading 0’s are not needed in an argument ( ex: 101 = 00000101 )
- 1’s and 0’s are not specifically needed. 1 is an odd ASCII value, while 0 is an even ASCII value
- Any non-binary data can also be placed directly inline
Example of notes:
$ ./bex 7.77.... 188 1....11 10110010 ++.+ 30003003 1111---1 121221 11414441 11661121 9TTTTTTT 1RR1PP11 11351171 9:::.::: "Hello World!"
Hello World!
Assembly equivalent:
mov al, 4 ; 10110000 00000100 ; 7.77.... 188
inc ebx ; 01000011 ; 1....11
mov dl, 13 ; 10110010 00001101 ; 10110010 ++.+
mov ecx,esi ; 10001001 11110001 ; 30003003 1111---1
sub ecx,edx ; 00101001 11010001 ; 121221 11414441
int 80h ; 11001101 10000000 ; 11661121 9TTTTTTT
xchg eax,ebx ; 10010011 ; 1RR1PP11
int 80h ; 11001101 10000000 ; 11351171 9:::.:::
The code basically tells linux to write starting 13 characters from the end.
You can obtain bex by writing the binary directly and making it executable, like so:
$ echo -en $(printf "\\\x%s" 7F 45 4C 46 01 00 00 00 00 00 00 00 00 40 5A 50 02 00 03 00 20 40 5A 50 20 40 5A 50 04 00 00 00 59 5F EB 1A 5E AC 3C 00 EB 06 20 00 01 00 00 00 74 06 D0 D8 10 E4 EB ED 88 24 17 42 31 C0 E2 E4 FF E7) > bex; chmod +x bex
Or you can execute a script to download FASM and compile and test from source code:
$ bash <<< $(wget -qO- https://pastebin.com/raw/Tvmg0nr6 | tr -d '\r')
Successful result:
flat assembler version 1.73.27 (16384 kilobytes memory)
2 passes, 66 bytes.
Input: 10110000 00000100 01000011 10001001 11111001 11001101 10000000 10010011 11001101 10000000
Output: 10110000 00000100 01000011 10001001 11111001 11001101 10000000 10010011 11001101 10000000
Match!
The tiniest useful executable is only 66 bytes!
Mission accomplished. Puzzle solved.
You can do pretty much anything in it! Its inconvenience is a feature 😉
A few notes on the run environment:
- edx is set to number of arguments
- edi points to first argument
- esi points to NULL located after arguments, just before ENVironment variables.
Yes, this is obviously a 32-bit program, but it runs perfectly fine on x86-64 based linux.
Enjoy 🙂 -Zoe
P.S.: As a bonus, bex includes my initials directly in the binary:
00000000: 7F 45 4C 46 01 00 .ELF..
00000006: 00 00 00 00 00 00 ......
0000000c: 00 40 5A 50 02 00 .@ZP..
00000012: 03 00 20 40 5A 50 .. @ZP
00000018: 20 40 5A 50 04 00 @ZP..
0000001e: 00 00 59 5F EB 1A ..Y_..
00000024: 5E AC 3C 00 EB 06 ^.<...
0000002a: 20 00 01 00 00 00 .....
00000030: 74 06 D0 D8 10 E4 t.....
00000036: EB ED 88 24 17 42 ...$.B
0000003c: 31 C0 E2 E4 FF E7 1.....
Oh Zoe, so it was YOU who wrote those Hi Frequency trading programs ! Which high school did you attend? I think it must of been St Trinians.
What would Ronald Searle think of all this?
Nice presentation on the compiler though. Thanks.
LikeLiked by 1 person
No, I didn’t know much in high school except boys. Thank you.
LikeLike