How to Simulate a TD4 4-Bit CPU in Python on Linux (2025)
Prerequisites and What You Need Before Starting
Before writing a single line of simulator code, make sure your environment and mental model are solid. The TD4 is deceptively simple (4 registers, 16 opcode slots), but it demands bit-level precision: forget a single 4-bit mask and every computation that follows will be silently corrupted.
Required Knowledge: Binary Arithmetic and Basic CPU Concepts
You need a working understanding of binary numbers up to 4 bits (0000–1111, decimal 0–15), bitwise AND/OR operations, and the fetch-decode-execute cycle. You don't need to know assembly language, but you should know what a register and a program counter are.
Software Prerequisites: Python 3.10+, pip, and a Linux Terminal
This guide targets Ubuntu 22.04 or Debian 12. No external pip packages are required: the simulator uses only the Python standard library. Verify your setup:
```bash
python3 --version   # Must be 3.10 or higher
uname -r            # Any modern Linux kernel is fine
```
| Requirement | Minimum Version / Detail |
|---|---|
| Python | 3.10+ (the code uses f-strings and built-in generics such as list[int]; match-case is not required) |
| OS | Ubuntu 22.04, Debian 12, or equivalent |
| 4-bit binary knowledge | Comfortable with 0b0000–0b1111 |
| Register concepts | Know what A, B, PC, and OUT do conceptually |
| Terminal | bash or zsh, standard Linux shell |
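You may also want td4sim.py to refuse to run on an older interpreter. A small guard handles that. The check_python function and the MIN_VERSION constant below are illustrative names and are not part of the TD4 design.

```python
import sys

MIN_VERSION = (3, 10)  # the minimum Python version this guide targets


def check_python(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version meets MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


# Typical use at the top of td4sim.py:
#     if not check_python():
#         sys.exit("td4sim.py needs Python 3.10 or newer")
```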
Understanding the TD4 Architecture at a Glance
The TD4 was designed by Iku Watanabe and documented in the Japanese book How to Build a CPU from Scratch (CPUの創りかた). It is a minimal 4-bit CPU built from 74-series (74HC) logic chips and intended as an educational tool. Besides a one-bit carry flag, it has exactly four registers:
- A — General-purpose accumulator, 4 bits
- B — Second general-purpose register, 4 bits
- OUT — Output port, drives 4 LEDs in the original hardware
- PC — Program counter, 4 bits, addresses up to 16 ROM locations
The data bus is 4 bits wide. Programs are stored in a 16-byte ROM (each instruction is 8 bits: 4-bit opcode + 4-bit immediate operand). There is no RAM, no stack, and no interrupts — which makes it perfect for simulation in a few hundred lines of Python.
Estimated time: 45–60 minutes
Step 1: Model the TD4 Registers and ALU in Python
We need a Python class that faithfully represents the 4-bit constraint of every register. Python integers are arbitrary precision, so you must explicitly mask every value to 4 bits using & 0xF. Skipping this mask is the number-one source of bugs in CPU simulators written in high-level languages.
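Here is why the mask matters. Python happily evaluates 15 + 1 to 16, a 5-bit value that no 4-bit register can hold. The helper below (add4 is an illustrative name, not part of the simulator) keeps only the low nibble and reports bit 4 as the carry. This is the same arithmetic the TD4CPU.add method uses.

```python
def add4(a: int, b: int) -> tuple[int, int]:
    """4-bit add: return (result masked to 4 bits, carry out of bit 3)."""
    raw = (a & 0xF) + (b & 0xF)
    return raw & 0xF, raw >> 4


# Python ints never overflow on their own:
print(15 + 1)       # 16: needs 5 bits
print(add4(15, 1))  # (0, 1): wraps to 0 and sets carry
print(add4(9, 9))   # (2, 1): 18 - 16 = 2
print(add4(7, 8))   # (15, 0): fits in 4 bits, no carry
```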
Defining the Four 4-Bit Registers (A, B, PC, OUT)
Implementing the 4-Bit Adder (ALU) with Carry Flag
Masking Values to Stay Within 4-Bit Bounds
```python
class TD4CPU:
    """TD4 4-bit CPU simulator."""

    def __init__(self):
        self._a = 0     # Register A
        self._b = 0     # Register B
        self._pc = 0    # Program counter
        self._out = 0   # Output port
        self.carry = 0  # Carry flag (0 or 1)
        self.rom: list[int] = [0x00] * 16  # 16-byte ROM; 0x00 decodes as ADD A, 0

    # --- Register properties with 4-bit masking ---
    @property
    def A(self) -> int:
        return self._a

    @A.setter
    def A(self, value: int):
        self._a = value & 0xF

    @property
    def B(self) -> int:
        return self._b

    @B.setter
    def B(self, value: int):
        self._b = value & 0xF

    @property
    def PC(self) -> int:
        return self._pc

    @PC.setter
    def PC(self, value: int):
        self._pc = value & 0xF

    @property
    def OUT(self) -> int:
        return self._out

    @OUT.setter
    def OUT(self, value: int):
        self._out = value & 0xF

    # --- 4-bit ALU: adder with carry ---
    def add(self, operand_a: int, operand_b: int) -> tuple[int, int]:
        """Add two 4-bit values. Returns (4-bit result, carry bit)."""
        raw = (operand_a & 0xF) + (operand_b & 0xF)
        result = raw & 0xF
        carry = (raw >> 4) & 0x1
        return result, carry

    def state_str(self, cycle: int) -> str:
        """Return a formatted one-line state dump."""
        return (
            f"Cycle {cycle:>3} | A={self.A:04b}({self.A}) "
            f"B={self.B:04b}({self.B}) "
            f"PC={self.PC:02X} "
            f"OUT={self.OUT:04b} "
            f"C={self.carry}"
        )
```
Note: Python's property setters with & 0xF mean you can never accidentally store a 5-bit value through the A, B, PC, or OUT attributes. This is the single most important structural decision in the whole simulator.
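The four property pairs above are deliberately explicit. If the repetition bothers you, a small data descriptor can apply the same & 0xF mask to every register. This sketch shows a different design, not the one the class above uses, and the Nibble and Registers names are made up for the example.

```python
class Nibble:
    """Data descriptor that stores an int masked to 4 bits."""

    def __set_name__(self, owner, name):
        self.private = "_" + name.lower()  # e.g. A -> _a, PC -> _pc

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class itself
            return self
        return getattr(obj, self.private, 0)  # unset registers read as 0

    def __set__(self, obj, value: int):
        setattr(obj, self.private, value & 0xF)


class Registers:
    A = Nibble()
    B = Nibble()
    PC = Nibble()
    OUT = Nibble()


regs = Registers()
regs.A = 0x1F   # masked on assignment
print(regs.A)   # 15
regs.PC = 15
regs.PC += 1    # 16 wraps to 0
print(regs.PC)  # 0
```

The mask lives in __set__, so augmented assignments such as regs.PC += 1 wrap exactly as they do in the property-based version.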
Step 2: Build the Instruction Set and Decoder
The TD4 has exactly 16 opcodes — one for each 4-bit pattern. This maps perfectly to a Python dictionary. Using a dict of callables avoids a long if/elif chain and lets you hot-swap instruction implementations during testing.
TD4 Opcode Table: All 16 Instructions Explained
| Opcode (bin) | Mnemonic | Description |
|---|---|---|
| 0000 | ADD A, Im | A = A + Im; update carry |
| 0001 | MOV A, B | A = B |
| 0010 | IN A | A = input port (sim: A = 0) |
| 0011 | MOV A, Im | A = Im |
| 0100 | MOV B, A | B = A |
| 0101 | ADD B, Im | B = B + Im; update carry |
| 0110 | IN B | B = input port (sim: B = 0) |
| 0111 | MOV B, Im | B = Im |
| 1000 | OUT B | OUT = B |
| 1001 | OUT Im | OUT = Im |
| 1010 | (undefined) | Treated as NOP |
| 1011 | (undefined) | Treated as NOP |
| 1100 | (undefined) | Treated as NOP |
| 1101 | (undefined) | Treated as NOP |
| 1110 | JNC Im | If carry == 0, PC = Im |
| 1111 | JMP Im | PC = Im (unconditional) |
Parsing an 8-Bit Instruction Word
Each ROM byte encodes opcode = byte >> 4 and immediate = byte & 0xF. The decoder extracts both in one line.
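Encoding works the same way in reverse. The two helpers below show the nibble arithmetic and check that it round-trips for every possible byte. The names decode and encode are illustrative; the simulator itself splits the byte inline in step().

```python
def decode(byte: int) -> tuple[int, int]:
    """Split an 8-bit instruction word into (opcode, immediate)."""
    return (byte >> 4) & 0xF, byte & 0xF


def encode(opcode: int, immediate: int) -> int:
    """Pack a 4-bit opcode and a 4-bit immediate into one byte."""
    return ((opcode & 0xF) << 4) | (immediate & 0xF)


print(decode(0xF1))            # (15, 1) -> JMP 1
print(hex(encode(0b0011, 5)))  # 0x35    -> MOV A, 5
assert all(encode(*decode(b)) == b for b in range(256))  # every byte round-trips
```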
Implementing the Decode-Execute Loop
```python
    # --- Method of TD4CPU: place inside the class body ---
    def _build_decoder(self):
        """Return opcode dispatch table as a dict of callables."""
        def add_a(im): self.A, self.carry = self.add(self.A, im)
        def mov_a_b(im): self.A = self.B
        def in_a(im): self.A = 0  # No hardware input in sim
        def mov_a_im(im): self.A = im
        def mov_b_a(im): self.B = self.A
        def add_b(im): self.B, self.carry = self.add(self.B, im)
        def in_b(im): self.B = 0
        def mov_b_im(im): self.B = im
        def out_b(im): self.OUT = self.B
        def out_im(im): self.OUT = im
        def nop(im): pass

        def jnc(im):
            if self.carry == 0:
                self.PC = im
                return True  # Signal: PC already updated
            return False

        def jmp(im):
            self.PC = im
            return True

        return {
            0b0000: add_a,
            0b0001: mov_a_b,
            0b0010: in_a,
            0b0011: mov_a_im,
            0b0100: mov_b_a,
            0b0101: add_b,
            0b0110: in_b,
            0b0111: mov_b_im,
            0b1000: out_b,
            0b1001: out_im,
            0b1010: nop,
            0b1011: nop,
            0b1100: nop,
            0b1101: nop,
            0b1110: jnc,
            0b1111: jmp,
        }
```
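Note: jnc and jmp return True to signal that they have already written PC themselves. The step() method in the next section checks this return value to decide whether to advance PC. Every other handler returns None (or False, in jnc's no-jump case), and step() treats that as "advance to the next address".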
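A True/False return value works for signalling a jump, but it is easy to forget the return False branch in a new handler. One alternative is to have every handler return the next PC explicitly, so the step function never has to interpret a flag. The sketch below assumes CPU state lives in a plain dict and implements only four opcodes. make_handlers and run_step are hypothetical names.

```python
def make_handlers(state: dict) -> dict:
    """Each handler takes (im, next_pc) and returns the PC to use next."""
    def add_a(im, next_pc):
        raw = state["A"] + im
        state["A"], state["carry"] = raw & 0xF, raw >> 4
        return next_pc

    def mov_a_im(im, next_pc):
        state["A"] = im
        return next_pc

    def jnc(im, next_pc):
        return im if state["carry"] == 0 else next_pc

    def jmp(im, next_pc):
        return im

    return {0b0000: add_a, 0b0011: mov_a_im, 0b1110: jnc, 0b1111: jmp}


def run_step(state: dict, rom: list[int], handlers: dict) -> None:
    byte = rom[state["PC"]]
    opcode, im = byte >> 4, byte & 0xF
    # The handler's return value *is* the new PC; no jump flag needed
    state["PC"] = handlers[opcode](im, (state["PC"] + 1) & 0xF)


state = {"A": 0, "carry": 0, "PC": 0}
rom = [0x30, 0x01, 0xF1] + [0x00] * 13  # MOV A, 0 | ADD A, 1 | JMP 1
handlers = make_handlers(state)
for _ in range(3):
    run_step(state, rom, handlers)
print(state)  # {'A': 1, 'carry': 0, 'PC': 1}
```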
Step 3: Implement ROM-Based Program Storage and the Fetch Cycle
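The fetch cycle is the heartbeat of any CPU simulator. The TD4 ROM holds exactly 16 instructions. If your program is shorter, the loader pads the unused slots with 0x00, which decodes as ADD A, 0. That instruction leaves A unchanged (though it clears carry), so a program that runs past its last instruction executes harmless padding until the PC wraps back to address 0.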
Loading a Program as a List of Bytes into ROM
Implementing the Fetch-Decode-Execute Cycle
Handling Program Counter Wrap-Around at 16 Instructions
```python
    # --- Methods of TD4CPU: place inside the class body ---
    def load_rom(self, instructions: list[int]) -> None:
        """Load a program into ROM. Validates byte range, pads with 0x00."""
        if len(instructions) > 16:
            raise ValueError(f"ROM is 16 bytes; got {len(instructions)} instructions.")
        for i, byte in enumerate(instructions):
            if not (0x00 <= byte <= 0xFF):
                raise ValueError(f"Instruction at index {i} is not a valid byte: {byte}")
        self.rom = list(instructions) + [0x00] * (16 - len(instructions))
        # Reset CPU state when a new program is loaded
        self._a = self._b = self._pc = self._out = self.carry = 0
        self._decoder = self._build_decoder()

    def step(self) -> None:
        """Fetch-decode-execute one clock cycle."""
        # FETCH
        instruction = self.rom[self.PC]
        opcode = (instruction >> 4) & 0xF
        immediate = instruction & 0xF
        # Default next address; JNC/JMP override it by writing PC themselves
        next_pc = (self.PC + 1) & 0xF
        # DECODE + EXECUTE
        handler = self._decoder[opcode]
        jumped = handler(immediate)
        # Only advance PC if the instruction did not perform a jump
        if not jumped:
            self.PC = next_pc
```
Add _build_decoder (from Step 2), load_rom, and step to the TD4CPU class body as methods. The class is then functionally complete. The & 0xF on next_pc makes the program counter wrap from address 15 back to 0, matching the 4-bit counter in the original hardware.
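Note: Always call load_rom() before your first step() call, because the loader is what builds the opcode dispatch table (self._decoder). It also resets all registers and the carry flag, so loading a new program mid-simulation starts fresh, which is useful for unit tests.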
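Here is a quick check that the & 0xF on next_pc really wraps. trace_pc is a hypothetical helper, independent of TD4CPU, that decodes only JMP. It runs a ROM containing nothing but padding for 18 cycles and records the PC before each fetch.

```python
def trace_pc(rom: list[int], cycles: int) -> list[int]:
    """Record PC before each fetch; only JMP (0b1111) is decoded here."""
    pc, trace = 0, []
    for _ in range(cycles):
        trace.append(pc)
        opcode, im = rom[pc] >> 4, rom[pc] & 0xF
        next_pc = (pc + 1) & 0xF  # 15 + 1 wraps to 0
        if opcode == 0b1111:      # JMP Im overrides the default
            next_pc = im
        pc = next_pc
    return trace


rom = [0x00] * 16               # padding only: no jumps anywhere
print(trace_pc(rom, 18)[-3:])   # [15, 0, 1]
```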
Step 4: Write and Run Your First TD4 Assembly Program
Now let's assemble two programs by hand and watch the simulator execute them. Hand-assembling confirms you understand the instruction encoding before reaching for helper tools.
Writing a Counter Program that Increments Register A
The simplest interesting TD4 program: clear A, then keep adding 1 and jumping back.
```
Addr  Opcode+Im  Mnemonic
0     0011 0000  MOV A, 0   ; A = 0
1     0000 0001  ADD A, 1   ; A = A + 1
2     1111 0001  JMP 1      ; PC = 1 (loop forever)
```
Encoded: [0x30, 0x01, 0xF1]
Writing a Blink Program Using the OUT Port
Outputting 0b1010 and then 0b0101 in a loop simulates alternating LED patterns:
```
Addr  Opcode+Im  Mnemonic
0     1001 1010  OUT 0xA    ; OUT = 1010
1     1001 0101  OUT 0x5    ; OUT = 0101
2     1111 0000  JMP 0      ; PC = 0
```
Encoded: [0x9A, 0x95, 0xF0]
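Hand-assembling gets tedious quickly. The sketch below is a tiny assembler for the opcode table used in this guide, and it reproduces both encodings above. It is not a standard TD4 tool. The function names and the accepted syntax are assumptions made for this example: one instruction per string, optional ; comments, and immediates in decimal, 0x, or 0b form.

```python
# Encodings follow the opcode table in this guide
NO_IMM = {"MOV A, B": 0b0001, "IN A": 0b0010, "MOV B, A": 0b0100,
          "IN B": 0b0110, "OUT B": 0b1000}
WITH_IMM = {"ADD A": 0b0000, "MOV A": 0b0011, "ADD B": 0b0101,
            "MOV B": 0b0111, "OUT": 0b1001, "JNC": 0b1110, "JMP": 0b1111}


def assemble_line(line: str) -> int:
    """Assemble one instruction such as 'MOV A, 3' or 'JMP 1' into a byte."""
    text = line.split(";")[0].upper()                 # drop ';' comments
    text = " ".join(text.replace(",", ", ").split())  # normalise spacing
    if text in NO_IMM:                                # forms without an immediate
        return NO_IMM[text] << 4
    mnemonic, _, operand = text.rpartition(" ")       # immediate is the last token
    mnemonic = mnemonic.rstrip(",")
    if mnemonic not in WITH_IMM:
        raise ValueError(f"unknown instruction: {line!r}")
    im = int(operand, 0)                              # accepts 5, 0xA, 0b0101
    if not 0 <= im <= 0xF:
        raise ValueError(f"immediate out of 4-bit range: {line!r}")
    return (WITH_IMM[mnemonic] << 4) | im


def assemble(lines: list[str]) -> list[int]:
    return [assemble_line(line) for line in lines]


assert assemble(["MOV A, 0", "ADD A, 1", "JMP 1"]) == [0x30, 0x01, 0xF1]
assert assemble(["OUT 0xA", "OUT 0x5", "JMP 0"]) == [0x9A, 0x95, 0xF0]
```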
Running the Simulator and Inspecting State Each Cycle
```python
# td4sim.py: paste the TD4CPU class (including _build_decoder, load_rom and step) above this block

def run_counter_demo(cycles: int = 34):
    cpu = TD4CPU()
    # MOV A, 0 | ADD A, 1 | JMP 1
    cpu.load_rom([0x30, 0x01, 0xF1])

    header = f"{'Cycle':>5} | {'A':>8} | {'B':>8} | {'PC':>4} | {'OUT':>4} | C"
    print(header)
    print("-" * len(header))
    for cycle in range(cycles):
        # Each row shows the state *before* this cycle's instruction executes
        print(
            f"{cycle:>5} | "
            f"{cpu.A:04b}({cpu.A:>2}) | "
            f"{cpu.B:04b}({cpu.B:>2}) | "
            f"{cpu.PC:>4} | "
            f"{cpu.OUT:04b} | "
            f"{cpu.carry}"
        )
        cpu.step()


if __name__ == "__main__":
    run_counter_demo()
```
Expected output (first 10 cycles):
```
Cycle |        A |        B |   PC |  OUT | C
---------------------------------------------
    0 | 0000( 0) | 0000( 0) |    0 | 0000 | 0
    1 | 0000( 0) | 0000( 0) |    1 | 0000 | 0
    2 | 0001( 1) | 0000( 0) |    2 | 0000 | 0
    3 | 0001( 1) | 0000( 0) |    1 | 0000 | 0
    4 | 0010( 2) | 0000( 0) |    2 | 0000 | 0
    5 | 0010( 2) | 0000( 0) |    1 | 0000 | 0
    6 | 0011( 3) | 0000( 0) |    2 | 0000 | 0
    7 | 0011( 3) | 0000( 0) |    1 | 0000 | 0
    8 | 0100( 4) | 0000( 0) |    2 | 0000 | 0
    9 | 0100( 4) | 0000( 0) |    1 | 0000 | 0
```
The loop body is two instructions (ADD A, 1 and JMP 1), so A increases on every other row. A reaches 15 on the row for cycle 30. The 16th ADD executes during cycle 31, wraps A to 0, and sets carry to 1, which first appears on the row for cycle 32 (A=0000( 0), PC=2, C=1). This is correct: the 4-bit adder wraps at 16.
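You can also compute when carry first goes high instead of scanning the table by eye. first_carry_row is a made-up name for a self-contained model that decodes only MOV A, Im, ADD A, Im, and JMP Im, which is all the counter program uses. It returns the first row on which carry reads 1.

```python
from typing import Optional


def first_carry_row(rom: list[int], max_cycles: int = 100) -> Optional[int]:
    """Return the first table row (cycle number) on which carry reads 1."""
    a = pc = carry = 0
    for cycle in range(max_cycles):
        if carry:                  # each row shows the state *before* executing
            return cycle
        op, im = rom[pc] >> 4, rom[pc] & 0xF
        next_pc = (pc + 1) & 0xF
        if op == 0b0011:           # MOV A, Im
            a = im
        elif op == 0b0000:         # ADD A, Im
            raw = a + im
            a, carry = raw & 0xF, raw >> 4
        elif op == 0b1111:         # JMP Im
            next_pc = im
        pc = next_pc
    return None                    # carry never set within max_cycles


print(first_carry_row([0x30, 0x01, 0xF1] + [0x00] * 13))  # 32
```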
Step 5: Add a Simple Text-Based Debugger and Disassembler
Printing raw hex addresses isn't much help when something goes wrong. A disassembler turns ROM bytes back into readable mnemonics, and a single-step mode lets you press Enter to tick each clock cycle by hand, much like stepping real hardware with a manual clock.
Disassembling ROM Bytes Back to Mnemonics
```python
# Mnemonic templates; "{im}" is replaced with the 4-bit immediate operand
MNEMONICS = {
    0b0000: "ADD A, {im}",
    0b0001: "MOV A, B",
    0b0010: "IN A",
    0b0011: "MOV A, {im}",
    0b0100: "MOV B, A",
    0b0101: "ADD B, {im}",
    0b0110: "IN B",
    0b0111: "MOV B, {im}",
    0b1000: "OUT B",
    0b1001: "OUT {im}",
    0b1010: "NOP",
    0b1011: "NOP",
    0b1100: "NOP",
    0b1101: "NOP",
    0b1110: "JNC {im}",
    0b1111: "JMP {im}",
}


def disassemble(rom: list[int]) -> list[str]:
    """Convert a 16-entry ROM into a list of human-readable instruction strings."""
    lines = []
    for addr, byte in enumerate(rom):
        opcode = (byte >> 4) & 0xF
        im = byte & 0xF
        # str.format ignores unused keyword arguments, so templates without {im} work too
        mnem = MNEMONICS[opcode].format(im=im)
        lines.append(f"{addr:02X}: {byte:08b}  {mnem}")
    return lines
```
Single-Step Mode with Keyboard Input in the Linux Terminal
```python
def single_step_debugger(cpu: TD4CPU) -> None:
    """Interactive single-step debugger. Press Enter to advance one cycle."""
    print("=== TD4 Disassembly ===")
    for line in disassemble(cpu.rom):
        print(" ", line)
    print()
    print("Press Enter to step, 'q' then Enter to quit.")
    header = f"{'Cycle':>5} | {'A':>4} | {'B':>4} | {'PC':>3} | {'OUT':>4} | C"
    print(header)
    print("-" * len(header))
    cycle = 0
    while True:
        # Show the state *before* this cycle executes; the prompt waits on the same line
        print(
            f"{cycle:>5} | {cpu.A:04b} | {cpu.B:04b} | {cpu.PC:>3X} "
            f"| {cpu.OUT:04b} | {cpu.carry} ",
            end="",
        )
        try:
            cmd = input()
        except EOFError:  # Ctrl-D or closed stdin: quit cleanly
            print()
            cmd = "q"
        if cmd.strip().lower() == "q":
            print("Exiting debugger.")
            break
        cpu.step()
        cycle += 1
```
To use it, replace run_counter_demo() in your __main__ block:
```python
if __name__ == "__main__":
    cpu = TD4CPU()
    cpu.load_rom([0x30, 0x01, 0xF1])
    single_step_debugger(cpu)
```
Piping Output to a Log File
For automated testing or long runs, capture everything to a file:
```bash
python3 td4sim.py 2>&1 | tee sim.log
```
This sends both stdout and stderr to sim.log while still displaying it in your terminal. Grep the log later to find specific register states:
```bash
grep -E '\| 1$' sim.log  # Rows from run_counter_demo() whose last column (carry) is 1
```
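If you'd rather post-process the log in Python than with grep, a few lines are enough. This sketch assumes the row layout printed by run_counter_demo(): columns separated by |, with carry in the last column. carry_cycles is an illustrative name.

```python
def carry_cycles(lines) -> list[int]:
    """Return cycle numbers of table rows whose last column (carry) is 1."""
    cycles = []
    for line in lines:
        cols = [c.strip() for c in line.split("|")]
        # Data rows start with a cycle number; this skips the header and the dashes
        if len(cols) >= 2 and cols[0].isdigit() and cols[-1] == "1":
            cycles.append(int(cols[0]))
    return cycles
```

For example, `with open("sim.log") as log: print(carry_cycles(log))` prints the cycle number of every row where carry was set.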
Common Issues and Fixes
| Symptom | Root Cause | Fix |
|---|---|---|
| A keeps growing past 15 | Missing & 0xF on register setter | Use the property setter; never write self._a = value directly |
| IndexError: list index out of range at PC | PC exceeded 15 without wrap-around | Apply & 0xF to every PC increment: (PC + 1) & 0xF |
| JNC jumps when carry is set | Condition inverted — checked carry == 1 instead of carry == 0 | JNC = Jump if No Carry; condition must be if self.carry == 0 |
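| OUT shows values above 15 | Raw instruction byte passed to an unmasked OUT setter instead of the 4-bit immediate | Pass im (byte & 0xF) and keep value & 0xF in the OUT setter |
| JNC always jumps, even right after an overflowing ADD | Carry reset to 0 at the start of every step() | Let only the ADD handlers write carry |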
Fix: Carry Flag Not Resetting Between Instructions
In this simulator, only the ADD instructions (ADD A, Im and ADD B, Im) write the carry flag. Carry keeps its value until the next ADD overwrites it, and nothing resets it to 0 at the start of a cycle. If you instead clear carry at the top of every step(), JNC will always see carry == 0 and jump, even immediately after an ADD that overflowed.
```python
# WRONG: resets carry before every instruction
def step(self):
    self.carry = 0  # <-- BUG: erases carry set by previous ADD
    ...

# CORRECT: carry only changes inside the add_a / add_b handlers
def add_a(im): self.A, self.carry = self.add(self.A, im)
```
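The sketch below runs the same four-instruction program twice to show the bug in action. The first run writes carry only in ADD; the second clears carry at the top of every step. run_mini is a hypothetical stripped-down model that decodes only MOV A, ADD A, JNC, and JMP.

```python
def run_mini(rom: list[int], steps: int, reset_carry_each_step: bool) -> list[int]:
    """Return the PC after each step. Decodes only MOV A, ADD A, JNC and JMP."""
    a = pc = carry = 0
    pcs = []
    for _ in range(steps):
        if reset_carry_each_step:
            carry = 0                      # the bug: wipes the previous ADD's carry
        op, im = rom[pc] >> 4, rom[pc] & 0xF
        next_pc = (pc + 1) & 0xF
        if op == 0b0011:                   # MOV A, Im
            a = im
        elif op == 0b0000:                 # ADD A, Im: the only place carry is written
            raw = a + im
            a, carry = raw & 0xF, raw >> 4
        elif op == 0b1110 and carry == 0:  # JNC Im
            next_pc = im
        elif op == 0b1111:                 # JMP Im
            next_pc = im
        pc = next_pc
        pcs.append(pc)
    return pcs


# MOV A, 15 | ADD A, 1 (overflows) | JNC 0 | JMP 3 (park here)
rom = [0x3F, 0x01, 0xE0, 0xF3] + [0x00] * 12
print(run_mini(rom, 4, reset_carry_each_step=False))  # [1, 2, 3, 3]: JNC falls through
print(run_mini(rom, 4, reset_carry_each_step=True))   # [1, 2, 0, 1]: JNC wrongly jumps
```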
Fix: Program Counter Exceeding 15 Causing IndexError
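Suppose you bypass the masked setter and increment the backing field directly (self._pc += 1). After address 15 the PC becomes 16, and self.rom[16] raises IndexError. Going through the property (self.PC += 1) is already safe because the setter masks the value. Writing the wrap explicitly still documents the intent:
```python
self.PC = (self.PC + 1) & 0xF
```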
Fix: JNC Jumping When Carry Is Set Instead of Clear
"JNC" literally means Jump if No Carry. The predicate is carry == 0, not carry == 1. This is the most common logic inversion in 4-bit CPU implementations:
```python
# WRONG
def jnc(im):
    if self.carry == 1:  # Backwards!
        self.PC = im
        return True

# CORRECT
def jnc(im):
    if self.carry == 0:
        self.PC = im
        return True
    return False
```
Fix: Output Port Not Reflecting Correct 4-Bit Value
OUT Im takes its value from the lower nibble of the instruction byte. If you accidentally pass the full 8-bit instruction byte to an unmasked OUT setter, the OUT port will show values above 15. The property setter in the class above masks with value & 0xF, which happens to recover the right nibble. Pass im (already masked to 4 bits) anyway, so the handler stays correct even if the setter changes.
Frequently Asked Questions
Q: Can I extend the TD4 simulator to 8 bits?
Yes, and it's a worthwhile exercise. The main changes are:

- Replace every & 0xF mask with & 0xFF, and take carry from bit 8 ((raw >> 8) & 1) instead of bit 4.
- Expand the ROM to 256 bytes addressed by an 8-bit PC.
- Widen the immediate from 4 bits to 8 bits with a two-byte instruction format (opcode byte + operand byte). A 256-byte ROM then holds at most 128 instructions.

You'll also want a richer opcode table. A 4-bit opcode field allows only 16 instructions, while an 8-bit opcode allows up to 256; the Intel 8080, for example, uses 8-bit opcodes.
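Here is a minimal sketch of such an 8-bit variant to start from. Everything in it is an assumption for illustration: the TD8CPU name, the two-byte (opcode, operand) format, and the opcode numbers, which simply reuse the TD4 nibbles for ADD A, MOV A, JNC, and JMP.

```python
class TD8CPU:
    """Hypothetical 8-bit TD4 variant: 8-bit A and PC, 256-byte ROM,
    and two-byte instructions laid out as (opcode byte, operand byte)."""

    def __init__(self, program: list[int]):
        if len(program) > 256:
            raise ValueError(f"ROM is 256 bytes; got {len(program)}")
        self.rom = list(program) + [0x00] * (256 - len(program))
        self.a = 0
        self.pc = 0
        self.carry = 0

    @staticmethod
    def add(x: int, y: int) -> tuple[int, int]:
        """8-bit add: mask with 0xFF and take carry from bit 8."""
        raw = (x & 0xFF) + (y & 0xFF)
        return raw & 0xFF, (raw >> 8) & 1

    def step(self) -> None:
        opcode = self.rom[self.pc]
        operand = self.rom[(self.pc + 1) & 0xFF]
        next_pc = (self.pc + 2) & 0xFF  # every instruction occupies two bytes
        if opcode == 0x00:              # ADD A, imm8 (hypothetical numbering)
            self.a, self.carry = self.add(self.a, operand)
        elif opcode == 0x03:            # MOV A, imm8
            self.a = operand
        elif opcode == 0x0E:            # JNC imm8
            if self.carry == 0:
                next_pc = operand
        elif opcode == 0x0F:            # JMP imm8
            next_pc = operand
        # Any other opcode is treated as a NOP in this sketch
        self.pc = next_pc


cpu = TD8CPU([0x03, 250, 0x00, 10])  # MOV A, 250 | ADD A, 10
cpu.step()
cpu.step()
print(cpu.a, cpu.carry, cpu.pc)      # 4 1 4
```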
Q: How does the TD4 differ from a real CPU like the 8080 or Z80?
The TD4 is a pedagogical device, not a production CPU. It has no RAM: programs live in ROM and there is no writable memory. It has no stack pointer, no subroutine call/return instructions, no interrupts, and no I/O beyond one 4-bit input port and one 4-bit output port. The 8080 and Z80 have 16-bit address buses addressing 64 KB of memory, multiple 8-bit and 16-bit registers, richer flag sets (sign, zero, parity, half-carry), and interrupt support. The TD4's value is that its entire design can be wired up from a handful of 74-series chips. That clarity of mechanism is what makes it an effective teaching tool.
Q: Is there a way to visualize the TD4 clock cycle graphically on Linux?
Yes. Python's built-in curses module lets you build a live terminal dashboard without additional dependencies. Use curses.wrapper() to take over the terminal, draw a border with window.box(), and redraw the register values as bit patterns each time step() is called.

A simpler approach is to emit ANSI escape codes directly. \033[2J\033[H clears the screen and moves the cursor home, and \033[1m makes text bold. Calling time.sleep(0.1) between steps then gives a refreshing register display.

For a richer terminal UI, the third-party blessed library (installable with pip) wraps terminal capabilities in a more ergonomic API.
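The ANSI approach can be sketched briefly. render_frame and animate are illustrative names. animate assumes an object with the A, B, PC, OUT, and carry attributes and the step() method of the TD4CPU class from this guide.

```python
import sys
import time

CLEAR_HOME = "\033[2J\033[H"  # clear the screen, then move the cursor to the top-left
BOLD, RESET = "\033[1m", "\033[0m"


def render_frame(cycle: int, a: int, b: int, pc: int, out: int, carry: int) -> str:
    """Build one full-screen frame; OUT is also drawn as four 'LEDs' (# = on)."""
    leds = "".join("#" if (out >> bit) & 1 else "." for bit in range(3, -1, -1))
    return (
        f"{CLEAR_HOME}{BOLD}TD4  cycle {cycle}{RESET}\n"
        f"  A   = {a:04b}\n"
        f"  B   = {b:04b}\n"
        f"  PC  = {pc:04b}\n"
        f"  OUT = {out:04b}  {leds}\n"
        f"  C   = {carry}\n"
    )


def animate(cpu, cycles: int = 32, delay: float = 0.1) -> None:
    """Redraw the frame, then advance the CPU one step, `cycles` times."""
    for cycle in range(cycles):
        sys.stdout.write(render_frame(cycle, cpu.A, cpu.B, cpu.PC, cpu.OUT, cpu.carry))
        sys.stdout.flush()
        cpu.step()
        time.sleep(delay)
```

For example, `cpu = TD4CPU(); cpu.load_rom([0x9A, 0x95, 0xF0]); animate(cpu)` shows the blink program toggling the OUT LEDs between #.#. and .#.#.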