VLSI Tetris Core

Introduction to VLSI System Design

overview

ELEC 422 was my first introduction to VLSI design. The laborious homework assignments, painting transistors in Magic and building finite state machines (FSMs), culminated in a final project that used the FSM-datapath structure to implement a full Tetris ASIC. This is how we built it!

Tetris is played on a 10-column grid where 7 distinct pieces (the S, J, O, I, T, L, and Z tetrominoes) fall from the top. The player moves and rotates each piece before it locks in place. Filling a complete row clears it and scores points. The game ends when pieces stack to the top.

The 7 Tetris pieces: S, J, O, I, T, L, Z

The chip was a final project for ELEC 422: VLSI System Design in Spring 2026, built with my teammates Kathryn Files and Atishay Lalgudi under the team name Memory Mafia. The goal was full functional correctness of every game mechanic (movement, rotation, gravity, line clears, game over) on a fully synthesizable design that could be placed and routed in the AMI 0.5 µm standard-cell flow.

architecture

The chip uses the FSM + Datapath (FSMD) methodology taught in the course. It splits into two hierarchical modules: a controller (tetris_fsm) that decides what happens next, and a datapath (tetris_datapath) that does the heavy lifting of moving bits around the board. They sit in a closed loop: status flags from the datapath (collision, full row, game over) feed the FSM's next-state logic, and the FSM in turn issues command pulses and ALU op-codes back to the datapath.

All sequential logic is governed by two non-overlapping clocks, clka and clkb, supplied externally. The FSM latches its next state on negedge clka and drives its outputs on negedge clkb, so by the time the datapath registers fire on clkb, every control signal from the FSM has fully resolved. This phase separation is what makes the design safe to synthesize without race conditions.

controller (tetris_fsm)

A Moore-style FSM that watches the four button inputs (btn_left, btn_right, btn_rotate, btn_drop) and the datapath's status flags. Instead of doing any coordinate math itself, it just issues discrete commands:

Main states are IDLE, SPAWN, INPUT_POLL, VALIDATE, UPDATE, LOCK, CLEAR_CHECK, DISP_SCAN, and GAME_OVER. The display scan walks all 20 rows out the bottom each frame with disp_row_valid high.
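
As a rough illustration of the controller's role, here is a minimal behavioral model in Python (not the project's RTL). The state names come from the list above. Everything else is an assumption made for this sketch: the transition conditions, the `start`, `move_is_gravity`, and `scan_done` flags, and the `next_state` helper itself. The real transitions are in the FSM state diagram in the design report.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SPAWN = auto()
    INPUT_POLL = auto()
    VALIDATE = auto()
    UPDATE = auto()
    LOCK = auto()
    CLEAR_CHECK = auto()
    DISP_SCAN = auto()
    GAME_OVER = auto()


def next_state(state, *, start=False, collision=False, move_is_gravity=False,
               game_over=False, scan_done=False):
    """Assumed next-state function; every transition here is hypothetical."""
    if state is State.IDLE:
        return State.SPAWN if start else State.IDLE
    if state is State.SPAWN:
        # Assumption: a piece that cannot spawn ends the game.
        return State.GAME_OVER if game_over else State.INPUT_POLL
    if state is State.INPUT_POLL:
        return State.VALIDATE
    if state is State.VALIDATE:
        if not collision:
            return State.UPDATE
        # Assumption: a blocked downward move locks the piece, while a
        # blocked left/right/rotate request is simply dropped.
        return State.LOCK if move_is_gravity else State.DISP_SCAN
    if state is State.UPDATE:
        return State.DISP_SCAN
    if state is State.LOCK:
        return State.CLEAR_CHECK
    if state is State.CLEAR_CHECK:
        return State.SPAWN
    if state is State.DISP_SCAN:
        return State.INPUT_POLL if scan_done else State.DISP_SCAN
    return State.GAME_OVER  # GAME_OVER holds until reset


# Walk one piece from spawn to lock under the assumed transitions.
state = State.IDLE
for flags in ({"start": True}, {}, {},
              {"collision": True, "move_is_gravity": True}, {}, {}):
    state = next_state(state, **flags)
print(state.name)  # SPAWN
```

The point of the sketch is the division of labor: the function only looks at flags and picks a state, and never touches piece coordinates or board contents.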

datapath (tetris_datapath)

This is where the 20×10 board lives. A few constraints from the course flow shaped almost every decision:

No runtime array indexing. Design Compiler in the AMI 0.5 µm flow can't synthesize board[i][j] when i is a variable. So the board is declared as 20 individually named 10-bit registers (board0 through board19), and every read or write happens through a fully unrolled case statement.

Everything combinational is unrolled. Collision detection compares the next piece position against the existing board purely in combinational logic, producing collision_flag in one cycle. Display scanning is a multiplexer tree that fans out the right row onto disp_row_data based on disp_row_addr.
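
To make the collision check concrete, here is a small Python behavioral model (a sketch, not the chip's Verilog). The bit layouts are assumptions: bit c of a board row is column c, bit 4*r + c of the 16-bit mask is row r, column c of the 4×4 box, and (x, y) is the box's top-left corner. The `collides` helper and the example mask are hypothetical names and values. In hardware the two loops would be fully unrolled into parallel logic rather than iterated.

```python
ROWS, COLS = 20, 10  # board dimensions from the text


def collides(board, mask, x, y):
    """Return True if the 4x4 piece mask at (x, y) overlaps the board or walls.

    board: list of 20 ints, each a 10-bit row (bit c set = column c occupied).
    mask:  16-bit int; bit (4*r + c) is the cell at row r, column c of the box.
    x, y:  column and row of the box's top-left corner (may be negative).
    All of these encodings are assumptions for illustration.
    """
    for r in range(4):
        for c in range(4):
            if not (mask >> (4 * r + c)) & 1:
                continue  # empty cell in the bounding box
            row, col = y + r, x + c
            if col < 0 or col >= COLS or row >= ROWS:
                return True  # hits a side wall or the floor
            if row >= 0 and (board[row] >> col) & 1:
                return True  # overlaps a locked cell
    return False


# Assumed O-piece mask: the middle 2x2 of the box (rows 1-2, columns 1-2).
O_MASK = 0x0660
board = [0] * ROWS
board[19] = 1 << 3  # one locked cell at row 19, column 3

print(collides(board, O_MASK, 0, 17))   # False: rows 18-19, columns 1-2 are free
print(collides(board, O_MASK, 1, 17))   # True: row 19, column 3 is occupied
print(collides(board, O_MASK, 8, 0))    # True: column 10 is past the right wall
print(collides(board, O_MASK, 4, 18))   # True: row 20 is below the floor
print(collides(board, O_MASK, -1, 0))   # False: only empty box cells hang off the left
```

Because the check takes a candidate position rather than the current one, the same logic can vet a move, a drop, or a rotation before anything is written to the board.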

piece mask ROM

All 7 tetrominoes across all 4 rotations are encoded as a combinational lookup table. The key {piece_type, rot_state} indexes 28 entries, each returning a 16-bit mask for the piece's 4×4 bounding box. A separate rot_mask table returns the next rotation, which lets the collision detector pre-check a rotation before the FSM commits to it.
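
A lookup table of this shape can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the chip's actual ROM contents: the spawn orientations, the piece numbering (S, J, O, I, T, L, Z as 0 through 6), the bit layout (bit 4*r + c is row r, column c), and the rule of rotating the whole 4×4 box clockwise are all chosen for illustration. `piece_mask` and `build_rom` are hypothetical helper names; `rot_mask` follows the table name used above.

```python
# Assumed spawn orientations, drawn in the 4x4 bounding box ('#' = filled).
BASE_SHAPES = {
    "S": [".##.", "##..", "....", "...."],
    "J": ["#...", "###.", "....", "...."],
    "O": ["....", ".##.", ".##.", "...."],
    "I": ["....", "####", "....", "...."],
    "T": [".#..", "###.", "....", "...."],
    "L": ["..#.", "###.", "....", "...."],
    "Z": ["##..", ".##.", "....", "...."],
}


def to_mask(grid):
    """Pack a 4x4 grid into 16 bits; bit (4*r + c) is row r, column c."""
    mask = 0
    for r, line in enumerate(grid):
        for c, ch in enumerate(line):
            if ch == "#":
                mask |= 1 << (4 * r + c)
    return mask


def rotate_cw(grid):
    """Rotate the whole 4x4 box 90 degrees clockwise."""
    return ["".join(grid[3 - j][i] for j in range(4)) for i in range(4)]


def build_rom():
    """Build the 28-entry table keyed by (piece_type, rot_state)."""
    rom = {}
    for piece_type, name in enumerate(BASE_SHAPES):
        grid = BASE_SHAPES[name]
        for rot_state in range(4):
            rom[(piece_type, rot_state)] = to_mask(grid)
            grid = rotate_cw(grid)
    return rom


PIECE_ROM = build_rom()


def piece_mask(piece_type, rot_state):
    """Mask for the current orientation."""
    return PIECE_ROM[(piece_type, rot_state)]


def rot_mask(piece_type, rot_state):
    """Mask of the next rotation, for pre-checking a rotate request."""
    return PIECE_ROM[(piece_type, (rot_state + 1) % 4)]


print(len(PIECE_ROM))                # 28
print(hex(piece_mask(3, 0)))         # 0xf0   (I piece, horizontal)
print(hex(rot_mask(3, 0)))           # 0x4444 (I piece, vertical)
```

Rotating the full box is the simplest rule to tabulate, but it shifts some pieces within the box between orientations, so a real table may place them differently.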

pseudo-random piece selection (LFSR)

A 7-bit LFSR implementing x⁷ + x⁶ + 1 picks the next tetromino. It advances on each falling edge of clkb:

lfsr <= {lfsr[5:0], lfsr[6] ^ lfsr[5]};

That tap configuration gives a maximal-length sequence of 127 states before repeating. To avoid the all-zeros lockup, a free-running 7-bit counter seeds the LFSR at reset (falling back to a hardcoded 7'b1001101 if the counter happens to be zero). The lower 3 bits select the piece type, with 3'd7 remapped to 3'd0 so that every value maps to one of the 7 valid pieces. The trade-off is that piece type 0 comes up roughly twice as often as any other.
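
The period and the piece distribution are easy to check with a short Python model of the update rule above. This is a software sketch rather than the RTL; the helper names (`lfsr_step`, `seed_from_counter`, `piece_from_lfsr`, `period`) are made up for the example.

```python
def lfsr_step(lfsr):
    """One step of lfsr <= {lfsr[5:0], lfsr[6] ^ lfsr[5]} on a 7-bit value."""
    feedback = ((lfsr >> 6) ^ (lfsr >> 5)) & 1
    return ((lfsr << 1) & 0x7F) | feedback


def seed_from_counter(counter):
    """Reset seed: the counter value, or 7'b1001101 if the counter is zero."""
    counter &= 0x7F
    return counter if counter != 0 else 0b1001101


def piece_from_lfsr(lfsr):
    """Lower 3 bits select the piece; 7 is remapped to 0."""
    sel = lfsr & 0b111
    return 0 if sel == 7 else sel


def period(seed):
    """Count steps until the LFSR returns to its seed."""
    state, steps = lfsr_step(seed), 1
    while state != seed:
        state = lfsr_step(state)
        steps += 1
    return steps


print(period(seed_from_counter(0)))  # 127

# Piece counts if the LFSR were sampled once in every state of a full period.
counts = [0] * 7
state = seed_from_counter(0)
for _ in range(127):
    counts[piece_from_lfsr(state)] += 1
    state = lfsr_step(state)
print(counts)  # [31, 16, 16, 16, 16, 16, 16]
```

The counts assume one sample per LFSR state. On the chip a piece is only drawn at spawn time, so the distribution a player actually sees also depends on how many clock cycles pass between spawns.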

verification

Three rounds of simulation, all driven by the same testbenches:

  1. Pre-synthesis in Questa. Each module on its own (FSM, datapath, top), verifying spawn, movement, gravity, line clear, and game-over behavior.
  2. Post-synthesis in Questa. Same testbenches against the gate-level netlist out of Design Compiler. Waveforms matched the pre-synth runs exactly, confirming DC introduced no functional regressions.
  3. Post-layout in IRSIM. Magic extracted a switch-level netlist from the placed-and-routed core, and the same test sequence ran once more against the core alone and then again through the padframe. Padframe signals are prefixed p_ at the pad boundary and matched the core signals one for one.

synthesis & layout

Synthesized with Design Compiler against the OSU05 standard-cell library (AMI 0.5 µm).

Target clock: 20 MHz
Cell count: 6,056
Est. power: 28.77 mW
Core size: 7319 × 7140 λ

Place-and-route was done in Innovus, then imported to Magic for DRC and padframe integration. The dense, uniform cell arrangement is a direct consequence of the 20 named board registers, the fully unrolled collision detection, and the 28-entry mask ROM.

Innovus place-and-route core layout next to the GDS imported into Magic

The core needed 61 I/O pins, which overflowed the standard 64-pin padframe once you count VDD/GND, so we used Gavin Jing's padframe generator to build a custom 104-pin padframe. Every signal pin is an OSU bidirectional pad: OEN high drives the pin, OEN low captures it. Unused pads are tied to inputs with OEN low.

VLSI Tetris Core integrated into the full 104-pin padframe (Magic)

testing the fabricated chip

If this design ever came back from fab, the test rig would be:

From there it's a stepwise bring-up: assert restart, watch the first piece spawn, exercise each button, let gravity run, stack a full row to confirm out_line_clear and the score increment, then stack to the top to confirm out_game_over.

presentation

Final slide deck from the ELEC 422 tapeout review.

design report

The full design report covers waveform captures from every simulation stage, the FSM state diagram, the Innovus and Magic layouts, the 104-pin pad map, and the bidirectional pad behavior table.
