by Komal Vachhani
About the Project
💡 Project Background
This project’s goal was to design, simulate, synthesize, and implement a variety of hardware modules for FPGA-based computing. These modules targeted high-performance operations such as matrix multiplication, the self-attention mechanism used in machine learning models, and hardware accelerators for specific arithmetic functions like multiply-accumulate (MAC), division, and exponentiation. The project was a deep dive into FPGA-based design and hardware-software co-design using Verilog, Vivado, and the PYNQ platform.
IBERT Self Attention Block Diagram
Softmax Module Block Diagram
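Hardware modules like the softmax block diagrammed above are usually verified against a software "golden model". As an illustrative sketch only (the function name and structure here are assumptions, not the project's actual test code), a reference softmax in Python looks like this:

```python
import math

def softmax_reference(x):
    """Numerically stable softmax: subtract the max before
    exponentiating so intermediate values stay bounded -- the same
    trick a fixed-point hardware softmax relies on to avoid overflow."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]
```

Hardware outputs would then be compared against this model within a fixed-point error tolerance rather than for exact equality.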
🔨 Tools
- Verilog: Used for designing hardware modules, including systolic arrays for matrix multiplication, arithmetic modules, and attention heads
- Vivado: A hardware design suite used for simulation, synthesis, and implementation of hardware designs
- PYNQ: An open-source framework for Zynq-based boards that pair an ARM processor with FPGA fabric, used for real-time testing of designs on hardware
- Python: Utilized for writing test scripts to validate the hardware modules and for working with the PYNQ APIs to interact with the FPGA
- Simulation Framework: Xilinx XSIM and Verilator were used for debugging and testing the design in simulation before deployment on the hardware
- Git: For version control
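The systolic array mentioned in the Verilog bullet above can be illustrated with a behavioural, cycle-level model in Python (the project's software language). This is a hedged sketch of one common output-stationary scheme, not the project's RTL; the skew and indexing choices are assumptions:

```python
def systolic_matmul(A, B):
    """Cycle-level model of an n x n output-stationary systolic array.
    Cell (i, j) accumulates A[i][k] * B[k][j]. With row i's input
    stream skewed by i cycles and column j's by j cycles, the two
    operands for index k meet at cell (i, j) on cycle t = i + j + k."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    total_cycles = 3 * n - 2  # last product fires at i = j = k = n - 1
    for t in range(total_cycles):
        for i in range(n):
            for j in range(n):
                k = t - i - j
                if 0 <= k < n:          # operands present at this cell?
                    C[i][j] += A[i][k] * B[k][j]
    return C
```

A model like this makes the parallelism explicit: within one cycle `t`, every cell with `0 <= t - i - j < n` fires a MAC simultaneously, which is exactly what the hardware grid does.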
🔑 Key Learnings/Takeaways
- FPGA Design Flow: From designing hardware modules in Verilog to deploying them on the PYNQ board, I learned how to follow the complete FPGA design flow: simulation, synthesis, implementation, and hardware verification.
- Efficient Hardware Design: Through implementing modules like matrix multiplication and GELU, I gained insight into optimizing hardware for parallelism and throughput, which matters especially in machine-learning workloads.
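The GELU takeaway above has a concrete flavour: the exact definition uses the error function, which is expensive to evaluate in hardware, so accelerators commonly substitute a cheaper tanh-based approximation. As an illustration only (this is the widely known approximation, not necessarily the project's exact implementation):

```python
import math

def gelu_exact(x):
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """Tanh approximation often used in hardware and ML libraries,
    trading a small accuracy loss for a much cheaper datapath."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))
```

Comparing the two over the input range of interest is how one would bound the approximation error before committing it to fixed-point hardware.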