tuna/tapa


TAPA


TAPA is a task-parallel HLS framework that compiles C++ dataflow programs to Verilog RTL for Xilinx FPGAs. Software simulation runs on any Linux machine without FPGA hardware.

C++ source → tapa compile → RTL (.xo) → Vitis v++ → FPGA bitstream

TAPA is community maintained by the Tsinghua University TUNA Association.

Published results: 2× higher frequency on average versus Vivado [2], with 7× faster compilation and 3× faster software simulation versus Vitis HLS [1].

Quick Start

Install

curl -fsSL https://raw.githubusercontent.com/tuna/tapa/main/install.sh | sh -s -- -q

With root privileges, the script installs to /opt/tapa and places symlinks in /usr/local/bin. Without root, it installs to ~/.tapa and updates your shell PATH.

Requirements: Linux (Ubuntu 18.04+, Debian 10+, RHEL 9+, Fedora 34+, Amazon Linux 2023) and g++ 7.5.0+. Vitis HLS 2022.1+ is required for RTL synthesis and on-board execution, but not for software simulation.

To install a specific version:

curl -fsSL https://raw.githubusercontent.com/tuna/tapa/main/install.sh \
  | TAPA_VERSION=0.1.20260319 sh -s -- -q

Releases: github.com/tuna/tapa/releases

Software simulation (no FPGA required)

# Compile kernel + host together using the tapa g++ wrapper
tapa g++ -- vadd.cpp vadd-host.cpp -o vadd

# Run — executes on the CPU using TAPA's coroutine simulator
./vadd

Expected output:

I20000101 00:00:00.000000 0000000 task.h:66] running software simulation with TAPA library
kernel time: 1.19429 s
PASS!
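
The final PASS! line suggests that the host program checks the kernel's output against a CPU reference. The actual vadd-host.cpp is not shown here, so the following is only a sketch of such a check under that assumption. CountMismatches is a hypothetical helper name, not part of TAPA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TAPA): counts the elements of c that
// differ from the CPU reference a[i] + b[i].
std::size_t CountMismatches(const std::vector<float>& a,
                            const std::vector<float>& b,
                            const std::vector<float>& c) {
  // All three buffers are expected to hold the same number of elements.
  assert(a.size() == c.size() && b.size() == c.size());
  std::size_t errors = 0;
  for (std::size_t i = 0; i < c.size(); ++i) {
    // Exact comparison is reasonable here because the kernel performs the
    // same single-precision addition as this reference.
    if (c[i] != a[i] + b[i]) ++errors;
  }
  return errors;
}
```

A host program built along these lines would report success only when the returned count is zero.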

Compile to hardware

tapa compile \
  --top VecAdd \
  --part-num xcu280-fsvh2892-2L-e \
  --clock-period 3.33 \
  -f vadd.cpp \
  -o vadd.xo

# Run fast RTL cosimulation against the XO artifact
./vadd --bitstream=vadd.xo 1000
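
The --clock-period value maps directly to a target frequency. Assuming the flag is given in nanoseconds (an assumption; check the CLI Reference), 3.33 ns corresponds to roughly 300 MHz. The helpers below are hypothetical, not part of TAPA, and simply do the conversion in both directions.

```cpp
#include <cassert>

// Hypothetical helper: clock period in nanoseconds -> frequency in MHz.
// Assumes --clock-period is expressed in nanoseconds.
double PeriodNsToMhz(double period_ns) {
  assert(period_ns > 0.0);
  return 1000.0 / period_ns;  // 1 / (period_ns * 1e-9 s) = 1000 / period_ns MHz
}

// Hypothetical helper: target frequency in MHz -> clock period in nanoseconds.
double MhzToPeriodNs(double mhz) {
  assert(mhz > 0.0);
  return 1000.0 / mhz;
}
```

Under that assumption, PeriodNsToMhz(3.33) is about 300.3, and a 270 MHz target would need a period of about 3.70 ns.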

Programming Model

A TAPA design is a directed graph of concurrent tasks connected by typed streams. An upper-level task declares streams and launches child tasks; leaf tasks perform computation. The same C++ code runs in software simulation and compiles to RTL.

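// Kernel file (vadd.cpp)
#include <cstdint>

#include <tapa.h>

// Mmap2Stream, Add, and Stream2Mmap are leaf tasks that must be declared
// before this point (definitions not shown).
void VecAdd(tapa::mmap<const float> a, tapa::mmap<const float> b,
            tapa::mmap<float> c, uint64_t n) {
  // Streams connecting the child tasks.
  tapa::stream<float> a_q("a"), b_q("b"), c_q("c");

  tapa::task()
      .invoke(Mmap2Stream, a, n, a_q)   // DRAM → stream
      .invoke(Mmap2Stream, b, n, b_q)   // DRAM → stream
      .invoke(Add, a_q, b_q, c_q, n)    // stream → stream
      .invoke(Stream2Mmap, c_q, c, n);  // stream → DRAM
}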

// Host file (vadd-host.cpp)
tapa::invoke(VecAdd, FLAGS_bitstream,
             tapa::read_only_mmap<const float>(a),
             tapa::read_only_mmap<const float>(b),
             tapa::write_only_mmap<float>(c), n);
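
The kernel excerpt does not show the leaf tasks. To illustrate how data moves through the graph, here is a plain C++ model that compiles without the TAPA headers. It is a sketch, not the TAPA API: Stream<T> is a hypothetical stand-in for tapa::stream<T>, std::vector stands in for tapa::mmap<T>, and VecAddModel is an illustrative name. The argument order of each leaf task follows the .invoke calls in the kernel excerpt.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical unbounded FIFO standing in for tapa::stream<T>.
template <typename T>
class Stream {
 public:
  void write(const T& v) { q_.push_back(v); }
  T read() {
    assert(!q_.empty());  // reading an empty stream is a bug in this model
    T v = q_.front();
    q_.pop_front();
    return v;
  }

 private:
  std::deque<T> q_;
};

// Leaf task: copy n elements from memory into a stream.
void Mmap2Stream(const std::vector<float>& mem, uint64_t n,
                 Stream<float>& out) {
  for (uint64_t i = 0; i < n; ++i) out.write(mem[i]);
}

// Leaf task: element-wise sum of two input streams.
void Add(Stream<float>& a, Stream<float>& b, Stream<float>& c, uint64_t n) {
  for (uint64_t i = 0; i < n; ++i) c.write(a.read() + b.read());
}

// Leaf task: drain n elements from a stream into memory.
void Stream2Mmap(Stream<float>& in, std::vector<float>& mem, uint64_t n) {
  for (uint64_t i = 0; i < n; ++i) mem[i] = in.read();
}

// Upper-level task: declares the streams and runs the children. The children
// run one after another here, which only works because the stand-in FIFO is
// unbounded; in a TAPA design they are concurrent tasks.
void VecAddModel(const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& c, uint64_t n) {
  assert(a.size() >= n && b.size() >= n && c.size() >= n);
  Stream<float> a_q, b_q, c_q;
  Mmap2Stream(a, n, a_q);
  Mmap2Stream(b, n, b_q);
  Add(a_q, b_q, c_q, n);
  Stream2Mmap(c_q, c, n);
}
```

Calling VecAddModel with a = {1, 2, 3} and b = {10, 20, 30} fills c with {11, 22, 33}. The sequential schedule is a simplification of this model only; it says nothing about how TAPA's simulator or the generated RTL orders the work.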

Documentation

Full documentation: tapa.readthedocs.io

| Section | Description |
| --- | --- |
| Installation | Install from release or build from source |
| Your First Run | Software simulation without FPGA hardware |
| How-To Guides | Build, simulate, and deploy designs |
| Tutorials | Annotated labs from vadd to floorplanning |
| C++ API Reference | Full API: tasks, streams, mmap, utilities |
| CLI Reference | All tapa subcommands and flags |
| Troubleshooting | Common errors, deadlocks, cosim issues |

Building from Source

# Install dependencies (Ubuntu/Debian)
sudo apt-get install g++ binutils git python3

# Install Bazel — see https://bazel.build/install

git clone https://github.com/tuna/tapa.git
cd tapa
bazel build //...

See Building from Source for the full guide.

Published Results

  • Serpens (DAC'22): 270 MHz on Xilinx Alveo U280 with 24 HBM channels; the Vitis HLS baseline failed to route.
  • Sextans (FPGA'22): 260 MHz on Xilinx Alveo U250 versus 189 MHz with Vivado baseline.
  • SPLAG (FPGA'22): Up to 4.9× speedup over prior FPGA accelerators; up to 0.9× the performance of an A100 GPU.
  • AutoSA (FPGA'21): Systolic-array compiler with frequency improvements over Vitis HLS baseline.
  • Callipepla (FPGA'23): 3.94× speedup over Xilinx XcgSolver; 3.34× better energy efficiency than A100 GPU.
  • LevelST (FPGA'24): 2.65× speedup, 9.82× higher energy efficiency vs. V100/RTX 3060 with cuSPARSE.
  • CHIP-KNN (ICFPT'20 / TRETS'23): 252 MHz on Alveo U280 versus 165 MHz with Vivado; v2 up to 45× over 48-thread CPU.

Publications

Core papers describing the TAPA compiler and the physical design toolflow it integrates:

  1. Yuze Chi et al. Extending high-level synthesis for task-parallel programs. FCCM, 2021.
  2. Licheng Guo et al. TAPA: A scalable task-parallel dataflow programming framework for modern FPGAs with co-optimization of HLS and physical design. TRETS, 2023.
  3. Licheng Guo et al. AutoBridge: Coupling coarse-grained floorplanning and pipelining for high-frequency HLS design on multi-die FPGAs. FPGA, 2021. (Best Paper Award)
  4. Young-kyu Choi et al. TARO: Automatic optimization for free-running kernels in FPGA high-level synthesis. TCAD, 2022.
  5. Licheng Guo et al. RapidStream: Parallel physical implementation of FPGA HLS designs. FPGA, 2022. (Best Paper Award)
  6. Licheng Guo et al. RapidStream 2.0: Automated parallel implementation of latency-insensitive FPGA designs through partial reconfiguration. TRETS, 2023.
  7. Jason Lau et al. RapidStream IR: Infrastructure for FPGA high-level physical synthesis. ICCAD, 2024.
  8. Neha Prakriya et al. TAPA-CS: Enabling scalable accelerator design on distributed HBM-FPGAs. ASPLOS, 2024.
  9. Moazin Khatti et al. PASTA: Programming and automation support for scalable task-parallel HLS programs on modern multi-die FPGAs. FCCM, 2023 / TRETS, 2024.
  10. Suhail Basalama, Jason Cong. Stream-HLS: Towards automatic dataflow acceleration. FPGA, 2025.
  11. Akhil Raj Baranwal, Zhenman Fang. PoCo: Extending task-parallel HLS programming with shared multi-producer multi-consumer buffer support. TRETS, 2025.

For annotated descriptions and the full list of application papers, see Publications.

License

TAPA is open-source software licensed under the MIT license. See LICENSE for details.


Copyright (c) 2026 TAPA community maintainers and contributors.
Copyright (c) 2024 RapidStream Design Automation, Inc. and contributors.
Copyright (c) 2020 Yuze Chi and contributors.
