Festa Hardware

Created by Joseph Joy

AXI4 Lite Interface

The accelerator's control and status register interface. It lets a processor configure the hardware and issue commands to it.

Pipelining

The coordination between the FSM, AXI interfaces, and the algorithm cores is the key to achieving the high-throughput, pipelined performance described in the paper. Think of it as a perfectly synchronized factory assembly line.

Here is how they work together to pipeline the processing of LiDAR frames.

The Roles of Each Component

  1. AXI-Lite Slave (The Manager's Office): Before anything happens, the main processor uses this interface to talk to the FSM. It's used for slow, one-time tasks like:
    • Writing the configuration parameters (zeta, epsilon, delta) into registers.
    • Writing to a control register to issue the main start command that kicks off the entire process.
  2. AXI-Stream Interfaces (The Conveyor Belts): These are responsible for the continuous, high-speed movement of data.
    • AXI-Stream Slave (Input Belt): Feeds a constant stream of LiDAR points into the first workstation. It uses the TLAST signal to tell the FSM, "This is the last piece of the current job (frame)."
    • AXI-Stream Master (Output Belt): Takes the finished product (the 1-bit classification flags) from the second workstation and streams it out of the factory.
  3. FSM (The Factory Manager): The FSM doesn't do any point processing itself. Instead, it directs the traffic. It has a simple set of states (IDLE, CLEARING_GRID, PROCESSING) and tells the two algorithm cores when to work based on the signals from the AXI interfaces.
  4. Algorithm Cores (The Workstations):
    • Cell Grid Core (Workstation 1): Takes raw materials (points) from the input belt and does the first processing step (populating the BRAMs).
    • Ground Segmentation Core (Workstation 2): Takes the semi-finished product from Workstation 1 (via the BRAMs) and completes the final assembly (classification).
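The AXI-Lite role described above (write the parameters, then issue the start command) can be sketched from the processor's side as follows. This is a minimal illustration only: the register offsets, the `FakeAxiLite` class, and the start-bit position are assumptions made for the example, not the accelerator's actual register map.

```python
# Hypothetical register map: offsets and bit positions are illustrative only.
REG_CTRL = 0x00      # assumed control register, bit 0 = start
REG_ZETA = 0x04      # assumed offset for zeta
REG_EPSILON = 0x08   # assumed offset for epsilon
REG_DELTA = 0x0C     # assumed offset for delta
CTRL_START = 0x1


class FakeAxiLite:
    """Stand-in for a memory-mapped AXI4-Lite register window."""

    def __init__(self):
        self.regs = {}
        self.log = []  # order of written offsets, useful for checking sequencing

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF  # registers are 32 bits wide
        self.log.append(offset)

    def read(self, offset):
        return self.regs.get(offset, 0)


def configure_and_start(bus, zeta, epsilon, delta):
    """Write the three parameters first, then issue the start command."""
    bus.write(REG_ZETA, zeta)
    bus.write(REG_EPSILON, epsilon)
    bus.write(REG_DELTA, delta)
    bus.write(REG_CTRL, CTRL_START)


bus = FakeAxiLite()
configure_and_start(bus, zeta=10, epsilon=20, delta=30)
```

The ordering matters more than the values: the start bit is written last so the cores never run with partially written parameters.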

The Pipelined "Symphony" in Action

Let's trace the flow of two consecutive frames, Frame N and Frame N+1, to see the pipeline in action.

Step 1: Processing Frame N Begins

  1. The system is IDLE. The processor has already configured the parameters via AXI-Lite.
  2. Points for Frame N start arriving at the AXI-Stream Slave.
  3. The FSM sees the data arriving, moves to its PROCESSING state, and enables the Cell Grid Core (Workstation 1).
  4. The Cell Grid Core processes each point of Frame N, writing to the 'A' ports of the Grid BRAM and Points BRAM.
  5. During this entire time, the Ground Segmentation Core (Workstation 2) is idle. It has nothing to do yet.

Step 2: The Critical Hand-Off (The Pipelining Magic)

  1. The very last point of Frame N arrives at the AXI-Stream Slave. The TLAST signal is asserted for one clock cycle.
  2. The FSM sees the TLAST signal. This is the trigger for the entire pipeline. It immediately does two things in the very next clock cycle:
    • It sends a start signal to the Ground Segmentation Core (Workstation 2), telling it, "The data for Frame N is ready in the BRAMs. Begin your work."
    • It keeps the Cell Grid Core (Workstation 1) enabled. Why? Because the points for the next frame, Frame N+1, are arriving on the AXI-Stream input right behind Frame N.
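The hand-off can be modeled as a small cycle-by-cycle behavioral sketch. It is a software approximation under stated assumptions, not the actual RTL: the signal names (`cell_grid_enable`, `seg_start`) are hypothetical, and the CLEARING_GRID state is left out because the notes do not say when it is entered.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    PROCESSING = 1


class FrameFsm:
    """Behavioral model of the frame-boundary hand-off (names are hypothetical)."""

    def __init__(self):
        self.state = State.IDLE
        self.cell_grid_enable = False
        self.seg_start = False
        self._tlast_seen = False

    def tick(self, tvalid, tlast):
        """Advance one clock cycle using this cycle's input-stream signals."""
        # seg_start is a one-cycle pulse in the cycle after TLAST was seen.
        self.seg_start = self._tlast_seen
        self._tlast_seen = False

        if self.state is State.IDLE and tvalid:
            self.state = State.PROCESSING
        if self.state is State.PROCESSING:
            # The Cell Grid Core stays enabled across the frame boundary.
            self.cell_grid_enable = True
            if tvalid and tlast:
                self._tlast_seen = True


fsm = FrameFsm()
# Three points of Frame N (TLAST on the third), then two points of Frame N+1.
stream = [(True, False), (True, False), (True, True), (True, False), (True, False)]
seg_pulses = []
enables = []
for tvalid, tlast in stream:
    fsm.tick(tvalid, tlast)
    seg_pulses.append(fsm.seg_start)
    enables.append(fsm.cell_grid_enable)
```

In this trace the start pulse for the Ground Segmentation Core appears in the fourth cycle, one cycle after TLAST, while the Cell Grid Core's enable never drops.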

Step 3: Full Pipeline Operation

This is the state where the system achieves maximum throughput. For the entire duration that Frame N+1 is being processed:

  • The Cell Grid Core (Workstation 1) writes the points of Frame N+1 into the BRAMs through one port.
  • The Ground Segmentation Core (Workstation 2) reads the data of Frame N from the BRAMs through the other port and classifies it.

This parallel operation is only possible because the Block RAMs are dual-port. One core can write to one side of the memory while the other core reads from the other side without conflict.

The FSM's job is simply to manage these start/stop signals at the frame boundaries. The AXI interfaces ensure the data keeps flowing smoothly, and the algorithm cores just focus on their dedicated tasks. This coordination allows the accelerator to start processing a new frame long before it has finished with the previous one, which is the very definition of a pipelined architecture.
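The overlap between the two cores can be illustrated with a simple schedule. The sketch below assumes, purely for illustration, that each core needs exactly one frame period per frame; the function name is hypothetical.

```python
def pipeline_schedule(num_frames):
    """Return (slot, cell_grid_frame, segmentation_frame) tuples.

    In slot k the Cell Grid Core handles frame k while the Ground
    Segmentation Core handles frame k - 1. None means the core is idle.
    """
    schedule = []
    for slot in range(num_frames + 1):
        cell_grid = slot if slot < num_frames else None
        segmentation = slot - 1 if slot >= 1 else None
        schedule.append((slot, cell_grid, segmentation))
    return schedule


schedule = pipeline_schedule(3)
for slot, cell_grid, segmentation in schedule:
    print(f"slot {slot}: cell grid -> {cell_grid}, segmentation -> {segmentation}")
```

Under this assumption, three frames finish in four frame periods instead of the six a strictly sequential design would need, and the gap widens as more frames are processed.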

FSM

Points BRAM

Size determined by the maximum number of points in a frame.
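As a rough sizing aid, the relationship between the maximum frame size and the memory footprint can be written out. The point count and the bits stored per point below are placeholder values chosen for the example, not figures from the design.

```python
def points_bram_size(max_points, bits_per_point):
    """Return (address_bits, total_bits) for a memory holding one frame."""
    # Number of address bits needed to index entries 0 .. max_points - 1.
    address_bits = max(1, (max_points - 1).bit_length())
    total_bits = max_points * bits_per_point
    return address_bits, total_bits


# Placeholder values: 131072 points per frame, 48 bits stored per point.
address_bits, total_bits = points_bram_size(max_points=131072, bits_per_point=48)
```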

AXI4 Stream Slave Interface

Cell Grid Core

Responsible for the first step—creating and populating the grid.

Processes each point from the input LiDAR frame to determine which grid cell it belongs to.
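A minimal software sketch of this step is shown below. It assumes a uniform Cartesian grid with a square cell size and a row-major cell index, which the notes do not confirm; the function and parameter names are hypothetical. The dictionary stands in for the Grid BRAM and the list for the Points BRAM.

```python
def populate_grid(points, cell_size, grid_dims, origin=(0.0, 0.0)):
    """Assign each (x, y, z) point to a cell and track per-cell Z_min/Z_max."""
    rows, cols = grid_dims
    grid = {}     # cell index -> [z_min, z_max]
    stored = []   # (cell index, z) per point, in arrival order
    for x, y, z in points:
        col = int((x - origin[0]) // cell_size)
        row = int((y - origin[1]) // cell_size)
        if not (0 <= row < rows and 0 <= col < cols):
            stored.append((None, z))  # point falls outside the grid
            continue
        cell = row * cols + col       # assumed row-major addressing
        if cell in grid:
            grid[cell][0] = min(grid[cell][0], z)
            grid[cell][1] = max(grid[cell][1], z)
        else:
            grid[cell] = [z, z]
        stored.append((cell, z))
    return grid, stored


sample_points = [(0.5, 0.5, 1.0), (0.7, 0.2, -0.5), (1.5, 0.5, 2.0)]
grid, stored = populate_grid(sample_points, cell_size=1.0, grid_dims=(2, 2))
```

Keeping the points in arrival order is what later lets the output flags line up one-to-one with the input stream.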

Functionality:

Ground Segmentation Core

Performs the second step—classifying points as ground or non-ground. It uses the pre-calculated grid information (Z_min, Z_max) to make its decision for each point.

After the Cell Grid Core has processed the entire frame, the Ground Segmentation Core starts reading the point data stored in the Points BRAM, one point after another in the order it was written.
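The sequential read-and-classify loop might look like the sketch below. The decision rule is an assumption made for illustration: a point counts as ground when it lies within a hypothetical `height_threshold` of its cell's Z_min. The actual rule, including how zeta, epsilon, and delta enter it, is not described in these notes. The flag polarity (1 = ground) is also assumed.

```python
def classify_points(stored_points, grid, height_threshold):
    """Return one flag per stored point, in storage order (1 = ground, assumed)."""
    flags = []
    for cell, z in stored_points:
        if cell is None or cell not in grid:
            flags.append(0)  # no grid information: treat as non-ground
            continue
        z_min, _z_max = grid[cell]  # Z_max is unused by this simplified rule
        flags.append(1 if (z - z_min) <= height_threshold else 0)
    return flags


# Illustrative inputs: cell index -> [z_min, z_max], and (cell index, z) per point.
example_grid = {0: [-0.5, 1.0], 1: [2.0, 2.0]}
example_points = [(0, 1.0), (0, -0.5), (1, 2.0)]
flags = classify_points(example_points, example_grid, height_threshold=0.2)
```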

AXI4 Stream Master Interface

Annotation to Segmentation

The Core Concept: Synchronization is Everything ⛓️

The fundamental principle is that the output stream of flags is perfectly synchronized with the original input point cloud.

The AXI stream of 0s and 1s is essentially a tag or a label for each point. The downstream system's job is to correlate these tags with the original point data, which it has kept a copy of.

The System-Level Workflow

Here is the step-by-step process that happens outside the FESTA accelerator, typically on the host processor (like the ARM core in a Zynq SoC).

Step 1: Buffer the Original Point Cloud

Before the point cloud frame is sent to the FPGA, the host processor makes sure it has a complete copy of it stored in main memory (DDR RAM). Let's call this OriginalPointCloud_FrameN.

Step 2: Stream Data and Receive Flags

The processor configures a DMA (Direct Memory Access) controller to perform two tasks:

  1. Write Task: Read OriginalPointCloud_FrameN from the DDR RAM and stream it to the FESTA accelerator's AXI-Stream Slave input.
  2. Read Task: Simultaneously, configure another DMA channel to listen to the FESTA accelerator's AXI-Stream Master output. It captures the incoming stream of 1-bit flags and writes them to a separate buffer in DDR RAM. Let's call this the FlagBuffer_FrameN.

Step 3: The "Merge" or "Correlation" Step (The Magic)

Once the DMA transfer is complete, the processor has two arrays in memory:

  1. OriginalPointCloud_FrameN: Contains the full (X, Y, Z, intensity, etc.) data for every point.
  2. FlagBuffer_FrameN: Contains the corresponding ground/non-ground flag for every point.

Now, the software can perform the actual segmentation by simply iterating through both arrays simultaneously.
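A minimal sketch of this merge step, assuming the flags have already been unpacked to one integer per point and that 1 means ground (the notes do not state the polarity):

```python
def split_by_flag(points, flags):
    """Walk both arrays in lockstep and split points into ground / non-ground."""
    if len(points) != len(flags):
        raise ValueError("point and flag counts differ; streams are out of sync")
    ground, non_ground = [], []
    for point, flag in zip(points, flags):
        (ground if flag else non_ground).append(point)
    return ground, non_ground


# Illustrative data: (x, y, z, intensity) tuples and their flags.
original_point_cloud = [(0.5, 0.5, 1.0, 12), (0.7, 0.2, -0.5, 40), (1.5, 0.5, 2.0, 7)]
flag_buffer = [0, 1, 1]
ground, non_ground = split_by_flag(original_point_cloud, flag_buffer)
```

The length check is worth keeping: the scheme depends entirely on flag i belonging to point i, so a count mismatch means the two buffers can no longer be trusted.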

Grid BRAM

Size determined by the grid dimensions (e.g., 512x216 cells).
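For the example dimensions, the cell count and a possible linear addressing scheme can be worked out as below. Treating 512 as the row count and 216 as the column count, the row-major layout, and the 32 bits per cell (16 each for Z_min and Z_max) are all assumptions made for the example.

```python
def cell_address(row, col, num_cols):
    """Row-major linear address of a grid cell (assumed layout)."""
    return row * num_cols + col


def grid_bram_size(num_rows, num_cols, bits_per_cell):
    """Return (cell_count, address_bits, total_bits) for the grid memory."""
    cell_count = num_rows * num_cols
    address_bits = max(1, (cell_count - 1).bit_length())
    return cell_count, address_bits, cell_count * bits_per_cell


cell_count, address_bits, total_bits = grid_bram_size(512, 216, bits_per_cell=32)
last_address = cell_address(511, 215, num_cols=216)
```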