Tom Herbert

A Threading Model for Extremely Extreme Parallelism

Updated: Aug 14

Tom Herbert, SiPanda CTO, July 29, 2024.

Previously, we discussed techniques for fine-grained parallelism in serial data processing, including horizontal parallelism, which processes packets in parallel, and vertical parallelism, which processes the protocol layers of a single packet in parallel. This week we look at the threading model behind our approach to parallel processing.

Threads and thread sets and datapaths: OH MY!!!

The SiPanda threading model is composed of three elements: threads, thread sets, and datapaths.


A thread is the smallest set of programmed instructions that can be independently scheduled to execute on a CPU. In this architecture, threads run to completion and should have minimal context-switch overhead. A thread set is a set of threads created to process a data object or packet; the threads in a thread set are ordered and maintained in an ordered list. A datapath is a collection of thread sets for processing packets; like the threads in a thread set, the thread sets themselves are ordered and maintained in an ordered list in the datapath. A system may contain multiple datapaths; for instance, each network receive interface might be associated with its own datapath.
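As a rough illustration, the three elements can be modeled as nested ordered lists. This is a minimal Python sketch under our own assumptions. The class names `Thread`, `ThreadSet`, and `Datapath`, the `add_thread`/`add_thread_set` helpers, and the interface name `rx0` are hypothetical and are not a SiPanda API. A real implementation would live in hardware or low-level code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Thread:
    # Smallest independently schedulable unit. It runs to completion,
    # so this sketch keeps no saved context beyond its identity.
    name: str


@dataclass
class ThreadSet:
    # All threads created to process one data object or packet.
    packet_id: int
    threads: List[Thread] = field(default_factory=list)  # ordered list

    def add_thread(self, name: str) -> Thread:
        # Appending at the tail keeps the list in creation order.
        t = Thread(name)
        self.threads.append(t)
        return t


@dataclass
class Datapath:
    # A collection of thread sets, e.g. one datapath per receive interface.
    name: str
    thread_sets: List[ThreadSet] = field(default_factory=list)  # ordered list

    def add_thread_set(self, packet_id: int) -> ThreadSet:
        ts = ThreadSet(packet_id)
        self.thread_sets.append(ts)
        return ts


# Usage: one receive interface with two packets in flight.
rx0 = Datapath("rx0")
for pkt in (1, 2):
    ts = rx0.add_thread_set(pkt)
    for layer in ("eth", "ipv6", "tcp"):
        ts.add_thread(layer)

print([ts.packet_id for ts in rx0.thread_sets])      # [1, 2]
print([t.name for t in rx0.thread_sets[0].threads])  # ['eth', 'ipv6', 'tcp']
```

Plain lists are enough here because the model only ever appends at the tail. That single rule is what keeps both levels in order.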


Threads in a thread set process the protocol layers of a packet in parallel (vertical parallelism), and thread sets in a datapath process packets in parallel (horizontal parallelism). The ordering of threads and thread sets is crucial for dependency synchronization, as we discussed in the last blog (and we'll say more about the interaction between dependencies and thread scheduling in the next one). The figure below shows an example of the threading model in action: three thread sets are actively processing packets, and the threads within each thread set are processing various protocol layers of their respective packets.

Threads, thread sets, and datapaths. This example shows a datapath with three thread sets processing packets. Threads within each thread set are scheduled to process protocol layers of the packet being processed by the thread set. Note that thread sets are in the ordered list of the datapath, and threads are in the ordered list of their thread set.
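To make the two kinds of parallelism concrete, the following sketch mirrors the figure's scenario of three thread sets in one datapath. Python's `ThreadPoolExecutor` stands in for the hardware threads purely for illustration. The packet contents and `process_layer` are hypothetical. Every (packet, layer) job is submitted at once, so layers of the same packet (vertical) and different packets (horizontal) may run concurrently. The ordered lists still determine the order in which results are read.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical datapath contents: three packets in flight, each listing the
# protocol layers (outermost first) that its thread set must process.
PACKETS: Dict[int, List[str]] = {
    1: ["eth", "ipv4", "tcp"],
    2: ["eth", "ipv6", "udp"],
    3: ["eth", "ipv6", "tcp"],
}


def process_layer(packet_id: int, layer: str) -> str:
    # Stand-in for real protocol processing. The random delay makes threads
    # finish in an unpredictable order.
    time.sleep(random.uniform(0, 0.01))
    return f"{packet_id}:{layer}"


def run_datapath(packets: Dict[int, List[str]]) -> List[List[str]]:
    jobs = sum(len(layers) for layers in packets.values())
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Outer list = the datapath's ordered list of thread sets (horizontal);
        # inner lists = each thread set's ordered list of threads (vertical).
        # Every job is submitted up front, so all of them may run concurrently.
        thread_sets = [
            [pool.submit(process_layer, pid, layer) for layer in layers]
            for pid, layers in packets.items()
        ]
        # Results are collected by walking the ordered lists, so the output
        # order is fixed regardless of which thread finished first.
        return [[f.result() for f in ts] for ts in thread_sets]


for results in run_datapath(PACKETS):
    print(results)
```

CPython's global interpreter lock limits real parallelism for CPU-bound work. The point here is therefore the structure, not the speed.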



Thread Scheduling: Daddy, are we there yet???

Thread scheduling is fairly straightforward. Threads are scheduled using one of two methods, top function scheduling or cascade scheduling, and the two can be combined in hybrid scheduling. The diagram below illustrates an example of hybrid scheduling.


In top function scheduling, each thread set runs a top function that schedules its threads. Typically, the top function invokes a parser to identify the constituent protocol layers of a packet and then schedules threads to process those layers. For each identified protocol layer, the top function may schedule a thread by queuing a work item on the work queue of an available thread. A work item carries all the information needed to run a thread, including the function to run and its arguments. Parsing may be tightly integrated with scheduling as parser-scheduling, in which case the top function is a parser-scheduler (we have a lot more to say about parsers in future blogs!).
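The following is a minimal sketch of top function scheduling with work items and per-thread work queues. Several assumptions are ours: the "packet" is a string such as `"eth/ipv6/udp"`, "available" means a worker whose queue is empty, and `WorkItem`, `WorkerThread`, `parse`, `process_layer`, `pick_available`, and `top_function` are all hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, List


@dataclass
class WorkItem:
    # Everything needed to run a thread: the function and its arguments.
    func: Callable[..., Any]
    args: tuple = ()


@dataclass
class WorkerThread:
    tid: int
    work_queue: Deque[WorkItem] = field(default_factory=deque)

    def run_one(self) -> Any:
        # Run to completion: take one work item and execute it fully.
        item = self.work_queue.popleft()
        return item.func(*item.args)


def parse(packet: str) -> List[str]:
    # Hypothetical parser: the "packet" is a string such as "eth/ipv6/udp",
    # outermost header first. A real parser walks header bytes.
    return packet.split("/")


def process_layer(packet_id: int, layer: str) -> str:
    # Hypothetical per-layer processing function.
    return f"pkt{packet_id}:{layer}"


def pick_available(pool: List[WorkerThread]) -> WorkerThread:
    # Treat a worker with an empty work queue as available.
    for worker in pool:
        if not worker.work_queue:
            return worker
    raise RuntimeError("no available worker thread")


def top_function(packet_id: int, packet: str,
                 pool: List[WorkerThread]) -> List[WorkerThread]:
    # Parse the packet, then queue one work item per identified layer.
    # The returned list is the thread set's ordered list of threads.
    thread_set: List[WorkerThread] = []
    for layer in parse(packet):
        worker = pick_available(pool)
        worker.work_queue.append(WorkItem(process_layer, (packet_id, layer)))
        thread_set.append(worker)
    return thread_set


pool = [WorkerThread(tid) for tid in range(4)]
thread_set = top_function(7, "eth/ipv6/udp", pool)
print([w.tid for w in thread_set])        # [0, 1, 2]
print([w.run_one() for w in thread_set])  # ['pkt7:eth', 'pkt7:ipv6', 'pkt7:udp']
```

Here parsing and scheduling happen in the same loop, which is roughly what a parser-scheduler does. Parsing everything first and then scheduling would also fit the description above.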


In cascade scheduling, a worker thread schedules the next thread in its thread set, and only the last thread in the thread set's ordered list may do so. When the first thread runs, it can schedule a second worker thread; the second thread may in turn schedule a third worker thread, and so on. The cascade stops when the last worker thread does not schedule a next thread.
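Here is a small sketch of cascade scheduling that enforces the "only the last thread may schedule" rule. The next-header table `NEXT_LAYER` and the helpers `run_thread` and `cascade` are hypothetical. For simplicity, each thread finishes before its successor runs. In the model, a scheduled successor would presumably be able to start on its own thread right away.

```python
from typing import Dict, List, Optional

# Hypothetical next-header chain: processing a layer reveals which layer
# (if any) comes next. None means there is nothing left to schedule.
NEXT_LAYER: Dict[str, Optional[str]] = {"eth": "ipv6", "ipv6": "tcp", "tcp": None}


class ThreadSet:
    def __init__(self) -> None:
        self.threads: List[str] = []  # ordered list of scheduled threads

    def schedule(self, layer: str, caller: Optional[int] = None) -> int:
        # caller=None seeds the first thread; after that, only the thread at
        # the tail of the ordered list may schedule a next thread.
        if self.threads and caller != len(self.threads) - 1:
            raise RuntimeError("only the last thread may schedule a next thread")
        self.threads.append(layer)
        return len(self.threads) - 1


def run_thread(ts: ThreadSet, idx: int) -> Optional[int]:
    # Process this thread's layer; if it reveals a next layer, cascade.
    nxt = NEXT_LAYER[ts.threads[idx]]
    return None if nxt is None else ts.schedule(nxt, caller=idx)


def cascade(first_layer: str) -> List[str]:
    ts = ThreadSet()
    idx: Optional[int] = ts.schedule(first_layer)
    while idx is not None:  # stops when the last thread schedules nothing
        idx = run_thread(ts, idx)
    return ts.threads


print(cascade("eth"))  # ['eth', 'ipv6', 'tcp']
```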


Example of hybrid scheduling with top-function (parser-scheduling) and cascade scheduling. In this example, hybrid scheduling is used to schedule worker threads to process a TCP in IPv4 in IPsec in IPv6 in Ethernet packet; IPsec encrypts the encapsulated IPv4 packet. When a packet arrives, the parser-scheduler parses the Ethernet, IPv6, and IPsec headers (those in plain text) and schedules worker threads to process them, as indicated by the solid green arrows from the top function. IPsec processing decrypts the encapsulated IPv4 packet and, via cascade scheduling, schedules a thread to process the IPv4 header. The IPv4 thread then schedules a thread to process the TCP header. Cascade scheduling is indicated by the dashed red arrows.
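The hybrid example above can be traced with a short sketch. The top function schedules the plaintext headers, and the IPsec and IPv4 threads then cascade. The packet representation (header names split into `plaintext` and `encrypted` lists) is our own simplification. No real decryption happens. The names `parser_scheduler` and `worker` are hypothetical. Threads run one after another in list order here, standing in for parallel execution.

```python
from typing import Dict, List, Optional

# Hypothetical packet: TCP in IPv4 in IPsec in IPv6 in Ethernet. Only the
# outer headers are readable; the rest is hidden until IPsec decrypts it.
PACKET: Dict[str, List[str]] = {
    "plaintext": ["eth", "ipv6", "ipsec"],
    "encrypted": ["ipv4", "tcp"],
}


class ThreadSet:
    def __init__(self) -> None:
        self.threads: List[str] = []  # ordered list of scheduled threads
        self.method: List[str] = []   # "top" or "cascade", for illustration

    def schedule(self, layer: str, method: str, caller: Optional[int] = None) -> None:
        # Cascade rule: a worker may schedule only if it is the last thread.
        if method == "cascade" and caller != len(self.threads) - 1:
            raise RuntimeError("only the last thread may cascade")
        self.threads.append(layer)
        self.method.append(method)


def parser_scheduler(ts: ThreadSet, packet: Dict[str, List[str]]) -> None:
    # Top function: parse the plaintext headers and schedule one thread each.
    for layer in packet["plaintext"]:
        ts.schedule(layer, "top")


def worker(ts: ThreadSet, idx: int, packet: Dict[str, List[str]]) -> None:
    layer = ts.threads[idx]
    inner = packet["encrypted"]  # stand-in: no real decryption is performed
    if layer == "ipsec":
        nxt = inner[0]           # decrypting reveals the inner IPv4 header
    elif layer in inner and inner.index(layer) + 1 < len(inner):
        nxt = inner[inner.index(layer) + 1]
    else:
        return                   # no successor; any cascade ends here
    ts.schedule(nxt, "cascade", caller=idx)


ts = ThreadSet()
parser_scheduler(ts, PACKET)
idx = 0
while idx < len(ts.threads):     # sequential stand-in for parallel threads
    worker(ts, idx, PACKET)
    idx += 1

print(list(zip(ts.threads, ts.method)))
```

Running it prints `[('eth', 'top'), ('ipv6', 'top'), ('ipsec', 'top'), ('ipv4', 'cascade'), ('tcp', 'cascade')]`. This matches the green (top function) and red (cascade) arrows described in the caption.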



Okay, so what’s the point?

This model has a number of advantages: 1) the division of threads into thread sets naturally supports the models of horizontal and vertical parallelism; 2) the ordered lists support dependency synchronization (in the next blog we'll show how dependencies interact with thread scheduling); 3) we can optimize the implementation for the unique characteristics and requirements of serial data processing, and in particular we can put these optimizations in specialized hardware, such as a hardware scheduler that runs orders of magnitude faster than a software scheduler like the one in Linux. Is this generic and general? No! It's a Domain Specific Architecture, and that's exactly the point!

SiPanda

SiPanda was created to rethink the network datapath and bring both flexibility and wire-speed performance at scale to networking infrastructure. The SiPanda architecture enables data center infrastructure operators and application architects to build solutions, from cloud service providers to edge compute (5G), that don't require the compromises inherent in today's network solutions. For more information, please visit www.sipanda.io. If you want to find out more about PANDA, you can email us at panda@sipanda.io. IP described here is covered by patent USPTO 12,026,546 and other patents pending.

