I'm a little embarrassed to say it, but if I have one area of expertise it's probably the Internet checksum! I mean compared to other topics like neural networks, 6G, or quantum computing, talking about checksums seems downright quaint. After all, the checksum is ancient, having been around since Day One of the Internet, and compared to modern algorithms like CRC it's quite weak. One would think after forty years it'd be a solved problem or we would have moved on to something better... but, alas, things didn't pan out that way! Today nearly every packet sent on the Internet contains a checksum and it still can be a pain in the "you know what" to implement correctly-- do it right and we get efficient and reliable communications, do it wrong and we can get miserable performance or even packet loss.
Huhhm... come to think of it, maybe the Internet checksum isn't so quaint afterall! :-)
What's the deal
The Internet checksum is an integrity check to detect errors in packets sent on a network. It's used by several of the most common Internet protocols including IPv4, TCP, UDP, and a handful of other protocols as well (see RFC1122).
The algorithm for the checksum is so simple that I was able to explain it to my six year old daughter a while back :-). Suppose we have a list of numbers that we want to send to our friend and we want a way they can verify that the list didn't change in transit. We, the sender, start by adding up all the numbers in the list. We take the negative of the sum and tack that value onto the list of numbers. The whole list is sent to our friend, the receiver. The receiver simply adds up all the numbers in the list and then checks if the sum is equal to zero (hence the term checksum). If the sum is zero then it's all good, if not then there's an error in at least one of the numbers (but we can't tell which one :-( ).
We can describe the algorithm in formal notation. The sender does:
And to validate the checksum the receiver does:
For the standard Internet checksum, it's the half words (i.e. byte pairs or every sixteen bits) of a packet that are summed. If there's an odd number of bytes then a zero byte is logically appended. In the above notation, each xi is a halfword in the data covered by the checksum. xcsum is a special field in the data called the checksum field; and is set to the negated sum of the rest of the half words. IPv4, TCP, and UDP headers all have a checksum field.
The Internet checksum employs ones' complement arithmetic as denoted by the circles in the addition and sum operators above. This is in contrast to the standard twos' complement used for normal integer arithmetic. In ones' complement addition, if a carry is generated then it is added back into the result (adds a value of one). Like normal addition, ones' complement addition is both associative and commutative which isn't true for more sophisticated integrity algorithms like CRC or HMAC-- this is a critical point, we use these properties all over the place to speed up checksum computation. Also, in ones' complement addition negating a value is equivalent to taking the boolean "not" of it which we signify by ~. Byte ordering, that is endianness, does not matter with the arithmetic, we will get the right answer whether the machine is big endian or little endian.
Ones' complement addition has a quirky property in that there are two zeroes: 0 and -0. Both of these satisfy the identity property for addition. -0 is equivalent to ~0, so for the sixteen bit Internet checksum 0x0 and 0xFFFF are mathematically equivalent to zero. You might see them used somewhat interchangeably. For instance, when validating a checksum we could eliminate the "not" of the final sum and just compare to 0xFFFF (saves a CPU instruction). The two zeroes also allow an optional IPv4 UDP checksum. A sender may set the UDP checksum field to 0x0 to indicate the checksum doesn't need to be validated by the receiver. If a sender is setting the UDP checksum and the computation would set 0x0 in the checksum field then 0xFFFF is set instead to avoid ambiguity.
Computing the checksum in the CPU
In the beginning of the Internet, all checksums were computed in the CPU. This entails a loop that performs a ones' complement summation over the half-words of the checksum area. There are a few ways to optimize the calculation.
The arithmetic properties of checksum allow us to use larger operands (word sizes in CPU parlance). For instance, we can sum the words (four bytes), double words (eight bytes), and the checksum area and so on-- this substantially reduces the number of add operations to improve performance. The final sum needs to be folded to a proper two byte checksum-- the process is outlined in the diagram below. The resultant checksum is equal to that had we properly summed up all the halfwords (a proof of that is left to the reader!).
Checksum fold: This shows the process for folding a sixty-four bit checksum value to sixteen bits. First we add the high and low order words of the sixty-four bit value to derive a thirty-two bit value, and then we add the high and low words of the thirty-two bit value to derive the final sixteen bit value.
The normal add instruction in a CPU architecture does a twos' complement sum. It can be used to implement a ones' complement sum, however we can't use the maximum machine word size (for instance, on a sixty-four bit architecture we'd only be able to at most add thirty-two bits at a time). The problem is that when we add two full size words we lose the carry. Many CPU architectures have an add-with-carry instruction that adds two operands and also adds a value of 1 if the carry-bit is set. The carry-bit is a processor flag that is set whenever an add instruction generates a carry. Add-with-carry instructions can be used to efficiently perform checksum computation. For instance, in x86 with a 32 bit word size the adcl instruction can be used to efficiently compute a twenty byte IPv4 header checksum:
If you're interested in how add-with-carry instructions are used, take a look at x86 or ARM Linux kernel header files. Also, note that RISC-V doesn't have a carry bit so there's no add-with-carry instruction and therefore checksum computation in RISC-V is more expensive than in ARM or x86. Since RISC-V is an open ISA we're able to add specialized instructions for computing the Internet checksum (details are a topic for another day!).
Checksum offload
Using CPU instructions works well for checksumming small areas like the IPv4 header checksum, however it's really expensive to use when we're checksumming large blocks of memory like for TCP or UDP checksums that cover payloads. For that, we want checksum offload in NICs. See here for more specific descriptions of checksum offload in Linux.
There are two methods used in checksum offload: protocol agnostic and protocol specific. In protocol agnostic checksum offload the hardware doesn't need to know or parse the headers in a packet, and in protocol specific checksum offload it does. I'm not going say much about protocol specific checksum offload other than making a Public Service Announcement:
Dear NIC vendors, please, please, please implement protocol agnostic checksum offload if you haven't already. It's far more extensible, more flexible, more robust, and better for the Internet (for more information, see Dave Miller's presentation).
In the early days, packets sent on the Internet were very simple. We saw a lot of plain TCP and UDP packets that contained at most one payload checksum that was easy to find. But times have changed. We now have to deal with much more complex packets, and because of encapsulation we often see two or more payload checksums in the same packet (these complex packets also can make protocol specific checksum offload fail). Fortunately, there are some nice techniques to "offload" multiple checksums in the same packet without requiring support for hardware to offload more than one checksum.
Checksum offload: The left side shows the processing for Transmit checksum offload and the right shows Receive checksum offload.
Transmit checksum offload
Transmit checksum offload is simple. The host stack tells the device what do through two variables:
csum_start: The offset from the start of the packet where the area to checksum starts. In the case of a TCP packet this would be the offset of the TCP header
csum_offset: The offset of the checksum field to write. In the case of TCP this would be the offset of the checksum field in the header
The networking stack sends the csum_start and csum_offset values to the hardware device in a transmit descriptor. The device operation is:
Perform a ones' complement sum of the half words in the packet starting from the packet offset specified in csum_start through the last byte of the packet
Write the two byte result from step 1 into the packet at the offset specified by csum_offset
The networking stack initializes the checksum field so that everything works out. For instance, in the case of UDP or TCP the checksum computation includes summing over a pseudo header consisting of the IP addresses, length, and protocol number. The host stack computes the ones' complement sum over this pseudo header, negates the value, and sets it in the checksum field of the TCP or UDP header before offloading.
To support offloading of multiple transmit checksums per packet, a clever algorithm called Local Checksum Offload is used. This exploits the fact that the one's complement sum over a checksum area can be deduced even before the checksum field is set. For instance, for TCP the final sum over the TCP header and payload must equal to the "not" of the summed pseudo header. Applying this information we can offload the innermost checksum and the CPU can directly set the outer checksum fields efficiently without additional hardware support.
Transmit checksum offload with two checksums: This shows offloading the inner TCP checksum in a packet and using LCO to derive the outer UDP checksum for VLAN. The blue text and arrows show device operations.
Receive checksum offload
Receive checksum offload is even simpler than transmit checksum. All we need is for the device to compute the ones' complement sum over a packet, from the first byte of the Ethernet payload to the last byte, and then convey the sum to the host stack in the receive descriptor for the packet. This checksum over the packet is enough for the stack to compute any number of payload checksums in the packet. This is illustrated in the diagram below.
Receive checksum offload with two checksums: This shows how both an outer UDP checksum and inner TCP checksum in a packet can be validated based on the packet checksum value provided by a device. The blue text and arrows show device operations. csumudp and csumtcp are needed to compute the UDP and TCP checksums.
Incremental checksum updates
One of the most powerful features of the Internet checksum is incremental updates. It is very common as a packet is forwarded in the network that small modifications are made that require a checksum to be recalculated. For example, when forwarding an IPv4 packet the TTL field in the IP header is decremented and the TTL is covered by the IPv4 header checksum so the checksum has to be recalculated. NAT is another good example, when we modify IP addresses and port numbers the TCP or UDP checksum needs to be updated. The arithmetic properties allow the checksum to be updated without needing to recompute the whole thing. For instance, if a sixteen bit field in a packet changes, the checksum field can be updated by:
Checksum and segmentation offloads
Checksum offload is naturally part of segmentation offloads. When a device is doing TSO it needs to compute the TCP or UDP checksum in each created packet. The transmit checksum offload procedures described above still apply where csum_start and csum_offset are used to set the checksum in each created packet (assuming the length of all segments is the same). In LRO, we need to validate the TCP or UDP checksum before reassembling the packet; this is one variant of checksum offload that can't be done by protocol agnostic methods (note LRO itself is already a protocol specific offload).
When using segmentation offloads with IPv4, the IPv4 header checksum needs to be considered. For TSO the header checksum will be set by the host stack that covers the header of the unsegmented packet; in the created packets the IPv4 length field and IPID may change and these can be processed as incremental checksum updates. When doing LRO, the header checksum in each received packet needs to be verified before reassembling.
Commenti