Collateral Evolutions

The Linux operating system (OS) is evolving rapidly to improve performance and to provide new features. This evolution, however, makes it difficult to maintain platform-specific code such as device drivers. Indeed, an evolution in a driver support library often triggers the need for multiple collateral evolutions in dependent device drivers, to bring the drivers up to date with the new library API. Currently, collateral evolutions are mostly done manually. Given the large number of drivers, this approach is time-consuming and unreliable: when modifications are not made consistently, subtle errors result. Moreover, because collateral evolutions are often poorly documented, the resulting maintenance is difficult and costly and frequently introduces further errors. If a driver maintainer becomes unavailable, the driver quickly falls behind the rest of the OS.

We propose a language-based approach, Coccinelle, to address the problem of collateral evolution in drivers. The key idea of Coccinelle is to shift the burden of collateral evolution from the driver maintainer to the OS developer who performs the original OS evolution, and who thus understands this evolution best. In our vision, the OS developer first uses the Coccinelle transformation language to write a semantic patch describing the required collateral evolution in device drivers, and then uses the Coccinelle transformation tool to validate the semantic patch on the drivers in the Linux source distribution. Once confident in the correctness of the semantic patch, the OS developer distributes it for use by the maintainers of other drivers. Overall, Coccinelle will provide a means to formally document collateral evolutions and to ease the application of these evolutions to driver code.

Collateral evolutions stem from evolutions in the interface that driver support libraries and the kernel offer to device-specific code. Examples of such evolutions include adding an argument to a function or changing the sequence of library functions that the device-specific code should use for a given purpose. While the change in the interface may be small, the collateral evolution in device-specific code can be subtle and complex, potentially affecting a hundred or more files.

Adding an argument to usb_submit_urb

The USB library function usb_submit_urb implements the passing of a message, represented as a USB Request Block (urb). This function uses the kernel memory-allocation function kmalloc, which must be passed a flag indicating the circumstances in which blocking is allowed. Up through Linux 2.5.3, the flag was chosen inside usb_submit_urb, which used the function in_interrupt() to check whether it was running in interrupt context, where blocking is not allowed. In Linux 2.5.4, this solution was found to be unsatisfactory, and it was decided that the caller should instead provide information about the calling context explicitly. Accordingly, an extra argument was added to usb_submit_urb, which should have one of the following values: GFP_KERNEL (no constraints), GFP_ATOMIC (blocking not allowed), or GFP_NOIO (blocking allowed but not I/O).
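To make the interface change concrete, the following user-space sketch models only where the allocation flag comes from. All names (mock_gfp_t, struct mock_ctx, submit_urb_old, submit_urb_new) are hypothetical; the functions submit nothing and merely return the flag that would be handed to the allocator. As a simplification, the model assumes that the library's own check sees only whether the code runs in interrupt context, so a caller in process context that holds a spinlock looks safe to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GFP_KERNEL, GFP_ATOMIC and GFP_NOIO. */
typedef enum { MOCK_GFP_KERNEL, MOCK_GFP_ATOMIC, MOCK_GFP_NOIO } mock_gfp_t;

/* Simplified description of the context a driver calls from. */
struct mock_ctx {
    bool in_interrupt;   /* what an in_interrupt()-style test can see */
    bool holds_spinlock; /* invisible to that test in this model */
};

/* Up through 2.5.3 (modeled): the library picks the flag itself,
 * looking only at whether it runs in interrupt context. */
static mock_gfp_t submit_urb_old(const struct mock_ctx *ctx)
{
    assert(ctx != NULL);
    return ctx->in_interrupt ? MOCK_GFP_ATOMIC : MOCK_GFP_KERNEL;
}

/* From 2.5.4 on (modeled): the caller states the flag explicitly and the
 * library simply forwards it to its allocation. */
static mock_gfp_t submit_urb_new(mock_gfp_t mem_flags)
{
    assert(mem_flags == MOCK_GFP_KERNEL || mem_flags == MOCK_GFP_ATOMIC ||
           mem_flags == MOCK_GFP_NOIO);
    return mem_flags;
}
```

For a context such as { .in_interrupt = false, .holds_spinlock = true }, submit_urb_old yields MOCK_GFP_KERNEL even though blocking is unsafe, whereas under the new interface the caller, who knows about the lock, passes MOCK_GFP_ATOMIC.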

With this change to the interface, the programmer of device-specific code must extend each call to usb_submit_urb with GFP_KERNEL, GFP_ATOMIC, or GFP_NOIO. Some guidance is given in the comments associated with the definition of usb_submit_urb. For example, GFP_ATOMIC is required when locks are held or in a completion handler. The former case is illustrated by the following example, extracted from drivers/usb/audio.c (later named drivers/usb/class/audio.c):
spin_lock_irqsave(&as->lock, flags);
if (!usbin_retire_desc(u, urb) &&
    u->flags & FLG_RUNNING &&
    !usbin_prepare_desc(u, urb) &&
    (suret = usb_submit_urb(urb)) == 0) {
  u->flags |= mask;
} else {
  u->flags &= ~(mask | FLG_RUNNING);
  wake_up(&u->dma.wait);
  printk(KERN_DEBUG "...", suret);
}
spin_unlock_irqrestore(&as->lock, flags);
becomes
spin_lock_irqsave(&as->lock, flags);
if (!usbin_retire_desc(u, urb) &&
    u->flags & FLG_RUNNING &&
    !usbin_prepare_desc(u, urb) &&
    (suret = usb_submit_urb(urb,GFP_ATOMIC)) == 0) {
  u->flags |= mask;
} else {
  u->flags &= ~(mask | FLG_RUNNING);
  wake_up(&u->dma.wait);
  printk(KERN_DEBUG "...", suret);
}
spin_unlock_irqrestore(&as->lock, flags);
Applying this guidance is nonetheless hard: detecting whether a lock is held requires careful and occasionally interprocedural analysis of the source code, and the other conditions, such as "in a completion handler", are not formally defined and require the study of multiple files.
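The lock-holding rule can be checked mechanically only in the simplest case. The sketch below, with hypothetical names throughout, reduces a single function body to a straight-line list of lock, unlock, and submit events and tracks how many locks are held at each submit. It deliberately ignores branches, locks taken by a caller, and completion handlers, which are precisely the cases that make the real analysis interprocedural; for such call sites its GFP_KERNEL default would repeat the kind of error described next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two flags this rule distinguishes. */
typedef enum { MOCK_GFP_KERNEL, MOCK_GFP_ATOMIC } mock_gfp_t;

/* One function body flattened into the lock-related operations it
 * performs, in order. Branches and calls are not represented. */
typedef enum { EV_LOCK, EV_UNLOCK, EV_SUBMIT } event_t;

/* Tracks the lock depth along the sequence and picks a flag for each
 * EV_SUBMIT: GFP_ATOMIC if at least one lock is held, GFP_KERNEL
 * otherwise. Stores at most max_out flags in out and returns the total
 * number of submit events seen. */
static size_t flags_for_submits(const event_t *ev, size_t n,
                                mock_gfp_t *out, size_t max_out)
{
    int depth = 0;
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        switch (ev[i]) {
        case EV_LOCK:
            depth++;
            break;
        case EV_UNLOCK:
            assert(depth > 0 && "unlock without a matching lock");
            depth--;
            break;
        case EV_SUBMIT:
            if (found < max_out)
                out[found] = depth > 0 ? MOCK_GFP_ATOMIC : MOCK_GFP_KERNEL;
            found++;
            break;
        }
    }
    return found;
}
```

For the body SUBMIT, LOCK, SUBMIT, UNLOCK, SUBMIT, the function returns 3 and assigns GFP_KERNEL, GFP_ATOMIC, and GFP_KERNEL; the audio.c excerpt above corresponds to LOCK, SUBMIT, UNLOCK and receives GFP_ATOMIC.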

Due to the complexity of the conditions governing the choice of the new argument for usb_submit_urb, 71 of the 158 calls to this function were initially transformed incorrectly to use GFP_KERNEL instead of GFP_ATOMIC. The graph below categorizes the reasons why GFP_ATOMIC was required in these cases and the versions in which the error was corrected. Linux 2.6.13 still contains an incorrect call to usb_submit_urb in drivers/usb/class/audio.c.

Changing the driver initialization protocol

The function check_region is used during the initialization of device drivers to determine whether a given device is installed. In early versions of Linux, the kernel initialized device drivers sequentially. In this setting, a driver determines whether its device is attached to a given port using the following protocol: (i) call check_region to find out whether the memory region associated with the port is already allocated to another driver, (ii) if not, perform some driver-specific tests to identify the device attached to the port, and (iii) if the desired device is found, call request_region to reserve the memory region for the current driver.

In more recent versions of Linux, the kernel initializes device drivers concurrently. In this case, between the call to check_region and the call to request_region some other driver may claim the same memory region and initialize the device. To solve this problem, starting with Linux 2.4.2, device-specific code began to be rewritten to replace the call to check_region in step (i) with a call to request_region, to actually reserve the memory region. Given this change, if in step (ii) the expected device is not found, then release_region must be used to release the memory region. The transformation is illustrated by the modifications to drivers/char/logibusmouse.c between Linux 2.4.18 and 2.4.19:
static int __init logi_busmouse_init(void) {
  if (check_region(LOGIBM_BASE, LOGIBM_EXTENT))
    return -EIO;
  
  outb(MSE_CONFIG_BYTE, MSE_CONFIG_PORT);
  outb(MSE_SIGNATURE_BYTE, MSE_SIGNATURE_PORT);
  udelay(100L);   /* wait for reply from mouse */
  if (inb(MSE_SIGNATURE_PORT) != MSE_SIGNATURE_BYTE)
    return -EIO;
  ...
  request_region(LOGIBM_BASE, LOGIBM_EXTENT, "busmouse");
  msedev = register_busmouse(&busmouse);
  if (msedev < 0) ...
  else printk(KERN_INFO "Logitech busmouse installed.\n");
  return msedev < 0 ? msedev : 0;
}
becomes
static int __init logi_busmouse_init(void) {
  if (!request_region(LOGIBM_BASE, LOGIBM_EXTENT, "busmouse"))
    return -EIO;
  
  outb(MSE_CONFIG_BYTE, MSE_CONFIG_PORT);
  outb(MSE_SIGNATURE_BYTE, MSE_SIGNATURE_PORT);
  udelay(100L);   /* wait for reply from mouse */
  if (inb(MSE_SIGNATURE_PORT) != MSE_SIGNATURE_BYTE) {
    release_region(LOGIBM_BASE, LOGIBM_EXTENT);
    return -EIO;
  }
  ...
  msedev = register_busmouse(&busmouse);
  if (msedev < 0) { ... }
  else printk(KERN_INFO "Logitech busmouse installed.\n");
  return msedev < 0 ? msedev : 0;
}
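The race that motivated this change can be replayed deterministically in user space. The sketch below is a hypothetical model, not kernel code: mock_check_region, mock_request_region, and mock_release_region only mirror the roles of the kernel functions over a small table of port owners, the device_found parameter stands in for the driver-specific tests of step (ii), and the callback other stands in for a second driver that happens to run while the first one is probing. Like the original logibusmouse code, the old protocol ignores the result of its final request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each I/O port has at most one owner (NULL = free). */
#define NPORTS 64
static const char *owner[NPORTS];

static bool region_free(unsigned start, unsigned len)
{
    assert(start + len <= NPORTS);
    for (unsigned p = start; p < start + len; p++)
        if (owner[p] != NULL)
            return false;
    return true;
}

/* Mock counterparts of check_region, request_region, release_region. */
static int mock_check_region(unsigned start, unsigned len)
{
    return region_free(start, len) ? 0 : -1;
}

static bool mock_request_region(unsigned start, unsigned len, const char *name)
{
    if (!region_free(start, len))
        return false;
    for (unsigned p = start; p < start + len; p++)
        owner[p] = name;
    return true;
}

static void mock_release_region(unsigned start, unsigned len)
{
    assert(start + len <= NPORTS);
    for (unsigned p = start; p < start + len; p++)
        owner[p] = NULL;
}

/* Runs during probing; simulates another driver initialized concurrently. */
typedef void (*interleave_fn)(void);

/* Old protocol: (i) check, (ii) probe, (iii) request, result ignored. */
static int old_init(unsigned base, unsigned len, bool device_found,
                    interleave_fn other)
{
    if (mock_check_region(base, len) != 0)
        return -1;
    if (other)
        other();
    if (!device_found)
        return -1;
    (void)mock_request_region(base, len, "old");
    return 0;
}

/* New protocol: request first, release if the device is not found. */
static int new_init(unsigned base, unsigned len, bool device_found,
                    interleave_fn other)
{
    if (!mock_request_region(base, len, "new"))
        return -1;
    if (other)
        other();
    if (!device_found) {
        mock_release_region(base, len);
        return -1;
    }
    return 0;
}

/* Hypothetical competing driver that tries to claim ports 0 to 7. */
static bool rival_got_it;
static void rival(void)
{
    rival_got_it = mock_request_region(0, 8, "rival");
}
```

Starting from a free table, old_init(0, 8, true, rival) reports success even though rival has already claimed ports 0 to 7, so two drivers believe they own the region. With the same interleaving, new_init leaves rival empty-handed, and new_init(0, 8, false, NULL) releases the region again, as release_region does in the rewritten driver.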

Both steps in eliminating check_region are difficult and time-consuming. This difficulty has led to the slow pace of the evolution, as shown below. Although the evolution began in Linux 2.4.2, released in February 2001, it is still not complete as of Linux 2.6.13.3, released in October 2005.