ANR-NRF ITrans project (2017-2021)

Overview

Large, real-world software must continually change, to keep up with evolving requirements, fix bugs, and improve performance, maintainability, and security. This rate of change can pose difficulties for clients, whose code cannot always evolve at the same rate. This project will target the problems of forward porting, where one software component has to catch up to a code base with which it needs to interact, and back porting, in which it is desired to use a more modern component in a context where it is necessary to continue to use a legacy code base. To understand and illustrate both problems, we will focus on infrastructure software, i.e., software such as operating systems and language runtimes that underlie all computing. As our main motivating example, we will take the Linux kernel, which supports computing environments ranging from embedded systems to clouds and supercomputers. The Linux kernel is fast evolving and thus raises real challenges for users who need to use code designed for one version in an earlier or later one.

Prior work on code porting has taken a recommendation-based approach: by observing changes between the original version and the target version, it recommends a series of method calls to replace the existing implementation of a functionality. Such approaches, however, only half address the problem: they do not help the user construct the other computations, such as tests and data structure manipulations, that are essential to obtain working code. In this project, we will instead realize a history-guided, source-code transformation-based approach, which automatically traverses the history of the changes made to a software system to find where changes in the code to be ported are required, gathers examples of the required changes, and generates change rules to incrementally back port or forward port the code. We will build on existing work on automatic inference of change rules, improving its genericity and scalability, to enable comprehensively inferring and automating all of the changes required to back or forward port code between versions. Our approach will be a success if it is able to automatically back and forward port a large number of drivers for the Linux operating system to various earlier and later versions of the Linux kernel with high accuracy while requiring minimal developer effort. This objective is not achievable by existing techniques.
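
The three steps named above (find the relevant changes in the history, gather examples, generate a change rule) can be illustrated with a deliberately tiny sketch. It is not the project's actual tooling: real change rules such as semantic patches are far richer than a single rename. The `Commit` class, the function names, and the `old_alloc`/`new_alloc` identifiers are all hypothetical, and each commit is assumed to be reduced to one line of code before and after the change.

```python
import re
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical stand-in for one commit: one line before and after the change."""
    version: str
    before: str
    after: str


def find_relevant_commits(history, symbols):
    """Step 1: keep commits whose changed code mentions a symbol used by the code to port."""
    return [
        c for c in history
        if any(re.search(rf"\b{re.escape(s)}\b", c.before) for s in symbols)
    ]


def infer_rename_rule(examples):
    """Steps 2-3: from the gathered examples, infer a single (old, new) call rename.

    Returns None unless every example that renames a call agrees on the same pair.
    """
    call = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
    renames = set()
    for c in examples:
        old, new = call.search(c.before), call.search(c.after)
        if old and new and old.group(1) != new.group(1):
            renames.add((old.group(1), new.group(1)))
    return renames.pop() if len(renames) == 1 else None


def apply_rule(code, rule):
    """Apply an (old, new) rename rule to the code being ported."""
    old, new = rule
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


# Toy history: two commits change the same (hypothetical) API, one is unrelated.
history = [
    Commit("v2", "buf = old_alloc(size);", "buf = new_alloc(size);"),
    Commit("v2", "p = old_alloc(n);", "p = new_alloc(n);"),
    Commit("v3", 'log_msg("x");', 'log_msg("y");'),
]

driver_line = "data = old_alloc(len);"
examples = find_relevant_commits(history, {"old_alloc"})
rule = infer_rename_rule(examples)
ported = apply_rule(driver_line, rule)
print(rule, "->", ported)
```

In this toy setting, back porting would amount to applying the inferred pair in the opposite direction, for example `apply_rule(code, (rule[1], rule[0]))`. Realistic changes involve context, argument changes, and control flow, which is why rule inference from examples is a research problem rather than a regular-expression exercise.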

This project represents a 3-year collaboration between researchers at Inria (Whisper team) and at the School of Information Systems at Singapore Management University (SMU). The Inria researchers are world leaders in the design of tools for supporting the development of infrastructure software, including Coccinelle, which is regularly used today in Linux kernel development. The SMU researchers are world leaders in software mining techniques and have developed many techniques that analyze program history to automate software tasks.

The success of this project will benefit the software engineering research community, the developer, and the general public. For the software engineering research community, this project will improve the understanding of the kinds of changes that occur between versions in infrastructure software, and potentially motivate the design of new kinds of tools. For the developer, this project will ease and improve the reliability of the common task of porting between versions, freeing up resources for improving the code quality and adding new functionalities. The project will also raise awareness of how code changes impact the ability to back and forward port. For the general public, this project will help ensure that bug fixes for critical infrastructure software code are available immediately, even to users of older versions, reducing vulnerability to attacks. Our approach will also allow users running an older version of infrastructure software to benefit from support for the latest hardware and applications.

The partners on this project are:

  • Sorbonne University: Julia Lawall (coordinator), Gilles Muller
  • Singapore Management University: David Lo (coordinator), Lingxiao Jiang

Results on automatic transformation rule inference

  • AndroEvolve: Automated Update for Android Deprecated-API Usages (Demo)
    Stefanus Agus Haryono, Ferdian Thung, David Lo, Lingxiao Jiang, Julia Lawall, Hong Jin Kang, Lucas Serrano, Gilles Muller.
    ICSE (Demo track) 2021
  • Automatic Inference of System Software Transformation Rules from Examples. (Inférence automatique à partir d'exemples de règles de transformations logicielles).
    Lucas Serrano.
    PhD thesis, Sorbonne University, France, 2020
  • SPINFER: Inferring Semantic Patches for the Linux Kernel.
    Lucas Serrano, Van-Anh Nguyen, Ferdian Thung, Lingxiao Jiang, David Lo, Julia Lawall, Gilles Muller.
    USENIX Annual Technical Conference 2020: 235-248
  • Automatic Android Deprecated-API Usage Update by Learning from Single Updated Example.
    Stefanus A. Haryono, Ferdian Thung, Hong Jin Kang, Lucas Serrano, Gilles Muller, Julia Lawall, David Lo, Lingxiao Jiang.
    ICPC (ERA track) 2020: 401-405
  • Automated Deprecated-API Usage Update for Android Apps: How Far are We?
    Ferdian Thung, Stefanus A. Haryono, Lucas Serrano, Gilles Muller, Julia Lawall, David Lo, Lingxiao Jiang.
    SANER (RENE track) 2020: 602-611
  • AUSearch: Accurate API Usage Search in GitHub Repositories with Type Resolution.
    Muhammad Hilmi Asyrofi, Ferdian Thung, David Lo, Lingxiao Jiang.
    SANER (Tool demo) 2020: 637-641
  • Towards Generating Transformation Rules without Examples for Android API Replacement.
    Ferdian Thung, Hong Jin Kang, Lingxiao Jiang, David Lo.
    ICSME (short paper) 2019: 213-217
  • Fast and Precise Retrieval of Forward and Back Porting Information for Linux Device Drivers.
    Julia Lawall, Derek Palinski, Lukas Gnirke, Gilles Muller.
    USENIX Annual Technical Conference 2017: 15-26

Results on semantic patches for Java

  • Semantic Patches for Java Program Transformation (Artifact).
    Hong Jin Kang, Ferdian Thung, Julia Lawall, Gilles Muller, Lingxiao Jiang, David Lo.
    Dagstuhl Artifacts Ser. 5(2): 10:1-10:3 (2019)
  • Semantic Patches for Java Program Transformation (Experience Report).
    Hong Jin Kang, Ferdian Thung, Julia Lawall, Gilles Muller, Lingxiao Jiang, David Lo.
    ECOOP 2019: 22:1-22:27

Results on automatic detection of bug-fixing patches

  • CC2Vec: distributed representations of code changes.
    Thong Hoang, Hong Jin Kang, David Lo, Julia Lawall.
    ICSE 2020: 518-529
  • PatchNet: Hierarchical Deep Learning-Based Stable Patch Identification for the Linux Kernel.
    Thong Hoang, Julia Lawall, Yuan Tian, Richard Jayadi Oentaryo, David Lo.
    IEEE TSE, 2019 (early access)
  • PatchNet: a tool for deep patch classification.
    Thong Hoang, Julia Lawall, Richard Jayadi Oentaryo, Yuan Tian, David Lo.
    ICSE (Companion Volume) 2019: 83-86

Developed tools

  • Spinfer: Semantic patch inference from examples.
  • Coccinelle4J: A lightweight port of Coccinelle for use on Java code. Coccinelle4J is in the branch java.
  • AUSearch: Find API usage examples in Java source code found on GitHub.
  • Prequel: A query language for finding commits to C projects in a git repository. There is also some support for Java, via Coccinelle4J.
  • PatchNet: Identification of bug-fixing patches using machine learning.

Workshop

    On December 7, 2019, we organized a workshop at Singapore Management University, with Yingfei Xiong of Peking University and Xuan-Bach Le of the University of Melbourne as invited speakers, in addition to a number of speakers in the area. (Unfortunately, Yingfei Xiong was not able to attend.)