Summary of Large sequence models for software development activities

  • blog.research.google

    DIDACT: A Machine Learning Approach for Software Development Assistance

    Google researchers have developed DIDACT (Dynamic Integrated Developer ACTivity), a methodology for training machine learning models to assist software developers. Rather than training only on finished code, DIDACT uses the software development process itself as training data, including code edits, code reviews, interactions with development tools, and more.

    • DIDACT exposes machine learning models to the real-world contexts and activities that developers encounter, enabling the models to learn about the dynamics of software development.
    • Google's monorepo and instrumented software engineering toolchains provide a rich record of developer activity, which lets the training data scale in both diversity and quantity.

    Multi-Task Modeling for Developer Assistance

    DIDACT defines various tasks related to individual developer activities, such as repairing broken builds, predicting or addressing code review comments, renaming variables, and editing files. These tasks are represented using a common formalism:

    • State: The code file or context
    • Intent: Annotations specific to the activity (e.g., code review comments, compiler errors)
    • Action: The operation taken to address the task (e.g., edits, comments, renames), expressed using a mini programming language called DevScript
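
    The State, Intent, Action formalism above can be sketched as plain data types. This is a minimal illustration under stated assumptions, not Google's implementation: DevScript's actual syntax is not given in this summary, so the `ReplaceLine` action, the `apply_action` helper, and the example file contents are all hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """The code file (or other context) the task starts from."""
    path: str
    lines: tuple[str, ...]


@dataclass(frozen=True)
class Intent:
    """Activity-specific annotation, e.g. a review comment or compiler error."""
    kind: str
    text: str


@dataclass(frozen=True)
class ReplaceLine:
    """Hypothetical action: replace one 0-indexed line. Not real DevScript."""
    index: int
    new_text: str


@dataclass(frozen=True)
class Example:
    """One training example: given state and intent, the action is the target."""
    state: State
    intent: Intent
    action: ReplaceLine


def apply_action(state: State, action: ReplaceLine) -> State:
    """Return a new State with the action applied; the input is left unchanged."""
    if not 0 <= action.index < len(state.lines):
        raise IndexError(f"line {action.index} is outside {state.path}")
    lines = list(state.lines)
    lines[action.index] = action.new_text
    return State(state.path, tuple(lines))


# A made-up "address a code review comment" task.
example = Example(
    state=State("math_utils.py", ("def area(r):", "    return 3.14 * r * r")),
    intent=Intent("code_review_comment", "Use math.pi instead of 3.14."),
    action=ReplaceLine(1, "    return math.pi * r * r"),
)
result = apply_action(example.state, example.action)
```

    Keeping every task in the same (state, intent) to action shape is what would let one model be trained on build repairs, review comments, and renames alike; only the `kind` of intent and the action type change between tasks.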

    Integrating Machine Learning into Developer Workflows

    Google has deployed three DIDACT tools internally, integrated into different stages of the development workflow:

    • Comment Resolution: Suggests edits to address code review comments
    • Build Repair: Helps fix broken builds
    • Tip Prediction: Provides code completion suggestions

    These tools have received enthusiastic feedback from thousands of professional developers at Google, indicating their usefulness in improving developer productivity.

    Emergent Capabilities: History-Augmented Code Editing

    DIDACT exhibits surprising capabilities, enabled by its multimodal nature and the use of developer activity history. One such capability is history-augmented code completion, where the model can complete code snippets based on the developer's recent edits, anticipating their next steps.

    • Edit prediction allows the model to choose where to edit next in a historically consistent manner, updating related parts of the code across files.
    • Given a blank file, the model can iteratively predict edits to create a fully functional code file, mimicking a developer's step-by-step coding process.
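
    The blank-file behavior described above amounts to a loop: ask the model for the next edit given the current file and the edit history, apply it, and repeat. The sketch below shows only that control flow. The `predict_next` callable stands in for the trained model, the `(position, text)` edit format is an assumption made for illustration, and `scripted_predictor` is a stub that replays a fixed list of edits rather than predicting anything.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical edit format: insert `text` as a new line at index `position`.
Edit = Tuple[int, str]
Predictor = Callable[[List[str], List[Edit]], Optional[Edit]]


def generate_file(predict_next: Predictor, max_steps: int = 50) -> Tuple[List[str], List[Edit]]:
    """Start from a blank file and apply predicted edits until the predictor stops."""
    lines: List[str] = []
    history: List[Edit] = []
    for _ in range(max_steps):
        edit = predict_next(lines, history)
        if edit is None:  # the predictor signals that the file is complete
            break
        position, text = edit
        lines.insert(position, text)
        history.append(edit)  # later predictions can condition on earlier edits
    return lines, history


def scripted_predictor(script: List[Edit]) -> Predictor:
    """Stand-in for a trained model: replays a fixed list of edits in order."""
    def predict(lines: List[str], history: List[Edit]) -> Optional[Edit]:
        step = len(history)
        return script[step] if step < len(script) else None
    return predict


# Edits arrive out of file order, as a developer's might:
# the function body first, then a module docstring added at the top.
script = [
    (0, "def greet(name):"),
    (1, "    return 'Hello, ' + name"),
    (0, '"""Greeting helpers."""'),
]
lines, history = generate_file(scripted_predictor(script))
```

    Passing the history back in on every step is the part that corresponds to history augmentation; a predictor that ignored `history` would reduce to ordinary completion on the current file contents.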

    Towards General-Purpose AI Assistants for Software Development

    DIDACT demonstrates the potential for general-purpose AI assistants that aid developers across the entire software development process. Because its models are trained on real-world developer activity, DIDACT points toward AI systems that collaborate with human developers to improve productivity and code quality.

    Benefits and Impact

    The DIDACT approach complements the advancements in large language models, offering tools that:

    • Ease the toil of software development tasks
    • Improve developer productivity
    • Enhance the quality of software engineering work

    Acknowledgments and Collaborators

    The DIDACT project is a multi-year collaboration among Google Research, Google Core Systems and Experiences, and DeepMind, involving numerous researchers, engineers, and leaders across Alphabet.
