Summary of Introducing Apple’s On-Device and Server Foundation Models

    Introducing Apple Intelligence: AI for Everyday Tasks at WWDC24

    At WWDC24, Apple unveiled Apple Intelligence, a new personal intelligence system integrated into iOS 18, iPadOS 18, and macOS Sequoia. Apple Intelligence is designed to empower users with intelligent tools for everyday tasks, leveraging advanced generative models that adapt on the fly to the user's current activity. The foundation models at the heart of Apple Intelligence are fine-tuned for diverse user experiences, including:

    • Writing and refining text
    • Prioritizing and summarizing notifications
    • Creating playful images for conversations
    • Taking in-app actions for simplified interactions

    Apple's Focus on Responsible AI Development

    Apple Intelligence is built upon a foundation of groundbreaking privacy innovations and adheres to Apple's core values. The development process is guided by a set of Responsible AI principles:

    • Empower users with intelligent tools: Apple aims to create tools that address specific user needs while respecting how users choose to use them.
    • Represent users authentically: Apple strives to avoid perpetuating stereotypes and biases in its AI tools and models, ensuring they represent users globally.
    • Design with care: Apple takes precautions at every stage of development to identify potential misuse or harm of its AI tools and proactively improves them based on user feedback.
    • Protect privacy: Apple prioritizes user privacy through on-device processing and technologies like Private Cloud Compute. User data is never used for training foundation models.

    Apple's Foundation Models: Powering Apple Intelligence

    Apple Intelligence relies on two primary foundation models:

    • On-device language model (~3 billion parameters): This model is optimized for speed and efficiency, enabling fast responses directly on Apple devices.
    • Server-based language model: This larger model, accessed through Private Cloud Compute and powered by Apple silicon servers, provides enhanced capabilities for complex tasks.

    These foundation models are part of a broader family of generative models developed by Apple, including:

    • A coding model for intelligence in Xcode
    • A diffusion model for visual expression in apps like Messages

    Pre-Training: Building the Foundation

    Apple's foundation models are trained using the open-source AXLearn framework, leveraging JAX and XLA for efficiency and scalability. Training is conducted across various hardware platforms, including TPUs and GPUs, employing data parallelism, tensor parallelism, sequence parallelism, and FSDP to optimize the process. The training data consists of:

    • Licensed data: Selected for enhancing specific features
    • Publicly available data: Collected by AppleBot, with an opt-out option for web publishers

    Apple emphasizes that user data is never used for training and applies filters to remove sensitive information from publicly available data.
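
    Apple has not published its training code, but the data-parallel piece of the setup described above can be sketched with JAX's sharding API. The snippet below is a minimal, hypothetical illustration, with a toy linear model standing in for the real networks: batches are split across a one-dimensional "data" device mesh while parameters are replicated, and XLA inserts the cross-device gradient reduction automatically under jit.

    ```python
    import jax
    import jax.numpy as jnp
    import numpy as np
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    # One-dimensional device mesh with a single "data" axis.
    mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

    # Split batches across devices; replicate parameters everywhere.
    batch_sharding = NamedSharding(mesh, P("data"))
    replicated = NamedSharding(mesh, P())

    @jax.jit
    def train_step(params, x, y):
        # Toy linear model and squared-error loss (illustrative only).
        def loss_fn(p):
            pred = x @ p["w"] + p["b"]
            return jnp.mean((pred - y) ** 2)

        loss, grads = jax.value_and_grad(loss_fn)(params)
        # Plain SGD step; XLA adds the all-reduce over the data axis.
        new_params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g,
                                            params, grads)
        return new_params, loss

    params = jax.device_put({"w": jnp.zeros((8, 1)), "b": jnp.zeros((1,))},
                            replicated)
    # Batch size (32 here) must be divisible by the number of devices.
    x = jax.device_put(jnp.ones((32, 8)), batch_sharding)
    y = jax.device_put(jnp.ones((32, 1)), batch_sharding)
    params, loss = train_step(params, x, y)
    ```

    Tensor, sequence, and FSDP-style parallelism extend the same idea with additional mesh axes that shard the weights and activations themselves, not just the batch.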

    Post-Training: Refining Model Capabilities

    Apple employs a hybrid data strategy incorporating human-annotated and synthetic data, along with rigorous data curation and filtering procedures. Two novel algorithms enhance the model's instruction-following abilities:

    • Rejection sampling fine-tuning algorithm with teacher committee
    • Reinforcement learning from human feedback (RLHF) algorithm with mirror descent policy optimization and a leave-one-out advantage estimator
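
    The summary does not spell out the estimator, but a common leave-one-out formulation baselines each of k sampled responses against the mean reward of the other k - 1 samples for the same prompt, avoiding a separately learned value function. A minimal sketch of that idea (illustrative, not Apple's implementation):

    ```python
    import numpy as np

    def leave_one_out_advantages(rewards: np.ndarray) -> np.ndarray:
        """Leave-one-out advantages for RLHF-style policy optimization.

        rewards: shape (num_prompts, k), one reward per sampled response.
        Each sample's baseline is the mean reward of the other k - 1
        samples for the same prompt.
        """
        k = rewards.shape[1]
        totals = rewards.sum(axis=1, keepdims=True)
        baselines = (totals - rewards) / (k - 1)  # mean of the other samples
        return rewards - baselines

    # Example: 2 prompts, 3 sampled responses each.
    r = np.array([[1.0, 0.0, 0.5],
                  [0.2, 0.8, 0.5]])
    print(leave_one_out_advantages(r))  # e.g. advantage of 0.75 for reward 1.0
    ```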

    Optimization: Balancing Performance and Efficiency

    Apple has implemented various techniques to optimize its models for speed and efficiency, both on-device and in the private cloud. Key optimizations include:

    • Grouped-query attention for both on-device and server models
    • Shared input and output vocab embedding tables to reduce memory requirements
    • Low-bit palettization for on-device inference, using a mixed 2-bit and 4-bit configuration that averages 3.5 bits-per-weight while maintaining accuracy (see the sketch after this list)
    • Talaria, an interactive model latency and power analysis tool, for guiding bit rate selection
    • Activation quantization and embedding quantization
    • Efficient Key-Value (KV) cache update on neural engines
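
    The palettization scheme itself is not detailed in the summary; the general idea is to cluster each layer's weights into a small "palette" of centroids and store compact per-weight indices. The toy sketch below uses simple one-dimensional k-means, which may well differ from Apple's actual method; mixing 2-bit and 4-bit palettes across layers is one plausible way to reach an average of 3.5 bits-per-weight.

    ```python
    import numpy as np

    def palettize(weights: np.ndarray, bits: int = 4, iters: int = 10):
        """Toy weight palettization: replace float weights with indices
        into a shared palette of 2**bits centroids (1-D k-means)."""
        flat = weights.ravel()
        n = 2 ** bits
        # Initialize centroids at evenly spaced quantiles of the weights.
        centroids = np.quantile(flat, np.linspace(0.0, 1.0, n))
        for _ in range(iters):
            idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
            for j in range(n):
                members = flat[idx == j]
                if members.size:
                    centroids[j] = members.mean()
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        return centroids, idx.reshape(weights.shape).astype(np.uint8)

    # Each weight becomes a 4-bit index plus a shared 16-entry table,
    # instead of a 16- or 32-bit float.
    w = np.random.randn(64, 64).astype(np.float32)
    palette, indices = palettize(w, bits=4)
    reconstructed = palette[indices]
    ```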

    These optimizations result in remarkable performance on iPhone 15 Pro, achieving a time-to-first-token latency of about 0.6 milliseconds per prompt token and a generation rate of 30 tokens per second.
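
    To put those figures in perspective: a 500-token prompt would reach its first generated token in roughly 500 × 0.6 ms = 0.3 seconds, and a 150-token response would then stream out in about 5 seconds.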

    Model Adaptation: Tailoring to User Needs

    Apple Intelligence leverages adapters, small neural network modules, to fine-tune its foundation models for specific tasks. These adapters specialize the models on the fly, adapting the attention matrices, the attention projection matrix, and the fully connected layers of the transformer architecture. This approach preserves the general knowledge of the base model while tailoring it to the task at hand. The adapter parameters are represented using 16 bits, and a typical adapter for the on-device model occupies tens of megabytes.
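
    The summary does not name the adapter mechanism, but low-rank (LoRA-style) adapters are the standard way to realize this pattern: a small trainable low-rank delta on top of a frozen weight matrix. A hypothetical sketch:

    ```python
    import numpy as np

    class LowRankAdapter:
        """Hypothetical LoRA-style adapter: y = x @ W + x @ A @ B,
        where the base weight W stays frozen and only the small
        low-rank factors A and B are trained per task."""

        def __init__(self, W: np.ndarray, rank: int = 16):
            d_in, d_out = W.shape
            self.W = W                                   # frozen base weight
            self.A = 0.01 * np.random.randn(d_in, rank)  # trainable down-projection
            self.B = np.zeros((rank, d_out))             # trainable up-projection,
                                                         # zero-init: adapter starts
                                                         # as a no-op

        def __call__(self, x: np.ndarray) -> np.ndarray:
            return x @ self.W + (x @ self.A) @ self.B
    ```

    At 16 bits per parameter, low-rank factors of this kind across the attention and feed-forward layers of a ~3 billion parameter model would plausibly total tens of megabytes, consistent with the sizes quoted above.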

    Performance and Evaluation: Measuring Success

    Apple emphasizes human evaluation in benchmarking its models, as it closely correlates with user experience. Performance evaluations are conducted on both feature-specific adapters and foundation models. Apple's approach to performance evaluation is illustrated by the summarization adapter:

    • Adapters are fine-tuned to meet specific product requirements for email and notification summaries.
    • Training data is based on synthetic summaries generated from server models and filtered through rejection sampling for quality (see the sketch after this list).
    • Evaluation datasets consist of 750 carefully sampled responses per use case, emphasizing diverse inputs and real-world scenarios.
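
    As a rough illustration of the rejection-sampling step in that pipeline, the hypothetical sketch below keeps only candidate summaries that a scoring function rates above a quality threshold; generate and score are illustrative stand-ins, not Apple's actual components.

    ```python
    def filter_synthetic_summaries(documents, generate, score,
                                   k=8, threshold=0.8):
        """Keep the best of k candidate summaries per document, and only
        if it clears a quality threshold (rejection-sampling-style filter)."""
        kept = []
        for doc in documents:
            candidates = [generate(doc) for _ in range(k)]
            scored = [(score(doc, s), s) for s in candidates]
            best_score, best = max(scored, key=lambda t: t[0])
            if best_score >= threshold:
                kept.append((doc, best))
        return kept
    ```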

    Apple's models with adapters generate better summaries than a comparable model. However, the company acknowledges risks inherent in summarization, such as the loss of important nuances or details, and conducts extensive adversarial probing and continuous evaluation to mitigate potential harm.

    Beyond feature-specific evaluations, Apple assesses the general capabilities of both on-device and server-based models using a comprehensive set of real-world prompts. The models are compared to open-source and commercial models of comparable size, demonstrating superior performance across various tasks, including brainstorming, classification, question answering, coding, and writing. Notably, Apple's on-device model outperforms larger models, while the server model compares favorably to commercial counterparts.

    Apple also emphasizes safety and robustness by evaluating model performance on harmful content, sensitive topics, and factuality through adversarial prompts. The models consistently demonstrate lower violation rates compared to open-source and commercial models, highlighting their resilience against harmful inputs.

    In addition to human evaluations, Apple utilizes benchmarks like Instruction-Following Eval (IFEval) to assess instruction-following capabilities. The results indicate that Apple's models excel at following detailed instructions compared to models of similar size.

    Conclusion: A Vision for Personal Intelligence

    Apple's foundation models and adapters underpin Apple Intelligence, a personal intelligence system deeply integrated into iPhone, iPad, and Mac. The system empowers users with advanced capabilities across language, images, actions, and personal context. Apple emphasizes responsible AI development, guided by its core values and a commitment to privacy. The company plans to share more information about its broader family of generative models in the future, expanding the possibilities of personal intelligence.
