Summary of Java 21 Virtual Threads - Dude, Where’s My Lock?

  • netflixtechblog.com

    Introduction

    • Netflix has been using Java as the primary programming language for its vast fleet of microservices.
    • With Java 21, Netflix adopted virtual threads and the Z Garbage Collector (ZGC) to improve performance and ergonomics.
    • Virtual threads are meant to simplify writing high-throughput concurrent applications: when a virtual thread blocks, the JVM suspends it and later resumes it automatically by means of continuations.
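
    To make the suspend-and-resume model concrete, here is a minimal sketch, assuming JDK 21 or later. The class and method names (VirtualThreadSketch, runBlockingTasks) are hypothetical and do not come from the Netflix article. Each task blocks in Thread.sleep, during which its virtual thread can be unmounted so the carrier is free for other tasks.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class VirtualThreadSketch {

    /** Runs taskCount blocking tasks, one virtual thread each, and returns how many completed. */
    static int runBlockingTasks(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        // Creates a new virtual thread per submitted task; close() waits for all tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    try {
                        // Blocking call: the virtual thread is suspended here and resumed later.
                        Thread.sleep(Duration.ofMillis(50));
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    completed.incrementAndGet();
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runBlockingTasks(1_000));
    }
}
```

    The try-with-resources form relies on ExecutorService being AutoCloseable, which is the case on the JDK versions that ship virtual threads.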

    The Problem

    Netflix engineers reported intermittent timeouts and hung instances in applications running on Java 21 with Spring Boot 3 and embedded Tomcat. The affected instances stopped serving traffic while the JVM itself kept running, and the number of sockets in the closeWait (TCP CLOSE_WAIT) state grew persistently.

    Diagnostic Approach

    • Leveraged the alerting system to catch instances in the hung state.
    • Collected conventional thread dumps, but these showed a seemingly idle JVM because they do not include virtual thread call stacks.
    • Used the jcmd Thread.dump_to_file command to obtain thread dumps that also cover virtual threads.
    • Collected heap dumps as a last resort for introspecting the JVM state.
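
    A comparable dump can also be requested from inside the JVM. The sketch below assumes a HotSpot-based JDK 21 or later, where HotSpotDiagnosticMXBean.dumpThreads is available. The class name ThreadDumpSketch, the temporary file location, and the parked demo thread are illustrative assumptions, not part of the Netflix investigation.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadDumpSketch {

    /** Writes a plain-text thread dump (virtual threads included) to a temp file and returns its text. */
    static String dumpThreads() {
        try {
            // dumpThreads needs an absolute path to a file that does not exist yet.
            Path file = Files.createTempDirectory("thread-dump").resolve("threads.txt").toAbsolutePath();
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpThreads(file.toString(), HotSpotDiagnosticMXBean.ThreadDumpFormat.TEXT_PLAIN);
            return Files.readString(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parks one virtual thread, takes a dump while it is parked, then releases it. */
    static String dumpWithParkedVirtualThread() {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            executor.submit(() -> {
                started.countDown();
                try {
                    release.await(); // stays parked until the dump has been taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            try {
                started.await();
                return dumpThreads();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } finally {
                release.countDown(); // let the virtual thread finish so close() can return
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(dumpWithParkedVirtualThread());
    }
}
```

    In the plain-text format, virtual threads are marked with the word "virtual" after the thread name, which makes them easy to search for in a large dump.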

    Analysis

    • Thread dumps revealed thousands of "blank" virtual threads (entries with no stack trace), approximately the same number as sockets in closeWait state.
    • Understood the virtual thread execution model:
      • Virtual threads are multiplexed onto a limited pool of OS threads (carrier threads).
      • A virtual thread is "pinned" to its carrier thread if it blocks while inside a synchronized block or method; the carrier then cannot be released to run other virtual threads.
    • Identified 4 virtual threads pinned to carrier threads, waiting to acquire the same lock.
    • Discovered another virtual thread and a regular thread also waiting for the same lock.
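
    The multiplexing described above can be observed directly. This sketch assumes JDK 21 or later and relies on an implementation detail: on current JDKs a mounted virtual thread's toString() ends with "@" followed by the carrier thread's name. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CarrierMultiplexingSketch {

    /** Runs taskCount virtual threads and returns the distinct carrier names they reported. */
    static Set<String> observedCarriers(int taskCount) {
        Set<String> carriers = ConcurrentHashMap.newKeySet();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    // Assumed format (not a documented contract):
                    // VirtualThread[#22]/runnable@ForkJoinPool-1-worker-3
                    String description = Thread.currentThread().toString();
                    int at = description.lastIndexOf('@');
                    carriers.add(at >= 0 ? description.substring(at + 1) : "unknown");
                });
            }
        }
        return carriers;
    }

    public static void main(String[] args) {
        Set<String> carriers = observedCarriers(2_000);
        System.out.println("2000 virtual threads ran on " + carriers.size() + " carrier thread(s)");
    }
}
```

    The number of carriers stays far below the number of virtual threads, which is why a handful of pinned virtual threads can be enough to occupy the whole pool.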

    Inspecting the Lock

    • Used Eclipse MAT to inspect the heap dump and find the lock object.
    • Reverse-engineered the AbstractQueuedSynchronizer code to understand the lock state.
    • Found the lock in a transient state where one thread had released it, but the next thread had not yet acquired it.
    • Identified the next virtual thread (#119516) that was signaled to acquire the lock but could not proceed due to lack of available carrier threads.

    The Deadlock

    • The 4 pinned virtual threads were blocked waiting for the lock while occupying all 4 carrier threads in the fork-join pool.
    • The signaled virtual thread (#119516) could not run because there were no available carrier threads.
    • The result is a variation of the classic deadlock: instead of two locks, it involves one lock and a semaphore with 4 permits, the latter represented by the fork-join pool.
    • The article provides a test case that reproduces the issue.
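
    The scenario can be sketched as follows. This is not the article's test case but an illustrative reconstruction from the description above. It assumes the default virtual thread scheduler with one carrier per available processor, and all names are hypothetical. The outcome depends on the JDK: where blocking inside synchronized pins the carrier (as in Java 21), the unpinned waiter is expected to stall; on newer JDKs that no longer pin there, every waiter should finish. The wait is bounded so the sketch returns in both cases.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class PinnedLockStallSketch {

    /** Returns true if every virtual thread got the lock before the timeout, false if some stalled. */
    static boolean allWaitersFinish(Duration timeout) {
        ReentrantLock lock = new ReentrantLock(true); // fair: waiters are served in arrival order
        Runnable takeLock = () -> {
            lock.lock();
            lock.unlock();
        };
        List<Thread> waiters = new ArrayList<>();

        lock.lock(); // the caller holds the lock, so every virtual thread below has to queue
        try {
            // First in the queue and not pinned: it leaves its carrier while parked.
            waiters.add(Thread.ofVirtual().name("unpinned").start(takeLock));
            pause(200);
            // Assumption: the default scheduler has one carrier per available processor.
            int carriers = Runtime.getRuntime().availableProcessors();
            for (int i = 0; i < carriers; i++) {
                waiters.add(Thread.ofVirtual().name("pinned-" + i).start(() -> {
                    synchronized (new Object()) { // on Java 21, parking in here pins the carrier
                        takeLock.run();
                    }
                }));
            }
            pause(500);
        } finally {
            lock.unlock(); // wakes "unpinned", which needs a free carrier to resume
        }

        long deadline = System.nanoTime() + timeout.toNanos();
        for (Thread waiter : waiters) {
            long remainingMillis = Math.max(1, (deadline - System.nanoTime()) / 1_000_000);
            try {
                waiter.join(remainingMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (waiter.isAlive()) {
                return false; // still queued or never rescheduled: the stall described above
            }
        }
        return true;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        boolean finished = allWaitersFinish(Duration.ofSeconds(3));
        System.out.println(finished ? "all waiters acquired the lock" : "stalled: waiters still blocked");
    }
}
```

    Virtual threads and their carriers are daemon threads, so a stalled run does not keep the JVM alive once main returns.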

    Conclusion

    While virtual threads in Java 21 promise improved performance, their integration with locking primitives still has rough edges. Netflix looks forward to Java 23 and beyond, which it hopes will address these issues and enable further performance gains through virtual thread adoption.
