You can find the full code for this blog post on GitHub.

The Cold Start Problem

JVM applications have a cold start problem. In resource-constrained environments like Google Cloud Run, startup often takes 1 to 5 seconds or more, depending on the framework and application size. For traditional long-running servers, this doesn’t matter much. For serverless environments, where instances spin up and down constantly and requests are often blocked waiting on a cold start, it can have a massive impact on user experience.

GraalVM Native Image is one solution. It compiles your app to a native binary with near-instant startup. But the tradeoffs are significant: compile times can stretch to several minutes, debugging issues is painful, and you need buy-in from library authors for reflection configuration that we still don’t fully have across the ecosystem. Even after all that, native images typically achieve only ~80% of peak JVM throughput due to the loss of JIT optimizations. For many applications, that tradeoff isn’t worth it.

CRaC offers a different approach: keep the full JVM, but skip the startup.

What is CRaC?

CRaC (Coordinated Restore at Checkpoint) is an OpenJDK project that lets you snapshot a running JVM and restore it later. The idea is simple:

1. Start your application normally
2. Let it warm up (JIT compile hot paths, initialize connections, load caches)
3. Take a checkpoint: a snapshot of the entire JVM state
4. Later, restore from that checkpoint in milliseconds

Under the hood, CRaC uses CRIU (Checkpoint/Restore In Userspace), a Linux tool that relies on kernel features to freeze a running process and restore it later. When you restore, you’re not starting a new JVM; you’re resuming one that was already running.

The key benefit: your restored application has all the JIT-compiled code from the original run. There’s no warmup period. The JVM is already hot.

JVM Support

CRaC isn’t available in standard OpenJDK builds. You need a distribution that includes it:

JVM	Supported Versions
Azul Zulu	17, 21, 22, 23
BellSoft Liberica	17, 21

I recommend Azul Zulu—it was the first to offer commercial CRaC support and has the most mature implementation.

Important: CRaC requires a real Linux machine. On macOS and Windows, you can run in “simulation mode” for development—the checkpoint/restore lifecycle executes, but no actual snapshot is created. Docker on Mac won’t help here either; CRIU needs direct access to Linux kernel features that aren’t available through Docker’s virtualization layer. For real checkpoints, you need native Linux.
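
Because real checkpoints only happen on Linux, it can be handy to report which mode a given machine will get. This is a minimal sketch that assumes the standard `os.name` system property is a good enough signal; `isLinux` and `checkpointMode` are illustrative helper names, not part of CRaC.

```kotlin
// Illustrative helper (not part of CRaC): true when the OS name refers to Linux.
fun isLinux(osName: String = System.getProperty("os.name") ?: ""): Boolean =
    osName.lowercase().contains("linux")

// Describes which mode to expect on the given OS, following the rule stated above.
fun checkpointMode(osName: String = System.getProperty("os.name") ?: ""): String =
    if (isLinux(osName)) "real checkpoint" else "simulation only"
```

Logging `checkpointMode()` at startup makes it obvious when a development machine is only exercising the lifecycle callbacks rather than writing a snapshot.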

How CRaC Works with Ktor

The challenge with checkpointing a server is that open network sockets can’t be serialized. The checkpoint fails if your application still holds open sockets or files of its own at checkpoint time.

CRaC solves this with the org.crac.Resource interface. Your application registers resources that need to be notified before checkpoint and after restore:

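import io.ktor.server.engine.EmbeddedServer
import org.crac.Context
import org.crac.Core
import org.crac.Resource

class ServerCracResource(
    private val server: EmbeddedServer<*, *>,
) : Resource {

    override fun beforeCheckpoint(context: Context<out Resource>?) {
        // Close all sockets before checkpoint; zero timeouts stop the server immediately
        server.stop(gracePeriodMillis = 0, timeoutMillis = 0)
    }

    override fun afterRestore(context: Context<out Resource>?) {
        // Restart the server after restore without blocking the restoring thread
        server.start(wait = false)
    }

    companion object {
        // The global context only holds weak references to resources,
        // so keep a strong one here to stop it from being garbage collected
        private val registered = mutableListOf<ServerCracResource>()

        fun register(server: EmbeddedServer<*, *>) {
            val resource = ServerCracResource(server)
            registered += resource
            Core.getGlobalContext().register(resource)
        }
    }
}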

Before checkpoint, we stop the Netty server (closing all sockets). After restore, we restart it. The JVM state—including all your application’s initialized objects, caches, and JIT-compiled code—survives the checkpoint.
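
The same close-then-reopen pattern applies to any component that owns a socket, not only the Ktor server. The sketch below shows it with a plain `java.net.ServerSocket`. `ReopenableListener` is a hypothetical class whose method names mirror `Resource` but which deliberately does not implement the interface, so the example runs without the org.crac dependency. In a real application it would implement `Resource` and be registered the same way as `ServerCracResource`.

```kotlin
import java.net.InetSocketAddress
import java.net.ServerSocket

// Stand-in for a socket-owning component. The method names mirror org.crac.Resource,
// but this class does not implement it so the sketch has no external dependencies.
class ReopenableListener(private val port: Int = 0) {
    var socket: ServerSocket = listenOn(port)
        private set

    // Release the file descriptor so nothing is open when the snapshot is taken.
    fun beforeCheckpoint() {
        socket.close()
    }

    // Bind a fresh socket; a closed ServerSocket cannot be reused.
    fun afterRestore() {
        socket = listenOn(port)
    }

    private fun listenOn(port: Int): ServerSocket =
        ServerSocket().apply { bind(InetSocketAddress("127.0.0.1", port)) }
}
```

With the default port of 0 the operating system picks a free port each time, so a real service would pass its fixed port instead.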

Implementation

Here’s a minimal Ktor application with CRaC support:

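import io.ktor.server.application.*
import io.ktor.server.engine.embeddedServer
import io.ktor.server.netty.Netty
import io.ktor.server.response.*
import io.ktor.server.routing.*
import kotlinx.coroutines.awaitCancellation
import kotlinx.coroutines.runBlocking
import org.crac.Core

fun main(args: Array<String>) {
    val shouldCheckpoint = args.contains("--checkpoint")

    val server = embeddedServer(Netty, port = 8080) {
        routing {
            get("/") { call.respondText("Hello, CRaC!") }
            get("/health") { call.respondText("OK") }
        }
    }

    // Register before starting so the server is stopped and restarted around the checkpoint
    ServerCracResource.register(server)
    server.start(wait = false)

    if (shouldCheckpoint) {
        // Trigger checkpoint programmatically; on restore, execution resumes after this call
        Core.checkpointRestore()
    }

    // Keep the application running
    runBlocking { awaitCancellation() }
}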

The --checkpoint flag tells the application to checkpoint itself after starting. When Core.checkpointRestore() is called, the JVM snapshots itself and exits. The next time you restore from that checkpoint, execution continues right after the checkpointRestore() call.

Running It

CRaC requires specific JVM flags:

Create a checkpoint:

java -XX:CRaCCheckpointTo=./checkpoint -jar app.jar --checkpoint

This starts the app, waits for it to be ready, takes a checkpoint to ./checkpoint/, and exits.

Restore from checkpoint:

java -XX:CRaCRestoreFrom=./checkpoint -jar app.jar

This resumes the JVM from the checkpoint. The application is serving requests within milliseconds.

Back of Napkin Benchmark Results

For a simple Ktor + Netty application:

Scenario	Startup Time	Improvement
Normal startup	352ms	baseline
Restore from checkpoint	26ms	14x faster

The improvement tends to grow with application complexity: larger applications with more dependencies spend longer in a normal startup, and a restore skips most of that work.
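
One way to reproduce numbers like those in the table is to poll the `/health` endpoint from a separate process and record how long the first successful response takes. This sketch assumes that "started" means the first HTTP 200 from `/health`; `timeToFirstResponse`, the timeouts, and the polling interval are illustrative choices.

```kotlin
import java.io.IOException
import java.net.HttpURLConnection
import java.net.URI
import java.time.Duration

// Polls `url` until it answers with HTTP 200 and returns the elapsed time,
// or null if `timeout` passes first. Call it right before launching the JVM.
fun timeToFirstResponse(url: String, timeout: Duration = Duration.ofSeconds(10)): Duration? {
    val start = System.nanoTime()
    val deadline = start + timeout.toNanos()
    while (System.nanoTime() < deadline) {
        try {
            val conn = URI(url).toURL().openConnection() as HttpURLConnection
            conn.connectTimeout = 100
            conn.readTimeout = 100
            try {
                if (conn.responseCode == 200) {
                    return Duration.ofNanos(System.nanoTime() - start)
                }
            } finally {
                conn.disconnect()
            }
        } catch (e: IOException) {
            // Server is not accepting connections yet; retry shortly.
        }
        Thread.sleep(1)
    }
    return null
}
```

Calling `timeToFirstResponse("http://localhost:8080/health")` immediately before starting the app, once for a normal start and once for a restore, gives two comparable durations. The result includes process launch time, so it is a rough end-to-end figure rather than a precise JVM metric.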

When to Use CRaC

CRaC is ideal for serverless and FaaS environments where cold starts directly impact user experience. That said, there are tradeoffs to consider: CRaC requires native Linux (Docker on Mac/Windows won’t work since CRIU needs real kernel access), checkpoint files can be large (hundreds of MB), and, most importantly, developers need to be aware of which server “resources” can and cannot survive a process checkpoint. In my experience, though, once you get it set up you rarely have to think about it.
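
Since checkpoint files can be large, it may be worth measuring them before shipping. This small sketch totals the files in a checkpoint directory; `directorySizeBytes` is an illustrative helper, and the directory would be the `./checkpoint` path passed to `-XX:CRaCCheckpointTo`.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// Sums the sizes of all regular files under `dir`, including nested directories.
fun directorySizeBytes(dir: Path): Long =
    Files.walk(dir).use { paths ->
        paths.filter { Files.isRegularFile(it) }
            .mapToLong { Files.size(it) }
            .sum()
    }
```

For example, `directorySizeBytes(Path.of("./checkpoint")) / (1024 * 1024)` gives the size in whole megabytes.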
