You can find the full code for this blog post on GitHub.
The Cold Start Problem
JVM applications have a cold start problem. In resource-constrained environments like Google Cloud Run, startup times often stretch to 1-5+ seconds depending on the framework and application size. For traditional long-running servers, this doesn’t matter much. For serverless environments, where instances spin up and down constantly and requests are often blocked waiting on a cold start, it can have a massive impact on user experience.
GraalVM Native Image is one solution. It compiles your app to a native binary with near-instant startup. But the tradeoffs are significant: compile times can stretch to several minutes, debugging is painful, and you need buy-in from library authors for reflection configuration, which the ecosystem still doesn’t fully provide. Even after all that, native images typically achieve only ~80% of peak JVM throughput due to the loss of JIT optimizations. For many applications, that tradeoff isn’t worth it.
CRaC offers a different approach: keep the full JVM, but skip the startup.
What is CRaC?
CRaC (Coordinated Restore at Checkpoint) is an OpenJDK project that lets you snapshot a running JVM and restore it later. The idea is simple:
- Start your application normally
- Let it warm up (JIT compile hot paths, initialize connections, load caches)
- Take a checkpoint—a snapshot of the entire JVM state
- Later, restore from that checkpoint in milliseconds
Under the hood, CRaC uses CRIU (Checkpoint/Restore In Userspace), a Linux tool that relies on kernel features to freeze a running process, save its state to disk, and restore it later. When you restore, you’re not starting a new JVM; you’re resuming one that was already running.
The key benefit: your restored application has all the JIT-compiled code from the original run. There’s no warmup period. The JVM is already hot.
JVM Support
CRaC isn’t available in standard OpenJDK builds. You need a distribution that includes it:
| JVM | Supported Versions |
|---|---|
| Azul Zulu | 17, 21, 22, 23 |
| BellSoft Liberica | 17, 21 |
I recommend Azul Zulu—it was the first to offer commercial CRaC support and has the most mature implementation.
Important: CRaC requires a real Linux machine. On macOS and Windows, you can run in “simulation mode” for development—the checkpoint/restore lifecycle executes, but no actual snapshot is created. Docker on Mac won’t help here either; CRIU needs direct access to Linux kernel features that aren’t available through Docker’s virtualization layer. For real checkpoints, you need native Linux.
How CRaC Works with Ktor
The challenge with checkpointing a server is that open network sockets can’t be serialized. The checkpoint fails if your application still has open sockets or unexpected open files at checkpoint time.
CRaC solves this with the org.crac.Resource interface. Your application registers resources that need to be notified before checkpoint and after restore:
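```kotlin
import io.ktor.server.engine.EmbeddedServer
import org.crac.Context
import org.crac.Core
import org.crac.Resource

class ServerCracResource(
    private val server: EmbeddedServer<*, *>,
) : Resource {
    override fun beforeCheckpoint(context: Context<out Resource>?) {
        // Close all sockets before checkpoint
        server.stop(gracePeriodMillis = 0, timeoutMillis = 0)
    }

    override fun afterRestore(context: Context<out Resource>?) {
        // Restart the server after restore
        server.start(wait = false)
    }

    companion object {
        // Keep a strong reference to each registered resource: the CRaC
        // global context may hold resources only weakly, so an otherwise
        // unreferenced resource could be garbage-collected before checkpoint.
        private val registered = mutableListOf<Resource>()

        fun register(server: EmbeddedServer<*, *>) {
            val resource = ServerCracResource(server)
            registered += resource
            Core.getGlobalContext().register(resource)
        }
    }
}
```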
Before checkpoint, we stop the Netty server (closing all sockets). After restore, we restart it. The JVM state—including all your application’s initialized objects, caches, and JIT-compiled code—survives the checkpoint.
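The code in this post uses the portable org.crac API, which is published as a small library so the same code compiles and runs on any JDK and delegates to the real CRaC implementation when the runtime provides one. A minimal Gradle Kotlin DSL dependency block might look like the sketch below; the version numbers are assumptions, so substitute the current releases for your project.

```kotlin
// build.gradle.kts (dependencies fragment only)
// Versions are assumptions: use the current Ktor 3.x and org-crac releases.
dependencies {
    // Ktor 3.x, where embeddedServer(...) returns EmbeddedServer<*, *>
    implementation("io.ktor:ktor-server-core:3.0.0")
    implementation("io.ktor:ktor-server-netty:3.0.0")
    // Portable CRaC API: org.crac.Core, org.crac.Resource, org.crac.Context
    implementation("io.github.crac:org-crac:0.1.3")
}
```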
Implementation
Here’s a minimal Ktor application with CRaC support:
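```kotlin
import io.ktor.server.engine.embeddedServer
import io.ktor.server.netty.Netty
import io.ktor.server.response.respondText
import io.ktor.server.routing.get
import io.ktor.server.routing.routing
import kotlinx.coroutines.awaitCancellation
import kotlinx.coroutines.runBlocking
import org.crac.Core

fun main(args: Array<String>) {
    val shouldCheckpoint = args.contains("--checkpoint")
    val server = embeddedServer(Netty, port = 8080) {
        routing {
            get("/") { call.respondText("Hello, CRaC!") }
            get("/health") { call.respondText("OK") }
        }
    }
    ServerCracResource.register(server)
    server.start(wait = false)
    if (shouldCheckpoint) {
        // Trigger checkpoint programmatically
        Core.checkpointRestore()
    }
    // Keep the application running
    runBlocking { awaitCancellation() }
}
```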
The --checkpoint flag tells the application to checkpoint itself after starting. When Core.checkpointRestore() is called, the JVM snapshots itself and exits. The next time you restore from that checkpoint, execution continues right after the checkpointRestore() call.
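The example checkpoints as soon as the server is listening, so the request-handling code has not run yet and the "let it warm up" step from the list at the top is effectively skipped. One way to add it is to send the app some traffic right before calling Core.checkpointRestore(). The sketch below is not from the original project: warmUp is a hypothetical helper built on the JDK's HttpURLConnection, and it asks for non-persistent connections so that no client socket is left open when the checkpoint is taken.

```kotlin
import java.net.HttpURLConnection
import java.net.URI

/**
 * Hypothetical warm-up helper: requests each path `rounds` times so the JIT
 * can compile the request-handling code before the checkpoint is taken.
 * Returns how many requests answered with HTTP 200.
 */
fun warmUp(baseUrl: String, paths: List<String>, rounds: Int): Int {
    var succeeded = 0
    repeat(rounds) {
        for (path in paths) {
            val conn = URI.create(baseUrl + path).toURL().openConnection() as HttpURLConnection
            try {
                // Non-persistent connection: nothing lingers in a keep-alive
                // cache, which would otherwise be an open socket at checkpoint.
                conn.setRequestProperty("Connection", "close")
                if (conn.responseCode == 200) {
                    conn.inputStream.use { it.readBytes() } // drain the body
                    succeeded++
                }
            } finally {
                conn.disconnect()
            }
        }
    }
    return succeeded
}
```

In main, this would go inside the if (shouldCheckpoint) block just before Core.checkpointRestore(), for example warmUp("http://localhost:8080", listOf("/", "/health"), rounds = 1_000). The round count is an arbitrary placeholder; how much traffic it takes to get the hot paths compiled depends on the application.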
Running It
CRaC requires specific JVM flags:
Create a checkpoint:
```shell
java -XX:CRaCCheckpointTo=./checkpoint -jar app.jar --checkpoint
```
This starts the app, waits for it to be ready, takes a checkpoint to ./checkpoint/, and exits.
Restore from checkpoint:
```shell
java -XX:CRaCRestoreFrom=./checkpoint -jar app.jar
```
This resumes the JVM from the checkpoint. The application is serving requests within milliseconds.
Back-of-the-Napkin Benchmark Results
For a simple Ktor + Netty application:
| Scenario | Startup Time | Improvement |
|---|---|---|
| Normal startup | 352ms | baseline |
| Restore from checkpoint | 26ms | 14x faster |
The improvement tends to grow with application complexity: larger applications with more dependencies spend more of their startup time loading classes and initializing frameworks, and a restore skips all of that.
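To run a comparison like this on your own app, one simple approach (a sketch, not necessarily how the numbers above were measured) is to launch the JVM from a small harness and time how long it takes for /health to start answering. millisUntilHealthy is a hypothetical helper that polls a URL with the JDK's HttpURLConnection until it gets HTTP 200 or gives up.

```kotlin
import java.io.IOException
import java.net.HttpURLConnection
import java.net.URI

/**
 * Hypothetical measurement helper: polls `url` until it answers HTTP 200.
 * Returns milliseconds elapsed since `startNanos`, or null after `timeoutMillis`.
 */
fun millisUntilHealthy(
    url: String,
    timeoutMillis: Long,
    startNanos: Long = System.nanoTime(),
    pollMillis: Long = 2,
): Long? {
    val deadline = startNanos + timeoutMillis * 1_000_000
    while (System.nanoTime() < deadline) {
        try {
            val conn = URI.create(url).toURL().openConnection() as HttpURLConnection
            conn.connectTimeout = 100
            conn.readTimeout = 1_000
            try {
                if (conn.responseCode == 200) {
                    return (System.nanoTime() - startNanos) / 1_000_000
                }
            } finally {
                conn.disconnect()
            }
        } catch (e: IOException) {
            // Connection refused or timed out: not serving yet, keep polling.
        }
        Thread.sleep(pollMillis)
    }
    return null
}
```

For example, record val t0 = System.nanoTime(), start the restore with ProcessBuilder("java", "-XX:CRaCRestoreFrom=./checkpoint", "-jar", "app.jar").inheritIO().start(), call millisUntilHealthy("http://localhost:8080/health", 10_000, startNanos = t0), and destroy the process afterwards. Because t0 is taken after the harness is already running, the harness's own JVM startup is not counted; what you measure is process launch plus the first successful round trip.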
When to Use CRaC
CRaC is ideal for serverless and FaaS environments where cold starts directly impact user experience. That said, there are tradeoffs to consider: CRaC requires native Linux (Docker on Mac/Windows won’t work since CRIU needs real kernel access), checkpoint files can be large (hundreds of MB), and, most importantly, developers need to be aware of which server resources can and cannot survive a checkpoint. In my experience, though, once it’s set up you rarely have to think about it.
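When a checkpoint fails because something is still open, the first question is usually which descriptor it is. On Linux, every open descriptor of a process appears as a symbolic link under /proc/self/fd, and sockets show up as socket:[inode]. Here is a small Linux-only debugging sketch; the helper names are hypothetical and not part of any CRaC API.

```kotlin
import java.io.IOException
import java.nio.file.Files
import java.nio.file.Path

/**
 * Linux-only debugging aid (hypothetical helper): maps each open file
 * descriptor number to its target, e.g. "socket:[12345]" or a file path.
 */
fun openDescriptors(): Map<Int, String> {
    val fdDir = Path.of("/proc/self/fd")
    // Collect the entries first; listing the directory briefly opens a descriptor itself.
    val entries = Files.list(fdDir).use { it.toList() }
    val result = sortedMapOf<Int, String>()
    for (entry in entries) {
        try {
            result[entry.fileName.toString().toInt()] = Files.readSymbolicLink(entry).toString()
        } catch (e: IOException) {
            // The descriptor used for the listing is already closed by now; skip it.
        }
    }
    return result
}

/** Only the socket descriptors, the kind that makes a checkpoint fail. */
fun openSockets(): Map<Int, String> =
    openDescriptors().filterValues { it.startsWith("socket:") }
```

Logging openSockets() at a point where you expect every socket to be closed makes it easy to spot a connection pool or HTTP client that was never stopped. The JVM itself always holds some descriptors (standard streams, the application JAR), so a non-empty openDescriptors() on its own is normal.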