Michał Artur Marciniak

JVM Memory Fundamentals: Stack, Heap, and Object Headers



Every Java developer knows objects live on the heap and local variables on the stack. But when you create a simple Point object with two integers, do you know exactly how much memory it consumes? If you guessed “8 bytes for the two ints,” you’re off by 200%.

Your microservice crashes with “only” a million objects. GC pauses freeze traffic at the worst moment. Java 25 promises up to 30% CPU savings and ~22% heap reduction with one flag, but only if you understand what it’s doing. JVM memory fundamentals connect all three.

This is Part 1 of our JVM Memory series. We’ll explore the fundamental concepts every Java developer should know. Part 2 dives into the bit-level details for those who want to understand the JVM as deeply as the CPU understands it.


Part I: The Two Realms - Stack and Heap

The City Beneath Your Code

Imagine your Java application as a bustling metropolis. The Heap is the residential district - sprawling, shared by all citizens, constantly being cleaned by sanitation trucks (the Garbage Collector). The Stack is the business district - each building serves exactly one purpose, is demolished the moment its work is done, and is private to its owner.

Every thread in your application gets its own Stack - vertical, efficient, self-contained. The Heap is the shared commons where all threads store their objects, references crossing between threads like citizens visiting shared facilities.

The Stack: Where Methods Live and Die

The Stack is perhaps the most elegant data structure in computing. Last In, First Out. When you call a method, a stack frame is pushed. When the method returns, it’s popped. No garbage collection, no memory leaks - just automatic, deterministic cleanup.

Each thread gets its own stack, sized at JVM startup (-Xss flag, default 1MB). When a thread exceeds this limit? StackOverflowError. You’ve seen it - infinite recursion, overly deep call chains.

public class StackOverflowDemo {
    public static void main(String[] args) {
        recursiveMethod(1);
    }

    private static void recursiveMethod(int depth) {
        System.out.println("Depth: " + depth);
        recursiveMethod(depth + 1);  // Eventually: StackOverflowError
    }
}

The stack stores primitives and references.

The heap stores everything those references point to.

Simple distinction, massive implications for memory and GC.
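The distinction is easy to observe directly. In this small sketch (names are illustrative), mutating a primitive parameter changes only the callee's own stack slot, while mutating through a reference changes the one shared heap object both frames point at:

```java
public class StackVsHeap {
    static class Box { int value; }

    // 'n' is a copy in this frame's local variable array; the caller never sees the change
    static void mutatePrimitive(int n) { n = 99; }

    // 'box' is a copy of the reference, but both copies point at the same heap object
    static void mutateObject(Box box) { box.value = 99; }

    static int primitiveDemo() {
        int n = 1;
        mutatePrimitive(n);
        return n;          // still 1 - the callee changed its own stack slot
    }

    static int heapDemo() {
        Box box = new Box();
        box.value = 1;
        mutateObject(box);
        return box.value;  // 99 - both frames referenced the same heap object
    }

    public static void main(String[] args) {
        System.out.println(primitiveDemo() + " " + heapDemo()); // prints "1 99"
    }
}
```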

The Heap: The Object Repository

If the Stack is where methods execute, the Heap is where objects reside. Every time you write new Object(), the JVM allocates space on the heap. Unlike the stack, heap memory isn’t automatically reclaimed when a method returns - that’s the Garbage Collector’s job.

The heap is shared across all threads. Your main thread creates an object; your worker thread can reference it. This sharing is powerful but requires synchronization when mutable objects are accessed concurrently.
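A minimal sketch of that sharing: two threads mutate one heap object, with `synchronized` guarding the concurrent access (without it, increments would be lost to races):

```java
public class SharedHeapDemo {
    static class Counter {
        private long count = 0;
        synchronized void increment() { count++; }  // guards the shared mutable heap state
        synchronized long get() { return count; }
    }

    static long run() {
        Counter counter = new Counter();  // one heap object, visible to both threads
        Runnable task = () -> { for (int i = 0; i < 100_000; i++) counter.increment(); };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start();
        try { a.join(); b.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 200000 - no lost updates thanks to synchronization
    }
}
```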

Memory Error Types:

  - StackOverflowError - a thread exceeds its stack limit (deep or infinite recursion)
  - OutOfMemoryError: Java heap space - the heap cannot satisfy an allocation even after GC
  - OutOfMemoryError: Metaspace - class metadata space is exhausted


Part II: Stack Frames - The Method’s Workspace

Anatomy of a Frame

Every method invocation creates a stack frame - a self-contained execution environment. The frame contains everything the method needs to execute: its local variables, its computation workspace, and information about where to return when done.

The key insight: The compiler (javac) knows exactly how much memory each frame needs before the program even runs.

When javac compiles your source code, it analyzes every method to determine:

  - max_locals - the size of the local variable array
  - max_stack - the maximum operand stack depth the method can ever reach

These values are stored in the .class file’s Code attribute. The JVM reads them when loading the class, so when a method is called, it knows precisely how much memory to allocate - no guesswork, no waste.

The Local Variable Array

The local variable array is indexed storage for everything local to the method:

  - Slot 0 - the this reference (for instance methods)
  - Next slots - method parameters, in declaration order
  - Remaining slots - local variables declared in the method body

Important: long and double occupy two consecutive slots. Everything else (int, Object reference, byte, short, char, boolean, float) occupies one slot.

The compiler is smart about reusing slots. If you have:

public void example() {
    int x = 1;  // Uses slot 1
    // ... code using x ...

    int y = 2;  // Can reuse slot 1 if x is no longer needed
}

javac analyzes variable lifetimes and reuses slots when possible, minimizing max_locals.

The Operand Stack

Bytecode works indirectly. Want to add two numbers? Load them onto the operand stack first. Execute iadd. Store the result back. Local variables provide storage; the operand stack provides the workspace where every instruction executes.

Consider this simple addition:

// Java source:
public int add(int a, int b) {
    return a + b;
}

// Bytecode (simplified):
// 0: iload_1    // Push local variable 1 (a) onto stack
// 1: iload_2    // Push local variable 2 (b) onto stack
// 2: iadd       // Pop two ints, add, push result
// 3: ireturn    // Pop result, return to caller

The operand stack starts empty. iload_1 pushes a. iload_2 pushes b. Now the stack is [a, b] (b on top). iadd pops both, adds them, pushes the result. Stack is now [result]. ireturn pops and returns it.

The compiler simulates this execution during compilation to find the maximum stack depth - the max_stack value stored in the .class file.
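That simulation can be sketched in plain Java. This is a toy model (not the real JVM interpreter) with a local variable array and an operand stack, tracking the depth the compiler would record as max_stack; slot numbering assumes an instance method (slot 0 = this), matching the bytecode above:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OperandStackModel {
    static int maxDepth = 0;  // what the compiler would emit as max_stack

    static void push(Deque<Integer> stack, int v) {
        stack.push(v);
        maxDepth = Math.max(maxDepth, stack.size());
    }

    static int execAdd(int a, int b) {
        int[] locals = new int[3];                 // max_locals = 3: this(0), a(1), b(2)
        locals[1] = a;
        locals[2] = b;
        Deque<Integer> operands = new ArrayDeque<>();
        push(operands, locals[1]);                 // iload_1
        push(operands, locals[2]);                 // iload_2
        int sum = operands.pop() + operands.pop(); // iadd: pop two ints, add...
        push(operands, sum);                       //       ...push the result
        return operands.pop();                     // ireturn
    }

    public static void main(String[] args) {
        System.out.println(execAdd(2, 3)); // 5
        System.out.println(maxDepth);      // 2 - the deepest the stack ever got
    }
}
```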

Dynamic Linking and Exception Handling

Every frame contains a reference to the runtime constant pool of the current class. When your code calls another method (obj.doSomething()), the bytecode contains a symbolic reference. The JVM resolves this to the actual method address - dynamic linking. Once resolved, the JVM caches the direct reference for subsequent calls.

The exception table maps ranges of bytecode (try blocks) to exception handlers (catch blocks). Each entry contains:

  - start_pc / end_pc - the bytecode range the handler covers
  - handler_pc - where execution jumps when the exception matches
  - catch_type - the exception class this handler accepts (or any exception, for finally)
When an exception is thrown, the JVM searches the current frame’s exception table. If no match, it pops the frame and checks the caller’s table - unwinding the stack until a handler is found or the thread terminates.
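The unwinding is easy to observe. In this small illustrative example, the frame for level3 has no matching handler and is popped, level2's frame is discarded mid-method, and the search ends at the caller's handler:

```java
public class UnwindDemo {
    static String trace = "";

    static void level3() {
        trace += "enter3 ";
        throw new IllegalStateException("boom"); // no handler here: this frame is popped
    }

    static void level2() {
        trace += "enter2 ";
        level3();
        trace += "exit2 ";  // never runs - the frame is discarded during unwinding
    }

    static String run() {
        trace = "";
        try {
            level2();
        } catch (IllegalStateException e) {      // matching entry in this frame's table
            trace += "caught:" + e.getMessage();
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run()); // enter2 enter3 caught:boom
    }
}
```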

Key Takeaway: Every method call creates a frame with dedicated space for data and computation. The compiler determines exact memory requirements (max_locals and max_stack) ahead of time, so the JVM allocates precisely what’s needed.


Part III: Object Headers - The Hidden Cost

The Header Tax

Create a simple object:

class Point {
    int x;
    int y;
}

Expected: 8 bytes
Traditional JVM: 24 bytes

Every object carries 12 bytes of header metadata:

| Field         | Size    | Contents                    |
|---------------|---------|-----------------------------|
| Mark Word     | 8 bytes | Hash, lock state, GC age    |
| Class Pointer | 4 bytes | Reference to class metadata |

Additionally, alignment padding (4 bytes) may be inserted before the first field to ensure 8-byte alignment. The combined header plus padding costs 2× the actual data for small objects.

Java 25 changes the math. By embedding a 22-bit class index into unused Mark Word bits, the JVM compresses the header from 12 bytes to 8 bytes:

| Layout            | Header Size | Point Object | Savings |
|-------------------|-------------|--------------|---------|
| Traditional       | 12 bytes    | 24 bytes     | -       |
| Compact (Java 25) | 8 bytes     | 16 bytes     | 33%     |

For arrays, the length field (4 bytes) forces 8-byte alignment regardless, so headers stay at 16 bytes. int[10] remains 56 bytes in both cases - compact headers mainly benefit small objects without array overhead.

Enable with: -XX:+UseCompactObjectHeaders
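The arithmetic above fits in a small estimator. This is a back-of-the-envelope sketch assuming 8-byte alignment, compressed class pointers, and the header sizes discussed; real layouts depend on the JVM, which is exactly what the JOL demo in the next section measures:

```java
public class SizeEstimator {
    // round size up to the next multiple of alignment
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // headerBytes: 12 traditional, 8 compact
    static long instanceSize(long headerBytes, long fieldBytes) {
        return alignUp(headerBytes + fieldBytes, 8);
    }

    // array headers carry an extra 4-byte length field
    static long intArraySize(long headerBytes, int length) {
        return alignUp(headerBytes + 4 + 4L * length, 8);
    }

    public static void main(String[] args) {
        System.out.println(instanceSize(12, 8));  // Point, traditional: 24
        System.out.println(instanceSize(8, 8));   // Point, compact:     16
        System.out.println(intArraySize(12, 10)); // int[10], traditional: 56
        System.out.println(intArraySize(8, 10));  // int[10], compact: 56 (alignment eats the gain)
    }
}
```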

Seeing It In Action

Let’s use Java Object Layout (JOL) to inspect actual object sizes:

import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

public class ObjectHeaderDemo {
    public static void main(String[] args) {
        System.out.println(VM.current().details());

        // Simple object with two int fields
        class Point {
            int x;
            int y;
        }

        Point p = new Point();
        System.out.println("=== Point Object Layout ===");
        System.out.println(ClassLayout.parseInstance(p).toPrintable());

        // Array header with length field
        int[] intArray = new int[10];
        System.out.println("\n=== int[10] Array Layout ===");
        System.out.println(ClassLayout.parseInstance(intArray).toPrintable());

        // Demonstrate hash code storage
        Object o = new Object();
        System.out.println("\n=== Before hashCode() ===");
        System.out.println(ClassLayout.parseInstance(o).toPrintable());

        int hash = o.hashCode();
        System.out.println("\n=== After hashCode(): " + hash + " ===");
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
}

Expected output analysis:

The Mark Word: A Multi-Purpose Chameleon

The 64-bit Mark Word is remarkably versatile. It stores:

  - Identity hash code - computed lazily on the first hashCode() call
  - GC age - 4 bits counting survived Minor GCs
  - Lock state - flags distinguishing unlocked, lightweight, and heavyweight (monitor) locking
  - Forwarding pointer - the object’s new address while GC relocates it

The same 64 bits serve different purposes depending on the object’s state. When you call hashCode(), bits that were unused suddenly store your hash. When you synchronized(obj), the bits transform to hold lock information. During GC, they become forwarding pointers.

Key Takeaway: Object headers (12 bytes) plus alignment padding create 16 bytes of overhead minimum - often consuming 50-67% of memory for small objects. In applications creating millions of small objects (microservices, caches, JSON parsing), this overhead compounds into gigabytes of wasted RAM.


Part IV: Garbage Collection - The Rhythm of Object Lifetimes

Most Objects Die Young

Garbage collection in the JVM is built on a simple observation:

The vast majority of objects live brief, intense lives.

Your request handlers, your temporary calculation intermediates, your StringBuilder instances - they sparkle into existence and disappear within milliseconds.

This is the Weak Generational Hypothesis, and it shapes every modern JVM garbage collector.

The Heap is divided into generations:

  - Young Generation - Eden plus two Survivor spaces (S0 and S1), where new objects are allocated
  - Old Generation (Tenured) - long-lived objects promoted from the Young Generation
  - Metaspace - class metadata (technically native memory, outside the heap proper)

Young Generation Collections (Minor GC)

When Eden fills up, the JVM triggers a Minor GC:

  1. Stop-the-world pause (brief - typically milliseconds)
  2. Identify live objects: Starting from GC roots (stack references, static fields), traverse the object graph
  3. Copy survivors: Live objects in Eden are copied to Survivor space (S0 or S1)
  4. Empty Eden: Dead objects are simply abandoned - no sweeping needed
  5. Increment age: Objects’ GC age incremented in their Mark Word

The copying collector is efficient because most objects are already dead. If 90% of Eden is garbage, we only copy 10% - much faster than sweeping everything.

The Survivor Spaces Dance

The two Survivor spaces (S0 and S1) alternate roles:

  - One acts as the "to" space - Minor GC copies live objects from Eden and the other survivor space into it
  - The other acts as the "from" space - emptied once its live objects are copied out
  - After each Minor GC, the two spaces swap roles

Objects that survive enough Minor GCs (default threshold: 15, stored in 4 Mark Word bits) are tenured - promoted to the Old Generation.
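The dance above can be modeled as a toy simulation - ages as integers, survivor spaces as lists, purely illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class SurvivorDance {
    static final int TENURING_THRESHOLD = 15;  // default; 4 Mark Word bits cap it at 15

    static List<Integer> survivors = new ArrayList<>(); // ages in the current "from" space
    static int tenuredCount = 0;                        // objects promoted to Old Gen

    // One Minor GC: copy Eden's live objects and aged survivors into the "to" space.
    static void minorGC(int edenLive) {
        List<Integer> to = new ArrayList<>();
        for (int i = 0; i < edenLive; i++) to.add(1);  // first survival: age 1
        for (int age : survivors) {
            int newAge = age + 1;
            if (newAge >= TENURING_THRESHOLD) tenuredCount++;  // promoted to Old Gen
            else to.add(newAge);                               // copied, still young
        }
        survivors = to;  // the spaces swap roles
    }

    public static void main(String[] args) {
        minorGC(3);                               // 3 objects survive their first Minor GC
        for (int i = 0; i < 14; i++) minorGC(0);  // 14 more cycles, nothing new survives
        System.out.println(tenuredCount);         // all 3 reach age 15 and tenure
    }
}
```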

Old Generation Collections (Major/Full GC)

When does Major GC trigger?

Unlike the predictable Eden-based Minor GC, Major GC starts when:

  1. Old Generation fills up - An allocation request in Old Gen cannot be satisfied
  2. Promotion fails - Survivor objects cannot be promoted to Old Gen due to insufficient space
  3. Explicit request - System.gc() (though modern JVMs may ignore this)
  4. Metaspace pressure - Class metadata exhaustion triggers Full GC to unload classes

Major GC is far less predictable than Minor GC. While Eden fills at a steady rate based on allocation velocity, Old Gen consumption depends on object lifetime patterns that vary by workload.

The Old Generation uses different algorithms because:

  - Survival rates are high - copying everything would be expensive, unlike Eden where most objects are already dead
  - Objects are long-lived - they need in-place management and periodic compaction to fight fragmentation
  - The region is large - full traversals are costly, so work is increasingly done concurrently or incrementally

Different collectors handle Old Gen differently:

  - Parallel GC - stop-the-world mark-compact
  - G1 - incremental, region-based evacuation within pause-time targets
  - ZGC / Shenandoah - fully concurrent marking and relocation

GC Roots: Where Collection Starts

Garbage collection doesn’t start from random objects - it starts from GC Roots, objects that are definitely alive:

  1. Stack references: Local variables and parameters on all thread stacks
  2. Static fields: Class static variables
  3. JNI references: Native code references
  4. Monitors: Objects used as synchronization monitors

The GC traverses from these roots, following references, marking everything reachable as live. Unmarked objects? Garbage.
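That traversal is a plain graph-reachability problem. Here is a minimal mark-phase sketch over a toy heap where objects are integer ids and references form an adjacency list:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MarkPhase {
    // heap: object id -> ids it references; roots: definitely-live starting points
    static Set<Integer> mark(Map<Integer, List<Integer>> heap, List<Integer> roots) {
        Set<Integer> live = new HashSet<>();
        Deque<Integer> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            int obj = worklist.pop();
            if (!live.add(obj)) continue;                 // already marked
            for (int ref : heap.getOrDefault(obj, List.of()))
                worklist.push(ref);                        // follow outgoing references
        }
        return live;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> heap = Map.of(
            1, List.of(2),   // root 1 -> 2
            2, List.of(3),   // 2 -> 3
            4, List.of(5),   // 4 and 5 reference each other...
            5, List.of(4)    // ...but no root reaches them: a dead cycle
        );
        // live = {1, 2, 3}; the cycle 4<->5 is unreachable and therefore garbage
        System.out.println(mark(heap, List.of(1)));
    }
}
```

Note that the dead cycle is collected even though its reference counts never reach zero - reachability from roots, not reference counting, is what defines liveness.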

Key Takeaway: GC leverages object lifetime patterns - most objects die young, so optimize for that with fast, copying Minor GCs. Long-lived objects graduate to Old Generation, where more sophisticated (but slower) collectors manage them.


Part V: The Evolution of Garbage Collection

From Seconds to Milliseconds

The JVM’s garbage collection story follows a clear trajectory: pause times have dropped from seconds to milliseconds over three decades.

1996-1998: The Foundation

Java 1.0 launched with simple Mark-and-Sweep - basic automatic memory management, but pause times grew linearly with heap size. Java 1.2 introduced Serial and Parallel collectors and established the generational hypothesis: most objects die young, so optimize for that with separate young and old generations. This insight remains fundamental to every modern collector.

2004-2014: Reducing Pauses

CMS (Java 5-7) was the first mostly-concurrent collector, running most work alongside the application. It reduced pause times but introduced fragmentation since it didn’t compact the heap. G1 (Java 7-9) brought region-based collection with explicit pause time targets, marking the shift toward predictable performance rather than just throughput.

2018-2025: The Low-Latency Era

ZGC and Shenandoah (Java 11-17) achieved sub-millisecond pauses through concurrent relocation, using colored pointers and Brooks pointers respectively. These collectors handle heaps up to 16TB without breaking the millisecond barrier. Java 21 introduced Generational ZGC (the default ZGC mode since Java 23), and Java 25 brings the same optimization to Shenandoah - combining the efficiency of generational collection with concurrent low-latency operation.

The trend is unmistakable: GC pauses that once measured in seconds now complete in single-digit milliseconds.

Collector Selection Guide

Quick Reference:

| Collector  | Best For                          | Pause Time        | Heap Size | Notes                              |
|------------|-----------------------------------|-------------------|-----------|------------------------------------|
| Serial     | Single-core, embedded             | High              | <100MB    | Default for single-core            |
| Parallel   | Batch processing, throughput      | Medium            | 1-8GB     | Throughput focused                 |
| G1         | General purpose (Java 9+ default) | Medium (20-200ms) | 4-16GB    | Balanced                           |
| ZGC        | Large heaps, ultra-low latency    | Low (<10ms)       | 8GB-16TB  | Generational by default (Java 23+) |
| Shenandoah | Containers, low latency           | Low (<10ms)       | 8GB-16TB  | Lower overhead than ZGC            |

For detailed tuning guidance and JVM flags, see our companion article: JVM Garbage Collectors: The Orchestra of Memory Management.


Part VI: Java 25 - The Latest Chapter

Compact Object Headers (JEP 519)

Java 25 introduces Compact Object Headers as a production-ready feature (experimental in Java 24). The innovation is elegant: compress the 12-byte header down to 8 bytes.

How it works: The JVM overlays the class pointer onto unused bits in the Mark Word. Traditional headers have a separate Mark Word (64 bits) and Class Pointer (32 bits). Compact headers embed a 22-bit class index into the Mark Word itself.

Why 22 bits? Because 2²² = 4,194,304 - enough to address over 4 million unique classes. Even the largest applications rarely exceed this.
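The bit-packing idea takes only a few lines to illustrate. This is a schematic model, not HotSpot's actual Mark Word layout - the shift position here is hypothetical:

```java
public class CompactHeaderModel {
    static final int CLASS_INDEX_BITS = 22;
    static final long CLASS_INDEX_MASK = (1L << CLASS_INDEX_BITS) - 1; // 0x3FFFFF
    static final int CLASS_INDEX_SHIFT = 42;  // hypothetical position in the 64-bit word

    // overlay a class index onto the mark word's upper bits
    static long embedClassIndex(long markWord, long classIndex) {
        if (classIndex > CLASS_INDEX_MASK)
            throw new IllegalArgumentException("more than 2^22 classes");
        return (markWord & ~(CLASS_INDEX_MASK << CLASS_INDEX_SHIFT))
             | (classIndex << CLASS_INDEX_SHIFT);
    }

    static long extractClassIndex(long markWord) {
        return (markWord >>> CLASS_INDEX_SHIFT) & CLASS_INDEX_MASK;
    }

    public static void main(String[] args) {
        System.out.println(1L << CLASS_INDEX_BITS);  // 4194304 addressable classes
        long mark = embedClassIndex(0L, 123_456);
        System.out.println(extractClassIndex(mark)); // 123456 - round-trips losslessly
    }
}
```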

Project Lilliput: This feature represents the first deliverable from Project Lilliput, OpenJDK’s initiative to minimize Java object memory overhead. The project roadmap targets 4-byte headers in a future release - halving the size once again. For now, the 12→8 byte reduction is just the beginning.

The Math:

| Component     | Traditional        | Compact (Java 25)             |
|---------------|--------------------|-------------------------------|
| Mark Word     | 64 bits            | 64 bits (with embedded class) |
| Class Pointer | 32 bits            | 0 bits (embedded)             |
| Total         | 96 bits (12 bytes) | 64 bits (8 bytes)             |

33% reduction (12→8 bytes). Up to 30% CPU savings in production.

Note that the percentage savings vary by object size: a bare Object with no fields shrinks from 16 bytes (12 header + 4 padding) to 8 bytes - a 50% reduction. Objects with fields see proportionally less dramatic savings as the data payload dominates.

Real-World Impact

Amazon battle-tested Compact Object Headers across hundreds of production microservices:

| Metric       | Improvement           |
|--------------|-----------------------|
| Heap usage   | 22% reduction         |
| CPU usage    | Up to 30% reduction   |
| GC cycles    | 15% fewer collections |
| JSON parsing | 10% faster            |

The gains are most dramatic for applications with many small objects:

  - Caches holding millions of entries
  - Microservices deserializing JSON into small DTOs
  - Stream pipelines producing short-lived boxed values and records

Applications with few, large objects (video processing, image manipulation) see less benefit - the header was already negligible compared to data payload.

Enabling Compact Headers

Just add one JVM flag:

# Java 24 (experimental - unlock needed)
java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -jar myapp.jar

# Java 25 (production-ready)
java -XX:+UseCompactObjectHeaders -jar myapp.jar

# Verify it's working
java -XX:+UseCompactObjectHeaders -XX:+PrintFlagsFinal 2>&1 | grep UseCompactObjectHeaders

That’s it. No code changes. No recompilation. No regression testing.

Key Takeaway: Java 25 delivers 33% reduction in object header overhead with a single flag. It’s the most memory-efficient Java release ever - without changing a line of application code.


Part VII: Virtual Threads and Memory — A New Allocation Paradigm

The Million-Thread Challenge

Java 21 introduced Virtual Threads (JEP 444), fundamentally changing how Java applications handle concurrency. Where platform threads map 1:1 to operating system threads (typically limited to thousands), virtual threads are lightweight user-mode threads that the JVM can multiplex onto a small pool of platform threads — enabling millions of concurrent tasks.

But this concurrency revolution comes with a memory management trade-off: virtual thread stack frames (continuations) are heap-allocated when the thread is parked or suspended. When millions of virtual threads are blocked waiting for I/O, their stack state lives on the heap rather than in native memory. This creates a fundamentally different allocation pattern than traditional thread pools.
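A quick taste of the scale involved (requires Java 21+): the illustrative sketch below runs thousands of briefly-blocking tasks on virtual threads via Executors.newVirtualThreadPerTaskExecutor() - far beyond what a platform-thread-per-task design could sustain cheaply:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    static int run(int tasks) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10);  // parks the VT; its frames move to the heap
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        }  // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10_000)); // 10000 - try this with platform threads and watch RSS
    }
}
```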

How Virtual Threads Change Memory Allocation

The Traditional Model:

  - Each platform thread reserves a fixed native stack (default 1MB via -Xss) outside the heap
  - That memory is held per thread whether it is blocked or running - 10,000 threads claim ~10GB of address space
  - The GC never manages thread stacks; it only scans them for roots

The Virtual Thread Model:

  - A virtual thread’s frames live in small, resizable continuation chunks
  - When the thread parks (e.g., on blocking I/O), its frames are stored on the heap
  - Those continuation objects are ordinary heap allocations - created, resized, and collected by the GC

Real-World Findings

Research from InfoQ’s case study (2024) reveals:

  1. Memory footprint varies dramatically based on workload patterns. Applications with high virtual thread churn may see increased GC pressure compared to equivalent platform thread implementations due to heap-resident continuations.

  2. CPU-intensive workloads may show lower throughput with virtual threads than well-tuned platform thread pools due to increased allocation overhead from managing millions of thread states.

  3. I/O-bound workloads benefit most — the blocking operations that make virtual threads shine don’t increase heap pressure proportionally since threads are parked during I/O waits, and the heap storage of continuations is more memory-efficient than native thread stacks.

Tuning for Virtual Threads

# Monitor virtual thread pinning (when a VT cannot be unmounted)
java -Djdk.tracePinnedThreads=short ...

# Adjust for higher allocation rates from virtual thread stacks
# (ZGC needs no pause target - it tunes itself for sub-millisecond pauses)
java -XX:+UseZGC -Xms4g -Xmx4g MyApp

# Consider larger young generation for VT-heavy workloads with G1
java -XX:+UseG1GC -XX:NewRatio=1 -XX:MaxGCPauseMillis=50 ...

Key Insight: When migrating to virtual threads, re-benchmark your memory allocation and GC behavior. The allocation profile may shift enough to warrant reconsidering your collector choice or tuning parameters. The collector that worked well for your platform-threaded application may not be optimal for the same workload using virtual threads.

Reference: Chirumamilla, P. et al. “Java Virtual Threads: a Case Study.” InfoQ, July 2024. The study found that “memory footprint in Open Liberty deployments can vary greatly based on factors like application design, workload level, and garbage collection behavior, so the reduced footprint of virtual threads may not result in an overall reduction in memory used.”

Going deeper: Part 2 covers Virtual Thread internals including continuation frame capture, mount/unmount mechanics, pinning detection, and detailed heap allocation analysis.


Cross-References and Further Reading

Continue Your Journey

Part 2: Deep Dive - for those who want to understand the bit-level details: bytecode analysis, Mark Word bit layouts, Virtual Thread continuation internals, and concurrent GC algorithms.

Related Articles:

  - JVM Garbage Collectors: The Orchestra of Memory Management
  - JVM: The Silent Revolution


Summary

By the end of Part 1, you should understand:

  1. The Two Realms: Stack (per-thread, automatic, LIFO) vs Heap (shared, GC-managed)
  2. Stack Frames: max_locals and max_stack determined at compile time, frame components
  3. Object Headers: 12 bytes (Mark Word + Class Pointer), with alignment padding applied before fields
  4. Generational GC: Most objects die young, Minor GC (fast) vs Major GC (rare)
  5. GC Evolution: From simple STW to sophisticated concurrent collectors
  6. Java 25: Compact Object Headers reduce overhead from 12 to 8 bytes (33% reduction)
  7. Virtual Threads: Java 21’s lightweight threads store continuations on heap, changing allocation patterns

Once you internalize these fundamentals, you start seeing hidden costs everywhere. A cache with a million entries pays 12MB in header overhead before storing a single byte of data. A microservice processing thousands of requests per second allocates more bytes for headers than for the actual request payload.

This overhead isn’t inevitable. Java 25 cuts it by a third with a single flag - no code changes, no recompilation, no testing cycles. The JVM spent thirty years optimizing garbage collection down to millisecond pauses. Now it’s optimizing the objects themselves.

Add -XX:+UseCompactObjectHeaders and measure the difference.


References

  1. Java Virtual Machine Specification

  2. JEP 519: Compact Object Headers

  3. Java Object Headers Deep Dive

  4. Java 25 Compact Headers Integration

  5. Understanding Object Headers

  6. GC Evolution from Java 1.0 to 21

  7. Java Object Layout (JOL)

  8. JVM Internal Architecture


Part 1 of the JVM Memory series. Part 2 covers bit-level internals, bytecode analysis, and concurrent GC algorithms.
Related: JVM: The Silent Revolution | JVM Garbage Collectors Guide

